Master's examination with defense (Defensio), Wielach Julia

01.03.2021 11:00 - 12:30

Machine learning for RNA structure prediction

Once thought of merely as the intermediate step between DNA and proteins, RNA is nowadays known to play regulatory and catalytic roles in a variety of biological processes. RNA function usually depends strongly on its structure, and the structure is more conserved than the sequence, which has made RNA structure prediction a subject of research for decades. RNA folds hierarchically from primary to secondary to tertiary and quaternary structure; the secondary structure forms quite rapidly, while the formation of the tertiary structure is usually a slow process. RNA secondary structure consists of all canonical base pairs, including wobble base pairs: adenine with uracil, cytosine with guanine, and guanine with uracil. Secondary structure is of particular interest because it is a suitable intermediate step in predicting the full tertiary structure and because efficient computational methods for its prediction are available.

Traditional methods rely on dynamic programming algorithms to find the lowest free energy structure, using nearest neighbor parameters to estimate folding stability. While these algorithms are efficient and widely used, they have shortcomings: for example, the restriction to nested structures, which excludes pseudoknots, and the possible neglect of long-distance effects. As artificial intelligence methods gain popularity in many fields, ranging from image classification to speech recognition and protein structure prediction, they have also found their way into RNA secondary structure prediction. The aim of this thesis is to explore deep learning techniques for RNA secondary structure prediction and to determine whether they can reach the performance of state-of-the-art dynamic programming approaches or even offer improvements.
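To illustrate the dynamic programming idea mentioned above, the following is a minimal sketch of Nussinov-style base pair maximization over the canonical pairs (including the G-U wobble pair). Note this is a simplification for illustration: the thermodynamic methods referred to in the abstract minimize free energy with nearest neighbor parameters rather than merely counting pairs, and the function name and minimum loop size are choices made here, not taken from the thesis.

```python
# Canonical base pairs, including the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Maximum number of nested canonical base pairs in seq.

    A simplified stand-in for energy minimization: dp[i][j] holds the
    best pair count for the subsequence seq[i..j], filled for
    increasingly long spans. Nested recursion means pseudoknots are
    excluded, one of the limitations noted in the abstract.
    """
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1] if n else 0
```

For example, `nussinov("GGGAAACCC")` returns 3: a hairpin with three G-C pairs closing an AAA loop.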
Different network types, such as Long Short-Term Memory networks (LSTMs) and Convolutional Neural Networks (CNNs), are tested, as are different ways to represent input and output. While several published machine learning models report good performance, this thesis discusses aspects such as comparability between methods, generalization ability, and dataset dependence, to offer a broad view on the topic. Additionally, other measures of performance are presented, such as the representation of global and local secondary structure constraints and the overall topology of structures.
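One common way to represent the output of such models, sketched here as an assumption rather than as the thesis's actual encoding, is to turn the usual dot-bracket notation for a nested secondary structure into a symmetric binary base-pair matrix, which CNN-style predictors can treat like an image:

```python
def dotbracket_to_matrix(db):
    """Convert nested dot-bracket notation (e.g. '((...))') into a
    symmetric 0/1 base-pair matrix.

    A stack matches each ')' with its most recent unmatched '(';
    unpaired positions ('.') contribute nothing. Only nested
    (pseudoknot-free) structures can be written this way.
    """
    n = len(db)
    mat = [[0] * n for _ in range(n)]
    stack = []
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()  # index of the matching opening bracket
            mat[i][j] = mat[j][i] = 1
    return mat
```

For `"((...))"` this marks the pairs (0, 6) and (1, 5), giving a matrix with four nonzero entries (two pairs, stored symmetrically).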

Location:
Online video conference