Lecture: Fridays 10:15-11:45, Déli épület (South Building) 3-114
First class: February 14, 2025
Practice session: http://pitgroup.org/bioinfogyak/
Lecturers: Grolmusz Vince (grolmusz@pitgroup.org), Varga Bálint
Recommended books (not required for passing the exam):
Metagenomics: The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet (The National Academies Press, 2007, free)
Bioinformatics: Lengauer: Bioinformatics – From Genomes to Therapies, Vol. 1-3 (Wiley, 2007)
Isaev A.: Introduction to Mathematical Methods in Bioinformatics (Springer, 2006)
Lecture materials (username and password required)
Syllabus:
What is covered in this course, and why? Brief introduction. Genome sequencing techniques: Sanger, 454/Roche, Illumina/Solexa. The FASTA and FASTQ formats. Introduction to metagenomics. Microbial diversity.
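As a rough illustration of the two file formats named above, here is a minimal sketch of readers for FASTA and FASTQ text. The function names `parse_fasta` and `parse_fastq` are hypothetical, and the quality decoding assumes the Phred+33 encoding and strict four-line FASTQ records; real files may deviate from both assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text: a '>' line is a header, the lines after it hold the sequence."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # identifier = first word of the header
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequences may be wrapped over several lines
    return records


def parse_fastq(lines: Iterable[str]) -> List[Tuple[str, str, List[int]]]:
    """Parse four-line FASTQ records (assumption: no line wrapping, Phred+33 qualities)."""
    rows = [line.strip() for line in lines if line.strip()]
    reads = []
    for i in range(0, len(rows) - 3, 4):
        header, seq, _plus, qual = rows[i:i + 4]
        reads.append((header[1:], seq, [ord(c) - 33 for c in qual]))
    return reads


fasta = parse_fasta([">seq1 demo", "ACGT", "TTGA", ">seq2", "GG"])
fastq = parse_fastq(["@r1", "ACGT", "+", "II5!"])
```

In practice an established parser is usually preferable; the sketch only shows how little structure the formats have.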
Genome sequencing
Levels of sequence assembly: chromosomes, scaffolds and contigs, SRA or trace. Re-sequencing vs. de novo sequencing. Assembly of short reads.
Strategies: Hashing. Hashing in parallel (example with the element distinctness problem).
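One common way hashing is used with short reads is a k-mer index: every length-k substring of a reference is stored in a hash table with its positions, and a read is located by looking up its first k-mer and verifying each candidate. The sketch below assumes exact matching only, and the names `build_kmer_index` and `find_exact` are hypothetical; it is not necessarily the strategy presented in the lecture.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def find_exact(reference: str, index: Dict[str, List[int]], k: int, read: str) -> List[int]:
    """Use the first k-mer of the read as a seed, then verify each candidate position."""
    if len(read) < k:
        return []
    return [pos for pos in index.get(read[:k], [])
            if reference.startswith(read, pos)]


reference = "ACGTACGTGA"
index = build_kmer_index(reference, 3)
hits = find_exact(reference, index, 3, "ACGTG")
```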
The Burrows-Wheeler transform. Easy substring search, computing its inverse. Sequence assembly with graphs. The Hamiltonian cycle reduction. The Eulerian path reduction. De Bruijn graphs.
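The Burrows-Wheeler transform and its inverse can be sketched in a few lines. This version sorts all rotations explicitly, which is quadratic in memory and only suitable for short strings; it assumes a sentinel character `$` that does not occur in the text and sorts before every other character. The inverse follows the last-to-first correspondence between the sorted first column and the last column.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via explicit sorting of all rotations."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Recover the text from the last column of the sorted rotation matrix."""
    n = len(last)
    # A stable sort pairs the i-th occurrence of a character in the first
    # column with its i-th occurrence in the last column.
    order = sorted(range(n), key=lambda i: last[i])
    row = last.index(sentinel)  # the row whose rotation starts with the text
    out = []
    for _ in range(n - 1):
        row = order[row]
        out.append(last[row])
    return "".join(out)


transformed = bwt("banana")
restored = inverse_bwt(transformed)
```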
Sequence analysis
Strings, basic problems related to string search: pattern matching, alignments.
Two approaches for pattern matching: preprocessing the text or the searched pattern. Distance functions on strings: Hamming distance, Levenshtein distance, Levenshtein distance with different costs.
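The distance functions listed above can be written down directly. The sketch below computes the Hamming distance for equal-length strings and the Levenshtein distance by row-wise dynamic programming; the parameters `sub_cost` and `indel_cost` are hypothetical names for the "different costs" variant.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str, sub_cost: int = 1, indel_cost: int = 1) -> int:
    """Edit distance with configurable substitution and insertion/deletion costs."""
    prev = [j * indel_cost for j in range(len(b) + 1)]  # distance from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i * indel_cost]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + indel_cost,                      # delete x
                cur[j - 1] + indel_cost,                   # insert y
                prev[j - 1] + (0 if x == y else sub_cost)  # match or substitute
            ))
        prev = cur
    return prev[-1]
```

With unit costs, `levenshtein("kitten", "sitting")` is 3; raising `sub_cost` to 2 makes a substitution no cheaper than a deletion plus an insertion.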
Dynamic programming: main ideas and applicability to sequence comparisons. “Scoring functions” and their motivations. Brief introduction to scoring matrices (PAM, BLOSUM). Databases of amino acid sequences: UniProt (= SwissProt ∪ TrEMBL); the difference between SwissProt and TrEMBL; RefSeq (also contains nucleotide sequences); corresponding websites; download hints.
Pattern matching using pattern preprocessing: the Boyer-Moore algorithm. Pattern matching using text preprocessing: suffix trees. Quickly solvable tasks using suffix trees.
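To illustrate pattern preprocessing, here is a sketch of the Horspool simplification of Boyer-Moore. It keeps the right-to-left comparison and a bad-character shift table but omits the good-suffix rule, so it is not the full Boyer-Moore algorithm. The function name `horspool_search` is hypothetical.

```python
from typing import List


def horspool_search(text: str, pattern: str) -> List[int]:
    """Return all start positions of pattern in text (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Preprocess the pattern: distance from the last occurrence of each
    # character (excluding the final position) to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:  # compare right to left
            j -= 1
        if j < 0:
            hits.append(pos)
        # Shift by the table entry of the text character under the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```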
Sequence alignment algorithms
The concept of local, global alignments. Gap penalty types. Alignment with different overlap requirements / different gap penalty types.
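The difference between global and local alignment shows up as two small changes in the same dynamic program. The sketch below computes only the optimal score, with a linear gap penalty; the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are illustrative assumptions, not values taken from the course. With `local=False` it follows the Needleman-Wunsch recurrence, with `local=True` the Smith-Waterman recurrence.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score with a linear gap penalty (global or local)."""
    # Global: leading gaps are penalized. Local: an alignment may start anywhere.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i, x in enumerate(a, 1):
        cur = [0 if local else i * gap]
        for j, y in enumerate(b, 1):
            score = max(
                prev[j - 1] + (match if x == y else mismatch),  # align x with y
                prev[j] + gap,                                   # gap in b
                cur[j - 1] + gap,                                # gap in a
            )
            if local:
                score = max(score, 0)  # local alignments may restart at zero
            cur.append(score)
            best = max(best, score)
        prev = cur
    # Local: best cell anywhere in the table. Global: the bottom-right cell.
    return best if local else prev[-1]
```

Affine gap penalties would need additional tables for gap opening and extension, which this sketch leaves out.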
Basic idea: BLAST (in detail)
Phylogeny, evolution trees. The NCBI taxonomy tree. Different methods for constructing phylogenetic trees.
From genes to proteins: finding protein-coding genes (GV). Transcription and translation. CDS, ORF, gene finding.
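A naive open reading frame (ORF) scan gives a feel for the gene-finding problem. The sketch below assumes the standard start codon ATG and the stop codons TAA, TAG and TGA, looks only at the forward strand, and reports the outermost start codon in each frame; the function name `find_orfs` and the `min_codons` threshold are hypothetical. Real gene finders use far more evidence than this.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of ATG...stop in the three forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((start, i + 3))     # interval includes the stop codon
                start = None
    return sorted(orfs)


orfs = find_orfs("CCATGAAATAG")
```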
Structure prediction
Molecular structure primer; Molecular structure prediction
Drug-protein docking
Interaction networks
Molecular networks: metabolic and physical interaction networks
Source of information: online databases
Molecular networks
Protein function prediction and similarity
Brain informatics
MRI data processing, brain graphs, brain graph analysis