Course in theory |
Laboratory
practice |
1, What is covered in this course, and why? Brief introduction. Genome sequencing
techniques: Sanger, 454/Roche, Illumina/Solexa The FASTA and FASTQ formats.
Introduction to metagenomics. Microbial diversity. |
1, Linux basics, commands, piping, archiving,
updating. (RT) |
Part 1: Genome sequencing 2, Levels
of sequence assembly: chromosomes, Scaffolds & contigs, SRA or trace. Re-sequencing vs. de novo sequencing.
Assembly of short reads. Strategies: Hashing. Hashing in parallel
(example with the element distinctness problem). |
2, Shell scripts, compilation, ssh tunneling,
(RT) |
3, The Burrows-Wheeler transform. Easy
substring-search, computing its inverse. Sequence assembly with graphs. The
Hamiltonian cycle reduction. The Eulerian path reduction. De Bruijn graphs |
3, Python script language |
2.,
Genome sequencing: Related problems in bioinformatics: Physical mapping.
Physical mapping with backtrack algorithm. |
4, Python continued (BD) |
3., Shotgun
sequencing. Motivation, calculation of sequence overlaps (later: with suffix
trees). Greedy algorithm. (IG) |
5, Sequencing, sequence alignments (IG) |
Part 2: Sequence analysis 4., Strings,
basic problems related to string search: pattern matching, alignments.Brief
introduction on data structures. Two approaches
for pattern matching: preprocessing the text or the searched pattern.Distance
functions on strings: Hamming-distance, Levenshtein-distance,
Levenshtein-distance with different costs. Dynamic programming: main ideas and applicability to sequence
comparisons. "Scoring functions" and their motivations. Brief
introduction on scoring matrices (PAM, BLOSUM). Databases of amino acid
sequences: UniProt = (SwissProt U TrEMBL); SwissProt and TrEMBL difference;
RefSeq (also nucleotide sequences); corresponding websites; download hints (IG) |
6, Sequencing, sequence
alignment (IG) |
5., Pattern matching using pattern preprocessing: the Boyer-Moore
algorithm. Pattern matching using text preprocessing: Suffix trees and suffix
arrays. Quickly solvable tasks using Suffix trees. |
7, Sequencing, sequence
alignment (IG) |
6., Sequence
alignment algorithms The concept of local, global, "glocal" alignments. Gap penalty
types. Alignment with different overlap requirements / different gap penalty
types. Motivation behind scoring functions: statistical considerations.
Sequence alignment: statistical parameters and their interpretations. Live
demonstration of the different alignment algorithms. (IG) |
TBA |
7., Multiple
alignments, heuristic sequence alignment algorithms. Basic idea: BLAST (in detail), improvement possibilities: PSI-BLAST
(sketch). Phylogeny, evolution trees. The NCBI taxonomy tree. Different
methods for constructing phylogenetic trees. (IG) |
TBA
|
8,g From genes
to proteins: finding protein coding genes (GV) Transcription and translation.
CDS, ORF, gene finding. |
8, System
administration for beginners (R |
9, AI and Bioinformatics. Markov models. Hidden Markov
models. The forward algorithm. Viterbi algorithm. The Viterbi learning
algorithm. (GV) |
|
Part 3: Structure prediction
10, Molecular
structure primer; Molecular structure prediction (GV) |
|
11,
Drug-protein and protein-protein docking (GV) |
|
Part 4: Interaction networks 12, Molecular
networks: metabolic and physical interaction networks Source of information: online databases (DIP, MINT, Intact, HPRD) (GV) |
12. PPI download and
update (BD) |
13, Molecular
networks: gene regulatory and cell signaling networks (GV) |
13, PPI generation (BD) |
14, Protein
function prediction and similarity (GV) |
14, PPI analysis (BD) |