Course in theory |
Laboratory practice |
1, What is covered in
this course, and why? Brief introduction.
Genome sequencing techniques: Sanger, 454/Roche,
Illumina/Solexa. The
FASTA and FASTQ formats. Introduction to
metagenomics. Microbial diversity. |
1, Linux basics,
commands, piping, archiving, updating. |
Part 1: Genome sequencing
2, Levels of
sequence assembly: chromosomes, scaffolds &
contigs, SRA or trace. Re-sequencing
vs. de novo sequencing. Assembly of short reads. Strategies: Hashing.
Hashing in parallel (example with the element
distinctness problem). |
2, Shell scripts,
compilation, SSH tunneling. |
3, The
Burrows-Wheeler transform. Easy substring search,
computing its inverse. Sequence assembly with graphs.
The Hamiltonian cycle reduction. The Eulerian path
reduction. De Bruijn graphs. |
3, The Python scripting language
|
2.,
Genome sequencing: related problems in
bioinformatics. Physical mapping. Physical mapping
with a backtracking algorithm. |
4, Python continued |
3., Shotgun
sequencing. Motivation, calculation of sequence
overlaps (later: with suffix trees). Greedy
algorithm. |
5, Sequencing, sequence
alignments |
Part 2: Sequence analysis
4., Strings,
basic problems related to string search: pattern
matching, alignments. Brief introduction to data
structures. Two
approaches for pattern matching: preprocessing the
text or the searched pattern. Distance functions on
strings: Hamming distance, Levenshtein distance,
Levenshtein distance with different costs. Dynamic programming: main ideas and
applicability to sequence comparisons.
"Scoring functions" and their
motivations. Brief introduction to
scoring matrices (PAM, BLOSUM).
Databases of amino acid sequences:
UniProt = (SwissProt ∪ TrEMBL);
the difference between SwissProt and TrEMBL; RefSeq
(also nucleotide sequences);
corresponding websites; download
hints |
6,
Sequencing, sequence alignment |
5., Pattern matching using pattern
preprocessing: the Boyer-Moore algorithm. Pattern
matching using text preprocessing: suffix trees and
suffix arrays. Tasks that can be solved quickly using suffix
trees. |
7,
Sequencing, sequence alignment |
6., Sequence
alignment algorithms. The concepts of local, global, and "glocal"
alignments. Gap penalty types. Alignment with
different overlap requirements / different gap
penalty types. Motivation behind scoring
functions: statistical considerations. Sequence
alignment: statistical parameters and their
interpretations. Live demonstration of the
different alignment algorithms. |
TBA |
7., Multiple
alignments, heuristic sequence alignment algorithms. Basic idea: BLAST (in detail); possible
improvements: PSI-BLAST (sketch). Phylogeny,
evolutionary trees. The NCBI taxonomy tree. Different
methods for constructing phylogenetic trees. |
TBA
|
8,
From genes to proteins: finding protein-coding
genes. Transcription and translation. CDS, ORF,
gene finding.
|
8,
System administration for beginners |
9, AI and bioinformatics. Markov
models. Hidden Markov models. The forward algorithm.
The Viterbi algorithm. The Viterbi learning
algorithm. |
|
Part 3: Structure prediction
10,
Molecular structure primer; Molecular structure
prediction |
|
11,
Drug-protein and protein-protein docking |
|
Part 4: Interaction networks
12,
Molecular networks: metabolic and physical interaction
networks. Sources of information: online databases (DIP,
MINT, IntAct, HPRD). |
12,
PPI download and update |
13,
Molecular networks: gene regulatory and cell signaling
networks |
13,
PPI generation |
14,
Protein function prediction and similarity |
14,
PPI analysis |