Statistics 410-01 Bioinformatics Spring 2003
Time and Place: Tu Th 12:30-1:45, CLAS 344
Instructor: Professor Lynn Kuo
Email: lynn@stat.uconn.edu
Office: CLAS 330
Phone: 486-2951
Office hours: Tu Th 2-3:30
Guest Lecturers:
Dr. Winfried Krueger, supervisor of the Genomics Core Facility, UCONN Health
Center, email: WKRUEGER@PANDA.UCHC.EDU
Mr. Pascal Lapierre, researcher at the lab of Professor Peter Gogarten
(Molecular and Cell Biology), email: Pascal.Lapierre@Huskymail.uconn.edu
Professor Dong-Guk Shin of the Computer Science Department, email:
shin@engr.uconn.edu
Required Textbooks:
(1) Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. (1998). Biological
Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.
Cambridge University Press.
(2) Greg Gibson and Spencer V. Muse (2002). A Primer of Genome
Science, Sinauer Associates, Inc.
(3) M. Kanehisa (2000). Post-genome informatics, Oxford
University Press.
Recommended Texts:
(1) Michael S. Waterman (1995). Introduction to Computational
Biology: Maps, Sequences and Genomes. Chapman & Hall.
(2) Pierre Baldi and Wesley Hatfield (2002). DNA Microarrays and
Gene Expression: From Experiments to Data Analysis and Modeling. Cambridge
University Press.
(3) Rex A. Dwyer (2002). Genomic Perl: From Bioinformatics Basics to
Working Code. Cambridge University Press.
(4) Editor: M. Schena (1999). DNA Microarrays, Oxford University
Press.
(5) Editor: Terry Speed (2002). Statistical Analysis of Gene
Expression Microarray Data. Chapman & Hall/CRC.
(6) Editors: Andreas D. Baxevanis and B. F. Francis Ouellette
(2001). Bioinformatics: A Practical Guide to the Analysis of Genes and
Proteins, 2nd Ed. Wiley.
(7) Hastie, T., Tibshirani, R. and Friedman, J. (2001). The
Elements of Statistical Learning: Data Mining, Inference, and Prediction.
Springer-Verlag.
(8) Han, J. and Kamber, M. (2001). Data Mining: Concepts and
Techniques, Academic Press.
(9) Lodish et al. (2000). Molecular Cell Biology, 4th edition,
Freeman.
(10) David P. Clark and Lonnie D. Russell (2000). Molecular
Biology Made Simple and Fun, Cache River Press.
This is a research-oriented interdisciplinary course. Preference will be given
to Ph.D. students in Molecular and Cell Biology or related fields, Computer
Science, and Statistics. M.S. students in these fields will be accepted
provided space is available.
The availability of massive amounts of DNA sequence data and protein structure
data has spurred the need to extract the embedded information by computational
and analytical means. This need is the major impetus for developing
bioinformatics and computational biology. In this course, we will explore
topics in gene expression studies, sequence alignment, and protein structure
prediction.
The philosophy of this class is to train collaborators, not hybrids, in the
bioinformatics area. Making hybrids, that is, teaching biologists enough math,
or teaching statisticians (or computer scientists) enough biology, so that each
group can be useful on its own, is not practical. The winning strategy is to
train collaborators. There are mathematicians who want to solve problems in
biology. However, they solve problems that have no impact on the real world.
Statisticians need to ask: how do we use what we know to solve the problems
that biologists have? And having solved a problem, how do we explain the
solution so that biologists will understand it? We have plenty of people who
know math and plenty who know biology. However, there is no communication
between them. So the emphasis of this course is on communication among
biologists, computer scientists, and statisticians.
Several group projects will be developed in the course. Each group is planned
to include at least one biologist, one statistician, and one computer
scientist. A term paper and an oral presentation are required at the end of the
semester. You will be graded on the term paper and the oral presentation.
Week 1 (Jan. 23 and 28): Organization of the course (Kuo); Biology of Gene and
Protein (Lapierre)
Week 2 (Jan. 30 and Feb. 4): Genome Sequencing and Annotation (Kuo)
Week 3 (Feb. 6): Mining the Genome (Shin) (Shin_1, Shin_2, Shin_3)
Week 3 (Feb. 11): Microarray Structure (Krueger) (Krueger_1)
Week 4 (Feb. 13): Mining the Genome
Week 4 (Feb. 18): Data Interpretation for Microarray (Krueger) (Krueger_2)
Week 5 (Feb. 24, 6-9pm): Hierarchical modeling and variance components (MCMC),
normalization methods
Week 6 (Feb. 27 and March 4): Comparative analysis, false discovery rates,
permutation analysis (Wong&Tseng_1, Wong&Tseng_2)
Week 7 (March 11 and 13): Clustering, self-organizing maps, and dimension
reduction (Han&Kamber_8)
Week 8 (March 25 and 27): Supervised learning, neural networks,
classification, prediction (Han&Kamber_7)
Week 9 (April 1): Support vector machines (SVM)
Week 10 (April 3 and 8): Proteomics and Functional Genomics
Week 11 (April 10 and 15): Building phylogenetic trees
Week 12 (April 17 and 22): Integrative Genomics
Weeks 13, 14 and 15: Research Presentations