Lecture Notes

Tuesday, 30 June 2009

New Challenges and Opportunities in Network Biology

Trey Ideker *****

Richard Karp and Lee Hood are his scientific inspirations.
Wants to reconstruct pathways from multiple biological sources.

If you know something about a pathway you can perturb it systematically in silico and in vivo/vitro. The starting point was the galactose metabolism pathway and its transcriptional control: 3 core genes, and 1000 other genes were perturbed in the mutants. Transcriptional interactions were available first; now we have protein-protein interactions, and we also had KEGG/metabolism.
Coloured networks - nodes coloured by state, expression profile, phosphorylation etc. Extract sub-networks from the main network based on colour - pathway extraction.
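
A minimal sketch of pathway extraction by colour, assuming the network is a list of undirected edges and every node carries one colour label. The node names, the colour labels and the function name coloured_subnetworks are hypothetical, and this simply takes connected components of same-coloured nodes, which is much cruder than scoring candidate sub-networks.

```python
from collections import deque

def coloured_subnetworks(edges, colour, wanted):
    """Connected components of the subgraph induced by nodes of one colour."""
    keep = {node for node, c in colour.items() if c == wanted}
    adj = {node: set() for node in keep}
    for a, b in edges:
        if a in keep and b in keep:
            adj[a].add(b)
            adj[b].add(a)
    seen, components = set(), []
    for start in sorted(keep):
        if start in seen:
            continue
        # Breadth-first search collects one component.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components

# Hypothetical toy network: C is coloured differently and splits the chain.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
colour = {"A": "up", "B": "up", "C": "down", "D": "up", "E": "up"}
modules = coloured_subnetworks(edges, colour, "up")
```

Here the "up" nodes fall into two separate modules because the "down" node C sits between them.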

http://linkinghub.elsevier.com/retrieve/pii/S0092867408009525
Used phenotype for colouring the invasion of HIV.

What is the goal for the next 10 years? (What do networks actually mean?) In terms of its interactions, the inside of the cell is a hairball.

Developed PathBLAST and NetworkBLAST using the analogy of genome assembly for network assembly. Align the orthologues between species to align the network. Works at the protein family level - removes paralogues. They are not the best algorithms now.

Moves on to gene relation networks based on synthetic lethality or phenotype changes in double mutants. Gene interaction networks are logical networks (Booleans). Very little overlap between physical and genetic networks. Gene interactions run between subgraphs that are themselves made up of physical interactions.

http://www.cytoscape.org

ChIP-chip looking at DNA damage caused by stress and DNA repair. Finding where a TF binds by immunoprecipitation - identify the fragments that were bound to the immunoglobulin-purified TFs. Problems of drift, in that binding might not be significant - non-functional binding. Validated by checking downstream loss in gene deletions, but only 10% of the sites identified from ChIP-chip are verified. Spurious interactions take place close to telomeres, closer than 25 kilobase pairs. This is condition specific, so adding rapamycin can perturb it. Possibly indicates sequestration effects that make the TFs non-functional.

Using it for disease classification. Diagnose breast cancer metastasis using expression profiles. 300 patients, 1/3 of whom became metastatic. Area under the ROC curve is 65%. Heterogeneity is a problem. Little overlap between the gene sets from different studies.
Look at sequentiality or connections in protein-protein interactions.
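
For reference, the area under the ROC curve is the probability that a randomly chosen positive case (metastatic) scores higher than a randomly chosen negative one, with ties counting half. A small sketch of that pairwise definition; the function name and the scores are made up for illustration and are not data from the talk.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores; label 1 = became metastatic.
perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
chance = roc_auc([0.9, 0.1, 0.8, 0.3], [1, 1, 0, 0])
```

An AUC of 0.5 is chance level and 1.0 is perfect separation, so 65% is only modestly better than guessing.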

http://www.nature.com/msb/journal/v3/n1/full/msb4100180.html

Cancer might be the perturbation out of homeostasis.

http://www.nature.com/nbt/journal/v27/n2/full/nbt.1522.html

Web 09

Bioinformatics in an Undergraduate Programme

Kam Dahlquist, Murli Nair *****

Part of curriculum development movement - the need to use problem based learning and the need to have a progressive curriculum that follows what will happen in real life. How to build problems and hands on learning into biology curriculum including statistics. Based around the ideas from the http://bioquest.org frameworks.

Online reports about the problems of including quantitative and computational skills into the biology curriculum.

"Bioinformatics is Biology and we cannot just have 2 pages in the textbook" Murli Nair, IUSB.

http://genomebiology.com/2008/9/12/114 All Biologists are Bioinformaticians

Monday, 29 June 2009

Network Based Prediction of Metabolic Enzyme Sub-cellular Localisation

Shira Mintz-Oron***
Clear and easy to understand talk but the method seems very complex; is it necessary to go to the flux level?

GFP and microscopy are the experimental techniques but they are costly and difficult in higher eukaryotes.

Localisation is by motifs, composition, homology etc.
Here predict localisation based on metabolic networks.
Prior knowledge of the localisation of a subset of the network is needed; want to minimise the number of membrane transport reactions (thermodynamic cost).
Use constraint-based modelling to predict the flux rates at steady state under constraints of mass balance, thermodynamics and enzymatic capacity (rate bounds). Maximise biomass production.

Divide the data into the localised and the unlocalised (unknowns) enzymes.
Put all the enzymes into all compartments and build all the transport reactions. Limit enzyme activity for reactions of known localisation to the compartments in which they are known to be active.
Give a unit penalty for transport reactions - want a low penalty so that some transport is allowed.
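
A toy sketch of the "fewest transport reactions" idea only. It assumes each enzyme sits in exactly one compartment and brute-forces the unknown enzymes, charging one unit whenever a metabolite produced in one compartment is consumed in another. It does not do any flux optimisation, and the enzyme names, metabolite names and the function assign_compartments are all hypothetical.

```python
from itertools import product

def assign_compartments(reactions, known, compartments):
    """reactions: enzyme -> (set of substrates, set of products)."""
    unknown = sorted(e for e in reactions if e not in known)
    # Producer/consumer pairs that share a metabolite.
    links = [(a, b) for a in reactions for b in reactions
             if a != b and reactions[a][1] & reactions[b][0]]
    best_cost, best_loc = None, None
    for choice in product(compartments, repeat=len(unknown)):
        loc = dict(known)
        loc.update(zip(unknown, choice))
        # Unit penalty per link that crosses a membrane.
        cost = sum(1 for a, b in links if loc[a] != loc[b])
        if best_cost is None or cost < best_cost:
            best_cost, best_loc = cost, loc
    return best_loc, best_cost

# Hypothetical linear pathway m0 -> m1 -> m2 -> m3 with E2 unlocalised.
reactions = {
    "E1": ({"m0"}, {"m1"}),
    "E2": ({"m1"}, {"m2"}),
    "E3": ({"m2"}, {"m3"}),
}
known = {"E1": "cytosol", "E3": "cytosol"}
loc, cost = assign_compartments(reactions, known, ["cytosol", "mitochondrion"])
```

With both neighbours in the cytosol, E2 is placed there too at zero transport cost. The exhaustive search is exponential in the number of unknown enzymes, so it only makes sense as an illustration.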

Fuzzy classification: each enzyme gets scores for more than one compartment, so a distribution across multiple compartments is possible.

Using Side Effects of Medicines to Identify Drug Targets

Michael Kuhn **
Phenotype data - what side effects are caused by a drug - from clinical trials - text mined (1000)
Targets of drugs are also available as a dataset - 500
750 drugs used to test whether drugs with similar side effects have similar targets.

Synonyms in the phenotype data are a problem - use ontologies (COSTART) to cluster the same concepts in the side effects.
Some side effects are very common - parent terms of more specific side effects in the ontology - and these become non-predictive. Use a log frequency weight.
Some side effects are correlated - use Gerstein-Sonnhammer-Chothia weights (from HMM).
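
A sketch of a log frequency weight, assuming the weight is the negative log of the fraction of drugs annotated with a side effect (the exact formula was not noted down) and that similarity is a weighted Jaccard score. The drug and side effect names are made up, and the correlation weighting is left out.

```python
import math

def rarity_weights(drug_effects):
    """Weight each side effect by -log(fraction of drugs that show it)."""
    n_drugs = len(drug_effects)
    counts = {}
    for effects in drug_effects.values():
        for effect in effects:
            counts[effect] = counts.get(effect, 0) + 1
    return {e: -math.log(c / n_drugs) for e, c in counts.items()}

def weighted_similarity(a, b, weights):
    """Weighted Jaccard: shared weight divided by total weight."""
    shared = sum(weights[e] for e in a & b)
    total = sum(weights[e] for e in a | b)
    return shared / total if total else 0.0

# Hypothetical annotations: nausea is on every drug, so it carries no weight.
drugs = {
    "d1": {"nausea", "rash"},
    "d2": {"nausea", "rash"},
    "d3": {"nausea"},
    "d4": {"nausea", "tremor"},
}
weights = rarity_weights(drugs)
same = weighted_similarity(drugs["d1"], drugs["d2"], weights)
common_only = weighted_similarity(drugs["d1"], drugs["d3"], weights)
```

Two drugs that share only a ubiquitous side effect score zero, which is the point of the down-weighting.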

Use shuffling to normalise the score and get the side effect similarity. Also measure chemical similarity. Low chemical similarity means a low chance of sharing targets, and likewise for low side effect similarity - need both chemical and side effect similarity for an effective description. Side effect similarity is much more effective at predicting shared targets than chemical similarity.
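
One way to normalise a similarity score by shuffling is to compare it with scores from randomised side effect sets of the same size and report an empirical p-value. This is an assumed scheme for illustration, not necessarily the one used in the talk; the plain Jaccard score, the function names and the side effect labels are hypothetical.

```python
import random

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def shuffled_p_value(a, b, universe, n_shuffles=1000, seed=0):
    """Fraction of random same-size sets scoring at least as high as b."""
    rng = random.Random(seed)  # fixed seed for repeatable runs
    observed = jaccard(a, b)
    pool = sorted(universe)
    hits = 0
    for _ in range(n_shuffles):
        fake_b = set(rng.sample(pool, len(b)))
        if jaccard(a, fake_b) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Hypothetical universe of 50 side effect terms.
universe = {"se%d" % i for i in range(50)}
drug_a = {"se0", "se1", "se2", "se3", "se4"}
p_identical = shuffled_p_value(drug_a, set(drug_a), universe)
p_disjoint = shuffled_p_value(drug_a, {"se10", "se11"}, universe)
```

Identical profiles come out as highly significant, while a disjoint pair gets p = 1 because every shuffled set does at least as well.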

http://sideeffects.embl.de

Can create a drug-drug network in which drugs are connected when they share a target, which will give the same phenotype. Rabeprazole is an exception as it is not a nervous system drug but a stomach drug.

Side effects occur with the placebo as well as with the drug.

On-targets and off-targets are treated equally in the model - so it assumes the same off-target is affected when drugs have the same side effects.

http://stitch.embl.de

Clustered Alignments of Gene-Expression Time Series Data

Adam Smith ***
Want to align time series to compare treatments so you can find causative effects. Ultimately would be good to search a database for similar effects to find genes that are operating together.

You get warping so you need to align equivalent points. Use splines to create continuous series from discrete data.
Shorting alignments trim series to the same features of maxima and minima.
SCOW - efficient method for aligning time series.

Most extensive fitting is dynamic programming to minimise Euclidean distances between the two time series being aligned (Sakoe and Chiba 1978)
Parametric time warping (Eilers 2005) approximates the warping by a parabolic or linear warp.
Segment based warping (Smith et al 2008) - alignment score for different segments
COW correlation optimised warping - points of discontinuity are called knots - where there is a break and a warp.
SCOW - shorting correlation optimised warping.
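
The dynamic programming approach at the top of this list can be sketched as plain dynamic time warping on two 1-D series, using absolute difference as the point-wise distance. This is the textbook recurrence without a band constraint or any of the COW/SCOW refinements, and the series values are made up.

```python
def dtw_distance(a, b):
    """Minimum summed point-wise distance over all monotone alignments."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b holds
                                 cost[i][j - 1],      # b advances, a holds
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Hypothetical expression profiles: the second is a stretched copy of the first.
stretched = dtw_distance([0, 1, 2], [0, 1, 1, 2])
shifted = dtw_distance([0, 0], [1, 1])
```

A stretched copy aligns at zero cost, whereas a constant offset cannot be warped away.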

Evaluated on the EDGE toxicology database: 216 observations, 1600 genes, times from 6 to 96 hours.
Trying to match query to find the most similar treatment profiles in the database.

For clustering pick an average time series alignment and then the extremes above and below before continuing to add the other time series distances to each of the clusters. By using clusters the alignments are improved.

Modelling Stochasticity and Robustness in Gene Regulatory Networks

Abhishek Garg ****

Takes a Boolean modelling approach. Stochastic Boolean Modelling

T-helper cell differentiation gene regulatory network Mendoza and Xenarios 2006
T-cell activation network Stephan Klamt

Environmental input changes the nature of the differentiated cell. Stochastic behaviour is common in biological models.

Robustness maintains functionality over perturbations. In gene networks the question is whether they move between steady states (can they change attractor) and so give a different cell type.
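
A small sketch of what "changing attractor" means in a deterministic Boolean network with synchronous updates. The two-gene mutual inhibition rule below is a hypothetical toy, not the T-helper or T-cell activation network, and the function names are made up.

```python
def find_attractor(state, update):
    """Iterate synchronous updates until a state repeats; return the cycle."""
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = update(state)
    return trajectory[seen[state]:]

def mutual_inhibition(state):
    """Each gene stays on only if it is on and its repressor is off."""
    a, b = state
    return (a and not b, b and not a)

# Unperturbed: (A on, B off) is a steady state.
stable = find_attractor((True, False), mutual_inhibition)
# Perturbation: switch B on as well; the network falls into a different attractor.
perturbed = find_attractor((True, True), mutual_inhibition)
```

A network is robust to a perturbation if the perturbed state relaxes back to the original attractor; here it does not, since both genes end up off.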

Stochasticity can be applied to the nodes or the functions.
Not Probabilistic Boolean Networks - Datta group.
Use Boolean functions AND, OR etc.

Stochasticity in nodes: flip the output using a probability distribution.
Kauffman, Willanda etc lots of literature.
Over-represents noise by placing it at the end and not at the inputs/intermediates.
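
A sketch of the node-flip kind of stochasticity described above (noise applied to the output of each node after the deterministic update), assuming an independent flip probability per node. The two-gene update rule, the flip probability and the function names are hypothetical.

```python
import random

def mutual_inhibition(state):
    """Hypothetical toy rule: a gene stays on if its repressor is off."""
    a, b = state
    return (a and not b, b and not a)

def noisy_step(state, update, flip_prob, rng):
    """Deterministic update, then flip each node's output with flip_prob."""
    nxt = update(state)
    return tuple((not v) if rng.random() < flip_prob else v for v in nxt)

def fraction_leaving(start, update, flip_prob, steps=20, runs=1000, seed=0):
    """Fraction of noisy runs that do not end in the starting state."""
    rng = random.Random(seed)
    left = 0
    for _ in range(runs):
        state = start
        for _ in range(steps):
            state = noisy_step(state, update, flip_prob, rng)
        if state != start:
            left += 1
    return left / runs

start = (True, False)
no_noise = fraction_leaving(start, mutual_inhibition, 0.0)
some_noise = fraction_leaving(start, mutual_inhibition, 0.1)
```

With no noise the steady state is never left. Because the flip is applied to the final output regardless of how the node's function is built, this is the style of model that places all the noise at the end.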

The more interactions are involved, the more stochastic the function, so allosteric regulation and protein localisation have high noise.
http://si2.epfl.ch/~garg/

Modelling Ecological and Genetic Diversity in Bacteria

Eric Alm ***
Used adaptML to look at the partitioning of bacteria by habitat and seasonal variation. This is an MC based method.

Ecological preferences are sufficient to produce speciation - what was defined as one species was found to have 17 different habitat locations and seasonal variations. Habitats are defined by net sizes or the species in which they are present. A move from fish to squid means adaptation to the mucus membranes and so acquired new features.

Go from zooplankton to small particle associated. Looking for patterns of gene expression between different lifestyles. Did 20 genomes - how do you analyse a population of genomes?

Where are the recombination break-points?

Inconsistency of a phylogenetic tree at a point with phylogenetic trees from preceding bases.

McDonald-Kreitman Test
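
For reference, the McDonald-Kreitman test compares the ratio of nonsynonymous to synonymous changes within a species (polymorphism, Pn/Ps) with the same ratio between species (divergence, Dn/Ds). A sketch of the two usual summary statistics, the neutrality index NI = (Pn/Ps)/(Dn/Ds) and alpha = 1 - NI; the counts below are made up, and a real analysis would also test the 2x2 table for significance.

```python
def mcdonald_kreitman(dn, ds, pn, ps):
    """Return (neutrality index, alpha) from the four MK counts."""
    if min(dn, ds, ps) <= 0 or pn < 0:
        raise ValueError("dn, ds and ps must be positive, pn non-negative")
    neutrality_index = (pn / ps) / (dn / ds)
    alpha = 1.0 - neutrality_index  # estimated fraction of adaptive substitutions
    return neutrality_index, alpha

# Hypothetical counts: excess nonsynonymous divergence relative to polymorphism.
ni, alpha = mcdonald_kreitman(dn=8, ds=16, pn=2, ps=16)
```

NI below 1 (alpha above 0) points to an excess of fixed nonsynonymous differences, which is usually read as evidence of positive selection.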