Large scale modeling of gene regulatory networks

Dr. Martin Stetter
Siemens AG, Corporate Research
Munich, Germany

Perhaps the most important signaling network in living cells is constituted by the interactions of proteins with the genome: the gene regulatory network of the cell. From a system-level point of view, the various interactions and control loops that form a genetic network are the basis from which the vast complexity and flexibility of life processes emerge. Sometimes local interventions on one or a few genes, such as mutations or drug effects, can evoke a dramatic change in global network operation, often resulting in systemic diseases such as cancer.

Here I review some efforts towards gaining a quantitative understanding of regulatory genetic networks by means of large-scale computational models. After a brief description of the biological principles of gene regulation and of the novel data basis provided by genome-wide expression measurements with DNA microarrays, I will summarize a set of recent data-driven approaches for modeling gene regulatory networks based on gene expression data. Examples based on network models will include clustering, Bayesian networks, decomposable graphical models and generative inverse modeling. They help to identify both structural (scale-free property) and functional (local-global relationships) features of genetic networks, which in turn can help to disentangle gene functions, disease mechanisms and drug effects.

Seminar outline compiled by Manel Guerrero:

Author's short bio

Martin Stetter works as principal research scientist at the Neural Computation Department of Siemens Corporate Technology, Munich, Germany, where he heads the "Bio-analogue Technologies and Solutions" research team. He received a Ph.D. in Biophysics from the University of Regensburg (Germany) in 1994, based on neuronal modeling of developmental processes in early mammalian vision. From 1992 to 1996 he headed the computing group at the University Eye Hospital of Regensburg. A post-doctoral fellowship at the Dept. of Biophysics, devoted to modeling early mammalian vision, accompanied this position from 1994 to 1996. Between 1996 and 2000 he worked as an assistant professor in the field of computational neuroscience at the Dept. of Computer Science of the Technical University of Berlin (Germany), where he received a habilitation (German post-doctoral academic degree) in 2001. Since 2000 he has been with the Neural Computation Department, where he works on computational neuroscience of vision and has started to establish bioinformatics and systems biology research. Since 2002 he has also been a faculty member of the Technical University of Munich as an associate professor of computer science. He has published about 40 journal and conference papers on computational neuroscience, statistical modeling, functional image analysis and bioinformatics, as well as a book and several review articles.

Introduction

A cell has to perform many different functions during its lifetime. These include: trans-membrane transport (eating), metabolism (digesting), proliferation (growing), division (reproduction), functions specific to its cell type (work) such as secretion, cellular signaling (communicating with other cells), migration (moving), apoptosis (dying), etc.

All the information about how to do this is contained in the DNA, which carries almost the same information in every cell of a body. The question is: how is this done? A first hypothesis could be the so-called "One-gene-one-protein" hypothesis: each gene is used to generate one protein, and that protein then performs (or is involved in) a certain action. This hypothesis cannot be valid, because the human genome (the set of all the genes in a human cell) is composed of about 30,000 genes and is almost the same for most cells, whereas the proteome (the set of proteins present in a cell) is a subset of about 1,000,000 proteins and differs between cells that perform different functions.



The cell produces proteins by following the 'instructions' in its genes. This process is called expression. It starts with transcription, in which mRNA is obtained from the DNA. It continues with splicing, in which the introns are removed and only the exons are kept. Finally, in a process called translation, the protein is obtained from the mRNA. This protein must still undergo post-translational modification (PTM) and folding before it becomes an active protein.
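The three steps just described can be sketched in a few lines of Python. This is only a toy illustration: the DNA sequence, the exon coordinates and the reduced codon table are invented for the example, and real splicing and translation are far more involved.

```python
# Toy illustration of transcription, splicing and translation.
# The sequence, the exon coordinates and the reduced codon table are
# hypothetical and chosen only to keep the example small.

CODON_TABLE = {  # small subset of the standard genetic code
    "AUG": "M", "GCU": "A", "GGA": "G", "UUU": "F", "AAA": "K",
    "UAA": "*",  # stop codon
}

def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> pre-mRNA (T is replaced by U)."""
    return coding_strand.replace("T", "U")

def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon intervals (start inclusive, end exclusive)."""
    return "".join(pre_mrna[start:end] for start, end in exons)

def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons outside the toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# exon 1 (0-6), intron (6-16), exon 2 (16-28)
DNA = "ATGGCT" + "GTAAGTCCAG" + "GGATTTAAATAA"
EXONS = [(0, 6), (16, 28)]

protein = translate(splice(transcribe(DNA), EXONS))
print(protein)  # MAGFK
```

Post-translational modification and folding are not modeled here; the string returned by `translate` stands for the still inactive amino-acid chain.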

In fact, some of those active proteins are the ones that carry out the splicing and the folding. Others act back on the genes, in a process called regulation. They do not necessarily act on the genes from which they were expressed. The influences of expression and regulation therefore form a complex, recurrent, non-linear system. It should also be said that genes and proteins are affected by external signals.
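To make the phrase "recurrent non-linear system" concrete, here is a minimal sketch of two genes that regulate each other. It is not a model from the seminar: the sigmoid response, the weights, the external signal values and the update rate are all assumptions chosen for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Saturating (non-linear) response of a gene to its total regulatory input."""
    return 1.0 / (1.0 + math.exp(-x))

def step(levels, weights, external, rate=0.2):
    """One update: each expression level relaxes toward the sigmoid of its input.

    weights[i][j] is the (hypothetical) influence of gene j on gene i;
    external[i] is an outside signal acting on gene i.
    """
    updated = []
    for i, level in enumerate(levels):
        drive = sum(w * other for w, other in zip(weights[i], levels)) + external[i]
        updated.append(level + rate * (sigmoid(drive) - level))
    return updated

def simulate(levels, weights, external, n_steps=200):
    """Iterate the update rule and return the whole trajectory."""
    trajectory = [list(levels)]
    for _ in range(n_steps):
        levels = step(levels, weights, external)
        trajectory.append(levels)
    return trajectory

# Gene 0 activates gene 1, gene 1 represses gene 0 (a feedback loop).
WEIGHTS = [[0.0, -4.0],
           [4.0, 0.0]]
EXTERNAL = [1.0, -1.0]  # invented external signals

trajectory = simulate([0.5, 0.5], WEIGHTS, EXTERNAL)
print(trajectory[-1])
```

Because each gene's input depends on the other gene's current level, the system feeds back on itself, and changing `EXTERNAL` shifts the levels of both genes.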



Levels of Cellular Modeling

When modeling a cell (with its expression and regulation mechanisms), there are different approaches. See the table below.

Some projects are building biochemical models (like Virtual Cell and E-CELL) and others are building dynamical models (like GNA and Gen-O-Matic), but there has not been much work on building statistical models. That is where GeneSim (TM) focuses. GeneSim is being developed by Stetter's team at Siemens.

Levels of Cellular Modeling

Level         Modeling approaches
-----------   ----------------------------------------------------------------------------------
Statistical   Data Mining Approach, Graphical Models, Bayesian Networks
Dynamical     Boolean Networks, Neuronal Networks, Differential Equations
Biochemical   Continuous Reaction Kinetics, Reaction-Diffusion Models, Stochastic Reaction Networks

From top to bottom, the levels run from data driven to hypothesis driven, and from high abstraction to high detail.


Obtaining the Gene Expression Profiles

To use GeneSim, we first need to obtain differential gene expression profiles from samples of different patients, using DNA microarrays. Each spot of a microarray contains single-stranded nucleotide sequences as probes, which are complementary to a sequence of one gene. Differently labeled RNA molecules (light and dark gray) from two samples are brought into contact with the array and hybridize with the probes. Confocal fluorescence microscopy is then used to optically determine the relative fraction of RNA from each sample for each spot (gene). If the gene is over-expressed the spot looks red, if it is under-expressed the spot looks green, and if it is equally expressed the spot looks yellow. A microarray usually contains many thousands of spots. The spot size ranges from 25 to 500 µm, depending on the type of microarray.
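A common way to turn the two fluorescence intensities of a spot into one number is the base-2 logarithm of their ratio. The sketch below assumes this representation and a two-fold cutoff; both the threshold and the intensity values are invented for illustration and are not taken from the seminar.

```python
import math

def log_ratio(sample: float, reference: float) -> float:
    """log2 of the intensity ratio of the two labeled samples on one spot."""
    return math.log2(sample / reference)

def classify(sample: float, reference: float, threshold: float = 1.0) -> str:
    """Label a spot; threshold=1.0 means a two-fold change (an assumed cutoff)."""
    m = log_ratio(sample, reference)
    if m >= threshold:
        return "over-expressed"
    if m <= -threshold:
        return "under-expressed"
    return "equally expressed"

# Hypothetical (sample, reference) intensities for three spots.
spots = {"gene_a": (800.0, 200.0), "gene_b": (100.0, 400.0), "gene_c": (300.0, 280.0)}

for name, (sample, reference) in spots.items():
    print(name, round(log_ratio(sample, reference), 2), classify(sample, reference))
```

One column of such log ratios, one value per gene, is what is meant by an expression profile in the following sections.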



Clustering the Gene Expression Profiles

The gene expression profiles from the different patients are placed in a 2D array in which each column is one profile, and that array is hierarchically clustered along both dimensions. The picture shows 327 gene expression profiles (columns) with 271 gene probes each (rows). White entries are over-expressed, gray entries are under-expressed and black entries are normally expressed.
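Two-way hierarchical clustering can be sketched as follows. This is a plain-Python illustration with average linkage and Euclidean distance, which are assumptions (the seminar does not say which linkage or distance was used), and the small matrix is invented. For real data sizes a library implementation would be the practical choice.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leaf_order(vectors):
    """Average-linkage agglomerative clustering.

    Returns the indices of `vectors` in the order of the dendrogram leaves,
    so that similar vectors end up next to each other.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]

def cluster_2d(matrix):
    """Reorder rows (genes) and columns (profiles) by hierarchical clustering."""
    row_order = leaf_order(matrix)
    col_order = leaf_order([list(col) for col in zip(*matrix)])
    reordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return reordered, row_order, col_order

# Hypothetical log ratios: rows are gene probes, columns are patient profiles.
matrix = [
    [2.0, -1.9, 2.1, -2.0],
    [-2.0, 2.1, -1.8, 1.9],
    [1.9, -2.0, 2.0, -2.1],
    [-2.1, 1.8, -2.0, 2.0],
]

reordered, row_order, col_order = cluster_2d(matrix)
print(row_order, col_order)
```

After reordering, genes with similar behavior across patients, and patients with similar profiles, sit in adjacent rows and columns, which is what produces the block pattern in a clustered expression picture.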



Building the Model of the Gene Regulatory Network

If we use this 2D array as input to GeneSim, it derives a network structure that represents the dependencies between the different genes. A fragment of one of those graphs is shown in the figure below. From that network, an artificial expression matrix can be generated that is quite similar to the original one.

One might think that the important nodes (genes) are the ones with the most connections, but it turns out that the most important ones are those that carry the most load.
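How GeneSim learns its network is not described here, so the following is not its algorithm. As a much simpler stand-in for "deriving dependencies from the expression matrix", the sketch links two genes whenever their profiles are strongly correlated across patients. The gene names, the data and the 0.8 threshold are all invented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def dependency_edges(matrix, gene_names, threshold=0.8):
    """Undirected edges between genes whose |correlation| reaches the threshold."""
    edges = []
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            r = pearson(matrix[i], matrix[j])
            if abs(r) >= threshold:
                edges.append((gene_names[i], gene_names[j], round(r, 2)))
    return edges

# Hypothetical expression matrix: one row per gene, one column per patient.
GENES = ["A", "B", "C", "D"]
EXPRESSION = [
    [1.0, 2.0, 3.0, 4.0, 5.0],    # A
    [2.1, 3.9, 6.2, 8.0, 9.9],    # B rises with A
    [5.0, 4.1, 2.9, 2.0, 1.1],    # C falls as A rises
    [1.0, -1.0, 0.5, -1.0, 0.5],  # D is unrelated to the others
]

edges = dependency_edges(EXPRESSION, GENES)
for gene_1, gene_2, r in edges:
    print(gene_1, gene_2, r)
```

A correlation graph like this is undirected and cannot separate direct from indirect influence; graphical models such as Bayesian networks, mentioned in the abstract, are meant to address exactly that limitation.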



Once the network is constructed, any specific gene can be activated (as if by the effect of a virtual drug) to see how this affects the different groups of patients. The model can therefore simulate the effect of a drug on the patients and show the resulting 2D array.

The figure below shows a simulation of how the activation of different genes affects the expression profiles.
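The idea of a virtual drug can be illustrated with a tiny Bayesian network in which one gene is clamped to its active state before artificial profiles are sampled. The three-gene chain, the binary states and all probabilities below are invented; this is a sketch of the principle, not of GeneSim itself.

```python
import random

# Hypothetical network G1 -> G2 -> G3 with binary states (0 = low, 1 = high).
# Each entry: (parents, P(gene = 1 | parent states)). All numbers are made up.
NETWORK = {
    "G1": ((), {(): 0.3}),
    "G2": (("G1",), {(0,): 0.1, (1,): 0.9}),  # G1 activates G2
    "G3": (("G2",), {(0,): 0.8, (1,): 0.2}),  # G2 represses G3
}
ORDER = ["G1", "G2", "G3"]  # parents always come before their children

def sample_profile(rng, clamp=None):
    """Draw one artificial expression profile; clamped genes are forced."""
    clamp = clamp or {}
    profile = {}
    for gene in ORDER:
        if gene in clamp:
            profile[gene] = clamp[gene]  # the virtual drug overrides regulation
            continue
        parents, table = NETWORK[gene]
        p_high = table[tuple(profile[p] for p in parents)]
        profile[gene] = 1 if rng.random() < p_high else 0
    return profile

def mean_expression(n_samples, clamp=None, seed=0):
    """Fraction of sampled profiles in which each gene is high."""
    rng = random.Random(seed)
    totals = dict.fromkeys(ORDER, 0)
    for _ in range(n_samples):
        for gene, value in sample_profile(rng, clamp).items():
            totals[gene] += value
    return {gene: totals[gene] / n_samples for gene in ORDER}

baseline = mean_expression(5000)
treated = mean_expression(5000, clamp={"G1": 1})  # virtual drug activates G1
print("baseline:", baseline)
print("treated: ", treated)
```

Clamping G1 raises the expression of G2 and, through G2, lowers that of G3: a local intervention propagates through the network, which is the effect the simulated expression arrays are meant to show.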



Useful links

The slides of the seminar
List of some of the publications by Martin Stetter
Martin Stetter, Gustavo Deco, Mathäus Dejori: Large-Scale Computational Modeling of Genetic Regulatory Networks. Artif. Intell. Rev. 20(1-2): 75-93 (2003)
Martin Stetter, Bernd Schürmann, Mathäus Dejori: Systems Level Modeling of Gene Regulatory Networks
