Large-scale modeling of gene regulatory networks
Dr. Martin Stetter
Siemens AG, Corporate Research
Munich, Germany
Perhaps the most important signaling network in living cells is constituted by the interactions of proteins with the genome: the gene regulatory network of the cell. From a system-level point of view, the various interactions and control loops that form a genetic network represent the basis upon which the vast complexity and flexibility of life processes emerge. Sometimes, local interventions on one or a few genes (such as mutations or drug effects) can evoke a dramatic change in the global network operation, often resulting in systemic diseases such as cancer.
Here we provide a review of some efforts towards gaining a quantitative understanding of regulatory genetic networks by means of large-scale computational models. After a brief description of the biological principles of gene regulation and the novel data basis provided by genome-wide expression measurements with DNA microarrays, I will summarize a set of recent data-driven approaches for modeling gene regulatory networks based on gene expression data. Examples based on network models will include clustering, Bayesian Networks, decomposable graphical models and generative inverse modeling. They will help identify both structural (scale-free property) and functional (local-global relationships) features of genetic networks, which in turn can help to disentangle gene functions, disease mechanisms and drug effects.
Seminar outline compiled by Manel Guerrero:
Author's short bio
Martin Stetter works as principal research scientist at the Neural Computation Department of Siemens Corporate Technology, Munich, Germany, where he heads the "Bio-analogue Technologies and Solutions" research team. He received a Ph.D. in Biophysics from the University of Regensburg (Germany) in 1994 based on neuronal modeling of developmental processes in early mammalian vision. From 1992 to 1996 he headed the computing group at the University Eye Hospital of Regensburg. A post-doctoral fellowship at the Dept. of Biophysics devoted to modeling early mammalian vision accompanied this position from 1994 to 1996. Between 1996 and 2000 he worked as an assistant professor in the field of computational neuroscience at the Dept. of Computer Science of the Technical University of Berlin (Germany), where he received a habilitation (German post-doctoral academic degree) in 2001. Since 2000 he has been with the Neural Computation Department, where he works on computational neuroscience of vision and has started to establish bioinformatics and systems biology research. Since 2002, he has also been a faculty member of the Technical University of Munich as an associate professor for computer science. He has published about 40 journal and conference papers on computational neuroscience, statistical modeling, functional image analysis and bioinformatics, as well as a book and several review articles.
Introduction
A cell has to perform many different functions during its lifetime. These
include trans-membrane transport (eating), metabolism (digesting),
proliferation (growing), division (reproduction), functions specific to its
kind (work) such as secretion, cellular signaling (communicating with other
cells), migration (moving), apoptosis (dying), etc.
All the information about how to do this is contained in the DNA (which is
almost the same in all the cells of a body). The question is, how is this
done? A first hypothesis could be the so-called "one-gene-one-protein"
hypothesis: each gene is used to generate one protein, and that protein then
performs (or is involved in) a certain action. That hypothesis cannot be valid,
because the human genome (the set of all the genes in a human cell) is composed
of about 30,000 genes and is almost the same for most cells, whereas the
proteome (the set of proteins present in a cell) is a subset of about 1,000,000
proteins (more than 30 per gene on average) and differs between cells that
perform different functions.

The cell produces proteins following the 'instructions' that are in its genes.
This process is called expression. It starts by obtaining the mRNA from the DNA
(transcription). It continues by removing the introns and leaving only the
exons (splicing). Finally, the protein is obtained by a process called
translation. This protein still has to go through post-translational
modification (PTM) and folding before it becomes an active protein.
Some of those active proteins are the ones that actually carry out the splicing
and the folding. Some others act back on the genes (in a process called
regulation). They do not necessarily have to affect the genes from which they
were expressed. Therefore, the influences from expression and regulation form a
complex recurrent non-linear system. In addition, genes and proteins are also
affected by external signals.
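
The three expression steps described above (transcription, splicing, translation) can be illustrated with a toy sketch. The gene sequence, the exon coordinates and the heavily reduced codon table below are made up for illustration and do not come from the seminar; PTM, folding and regulation are not modeled.

```python
# Toy model of gene expression: transcription -> splicing -> translation.
# GENE, EXONS and the reduced CODON_TABLE are hypothetical examples.

CODON_TABLE = {
    "AUG": "M",   # start codon, methionine
    "GCU": "A",   # alanine
    "UGG": "W",   # tryptophan
    "AAA": "K",   # lysine
    "UAA": None,  # stop codon
}

# exon 1 (ATGGCT) + intron (GTCCCAG) + exon 2 (TGGAAATAA)
GENE = "ATGGCT" + "GTCCCAG" + "TGGAAATAA"
EXONS = [(0, 6), (13, 22)]  # start inclusive, end exclusive


def transcribe(coding_strand):
    """Pre-mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def splice(pre_mrna, exons):
    """Remove the introns by keeping only the exon intervals."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna):
    """Read codons from position 0 until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = splice(transcribe(GENE), EXONS)
protein = translate(mrna)  # "MAWK" for this toy gene
```

A real codon table has 64 entries; the reduced table here only covers the codons that occur in the toy gene.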

Levels of Cellular Modeling
When trying to model a cell (with its expression and regulation mechanisms)
there are different approaches. See the table below.
Some projects are building biochemical models (like Virtual Cell and E-CELL)
and others are building dynamical models (like GNA and Gen-O-Matic), but there
has not been much work on building a statistical model. That is where
GeneSim (TM) focuses. GeneSim is being developed by Stetter's team at Siemens.
| Level of Cellular Modeling | Approach |
|---|---|
| Statistical | Data Mining Approach |
| Statistical | Graphical Models |
| Statistical | Bayesian Networks |
| Dynamical | Boolean Networks |
| Dynamical | Neuronal Networks |
| Dynamical | Differential Equations |
| Biochemical | Continuous Reaction Kinetics |
| Biochemical | Reaction-Diffusion Models |
| Biochemical | Stochastic Reaction Networks |

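
To give a feeling for the dynamical level in the table, here is a minimal sketch of a Boolean network with synchronous updates. The three genes and their update rules are invented for illustration; they are not taken from the seminar or from any of the tools named above.

```python
# Toy Boolean network with three hypothetical genes A, B and C.
# Each gene is either off (0) or on (1); all genes update at the same time.
#   A is repressed by C:        A' = NOT C
#   B is activated by A:        B' = A
#   C needs both A and B:       C' = A AND B


def step(state):
    """Return the next (A, B, C) state under the rules above."""
    a, b, c = state
    return (1 - c, a, a & b)


def trajectory(state, n_steps):
    """Return the list of states visited, including the initial one."""
    states = [state]
    for _ in range(n_steps):
        state = step(state)
        states.append(state)
    return states


path = trajectory((1, 0, 0), 5)
# The network returns to (1, 0, 0) after five steps, i.e. it oscillates.
```

Even this small recurrent system shows a cyclic attractor, which hints at why the feedback between expression and regulation is hard to reason about by hand.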
Obtaining the Gene Expression Profiles
To use GeneSim, we first need to obtain the differential gene expression
profiles of samples from different patients with DNA microarrays. Each spot of
a microarray contains single-stranded nucleotide sequences as probes, which are
complementary to a sequence of one gene. Differently labeled RNA molecules
(light and dark gray) from two samples are brought into contact with the array
and hybridize with the probes. Confocal fluorescence microscopy is used to
optically determine the relative fraction of RNA from each sample for each spot
(gene). If the gene is over-expressed the spot will look red, if it is
under-expressed it will look green, and if it is equally expressed it will look
yellow. A microarray usually contains many thousands of spots. The spot size
ranges from 25 to 500 µm, depending on the type of microarray.
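
The red/green/yellow reading of a spot can be sketched as a log ratio of the two channel intensities. This is only an illustration: the function names, the intensity values and the two-fold threshold are assumptions, not details given in the seminar.

```python
import math


def log_ratio(sample_intensity, reference_intensity):
    """Base-2 log ratio of the two fluorescence channels for one spot."""
    return math.log2(sample_intensity / reference_intensity)


def call_spot(sample_intensity, reference_intensity, threshold=1.0):
    """Classify a spot by color.

    threshold=1.0 means a two-fold change; this cut-off is an assumption
    chosen for the example.
    """
    ratio = log_ratio(sample_intensity, reference_intensity)
    if ratio >= threshold:
        return "red"      # over-expressed
    if ratio <= -threshold:
        return "green"    # under-expressed
    return "yellow"       # roughly equally expressed


# Hypothetical intensities for three spots (sample, reference)
spots = {"gene_a": (800.0, 200.0), "gene_b": (100.0, 400.0), "gene_c": (300.0, 300.0)}
calls = {gene: call_spot(s, r) for gene, (s, r) in spots.items()}
```

Collecting the log ratios of all spots for one patient gives one expression profile, that is, one column of the array used in the next section.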

Clustering the Gene Expression Profiles
The gene expression profiles from the different patients are put in a 2D array
where each column is one of the profiles, and that array is hierarchically
clustered along both dimensions. The picture shows 327 gene expression profiles
(columns) with 271 gene probes each (rows). The white entries are the
over-expressed ones, the gray entries the under-expressed ones and the black
entries the normally expressed ones.
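
Two-way hierarchical clustering of such an array can be sketched in plain Python as follows. The choice of Euclidean distance and average linkage is an assumption (the seminar does not say which variant was used), and the 4 x 4 matrix is made-up data.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cluster_order(vectors):
    """Average-linkage agglomerative clustering of a non-empty list.

    Returns the indices of the vectors in the order of the final cluster,
    so that similar vectors end up next to each other.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(vectors[p], vectors[q])
                            for p in clusters[i] for q in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters[0]


def cluster_2d(matrix):
    """Reorder rows (genes) and columns (patients) by clustering each."""
    row_order = cluster_order(matrix)
    col_order = cluster_order([list(col) for col in zip(*matrix)])
    reordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return reordered, row_order, col_order


# Rows are genes, columns are patient profiles (hypothetical log ratios)
expression = [
    [4, -4, 4, -4],
    [-4, 4, -4, 4],
    [5, -4, 4, -4],
    [-4, 4, -4, 6],
]
reordered, row_order, col_order = cluster_2d(expression)
```

After reordering, similar genes and similar patients sit next to each other, which produces the block pattern seen in clustered expression pictures.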

Building the Model of the Gene Regulatory Network
If we use this 2D array as input to GeneSim, it will derive a network structure
that represents the dependencies between the different genes. A fragment of one
of those graphs is shown in the figure below. From that network, an artificial
matrix can be generated that is quite similar to the original one.
One might think that the important nodes (genes) are the ones with the most
connections, but it turns out that the most important ones are those that carry
the most load.
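
The difference between "many connections" and "much load" can be illustrated on a small undirected graph. Note the assumption: the summary does not define load, so this sketch takes it to be the number of shortest paths between other node pairs that pass through a node (betweenness-style). The graph and its node names are hypothetical.

```python
from collections import deque


def build_graph(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_counts(graph, source):
    """Distances from source and number of shortest paths to each node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                sigma[nb] += sigma[node]
    return dist, sigma


def load(graph):
    """Assumed definition: fraction of shortest s-t paths through each node,
    summed over all unordered pairs (s, t)."""
    info = {n: bfs_counts(graph, n) for n in graph}
    result = {n: 0.0 for n in graph}
    nodes = sorted(graph)
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:
            if t not in dist_s:
                continue  # s and t are not connected
            dist_t, sigma_t = info[t]
            for v in nodes:
                if v == s or v == t or v not in dist_s:
                    continue
                if dist_s[v] + dist_t[v] == dist_s[t]:
                    result[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return result


# Two fully connected groups of four genes joined through a bridge gene B
left = ["L1", "L2", "L3", "L4"]
right = ["R1", "R2", "R3", "R4"]
edges = [(a, b) for grp in (left, right) for i, a in enumerate(grp) for b in grp[i + 1:]]
edges += [("L1", "B"), ("B", "R1")]
graph = build_graph(edges)
degree = {n: len(nbs) for n, nbs in graph.items()}
loads = load(graph)
```

In this example the bridge gene `B` has only two connections but the highest load, while `L2` has three connections and no load at all.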

Once the network is constructed, any specific gene can be activated (as if by
the effect of a virtual drug) to see how this affects the different groups of
patients. In this way, the model can simulate the effect of a drug on the
patients and show the resulting 2D array.
The figure below shows a simulation of how the activation of different genes
affects the expression profiles.
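
The idea of activating a gene in the model and reading off the consequences can be sketched with a tiny Bayesian network over binary genes. This is not GeneSim's implementation: the three-gene chain, the probability tables and all names are assumptions made for the example, and the marginals are computed by brute-force enumeration.

```python
from itertools import product

# Hypothetical network G1 -> G2 -> G3.
# Each entry: (parents, table mapping parent states to P(gene = 1)).
NETWORK = {
    "G1": ((), {(): 0.2}),
    "G2": (("G1",), {(0,): 0.1, (1,): 0.9}),
    "G3": (("G2",), {(0,): 0.3, (1,): 0.8}),
}
ORDER = ["G1", "G2", "G3"]


def joint_probability(assignment, clamped):
    """Probability of a full assignment when some genes are clamped.

    A clamped gene ignores its parents and is fixed to the given state,
    which mimics an external intervention such as a virtual drug.
    """
    prob = 1.0
    for gene in ORDER:
        if gene in clamped:
            if assignment[gene] != clamped[gene]:
                return 0.0
            continue
        parents, table = NETWORK[gene]
        p_on = table[tuple(assignment[p] for p in parents)]
        prob *= p_on if assignment[gene] == 1 else 1.0 - p_on
    return prob


def marginals(clamped=None):
    """P(gene = 1) for every gene, by enumerating all joint states."""
    clamped = clamped or {}
    result = {gene: 0.0 for gene in ORDER}
    for states in product((0, 1), repeat=len(ORDER)):
        assignment = dict(zip(ORDER, states))
        p = joint_probability(assignment, clamped)
        for gene in ORDER:
            if assignment[gene] == 1:
                result[gene] += p
    return result


baseline = marginals()                 # no intervention
drug_on_g2 = marginals({"G2": 1})      # virtual drug activates G2
```

With these made-up tables, activating `G2` raises the probability that `G3` is on from 0.43 to 0.8, while the upstream gene `G1` keeps its baseline probability of 0.2. Enumeration only works for a handful of genes; networks with hundreds of genes need approximate inference or sampling.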

Useful links
The slides of the seminar
List of some of the publications by Martin Stetter
Martin Stetter, Gustavo Deco, Mathäus Dejori: Large-Scale Computational Modeling of Genetic Regulatory Networks. Artif. Intell. Rev. 20(1-2): 75-93 (2003)
Martin Stetter, Bernd Schürmann, Mathäus Dejori: Systems Level Modeling of Gene Regulatory Networks