SysMod ISMB 2017 Abstracts
This page lists all the presentations selected for the 2017 SysMod meeting, taking place in Prague on 22nd July as part of ISMB/ECCB.
- 1 Keynote presentations
- 2 Oral presentations with article in Proceedings
- 2.1 Estimation of time-varying growth, uptake and excretion rates from dynamic metabolomics data
- 2.2 popFBA: tackling intratumour heterogeneity with Flux Balance Analysis
- 2.3 A scalable moment-closure approximation for large-scale biochemical reaction networks
- 2.4 Efficient Simulation of Intrinsic, Extrinsic and External Noise in Biochemical Systems
- 3 Oral presentations selected from Abstracts
- 3.1 Explanation of drug effects using a mechanistic model automatically assembled from natural language, databases, and literature
- 3.2 Who's driving? Cell cycle robustness investigated by a multi-scale framework integrating cell cycle and metabolism in budding yeast
- 3.3 Reversed dynamics to uncover basins of attraction of asynchronous logical models
- 3.4 Manatee invariants for functional analysis of signaling pathways in complex networks
- 3.5 Large Scale Mechanistic Modeling Enables Robust Prediction of Cancer Cell Drug Response
- 3.6 Incorporating patient-specific molecular data into a logic model of prostate cancer
- 3.7 Tracking and Engineering the Evolution of Organismal Fitness via Multi-Organism mRNA Translation Whole Cell Simulations
- 4 Poster presentations selected from submitted Abstracts
- 4.1 Modeling Patient Response to Hemodialysis Using System Identification Methods
- 4.2 Logical modeling approaches as a proxy to analyze cardiovascular disease trajectories
- 4.3 Pioneering topological methods for network-based drug-target prediction by exploiting a brain-network self-organization theory
- 4.4 Predictive Modelling of Behavior through Gene Expression over Multiple Generations
- 4.5 Context Dependent Prediction of Protein Complexes
- 4.6 Siderophore-drug conjugates and resistance to bacterial pathogen
- 4.7 The JSBML project: a fully featured Java API for working with systems biological models
- 4.8 A Bayesian Approach for Estimating Hidden Variables as well as Missing and Wrong Molecular Interactions in ODE Based Mathematical Models
- 4.9 Modeling the metabolic interactions of an algal-bacterial mutualism enhancing lipid production for biofuels
- 4.10 A factor analysis based method for characterizing the covariance structure of related datasets
- 4.11 Tuning the robust control of cell cycle transitions by twist of kinases and phosphatases
- 4.12 Feature Engineering in Prediction of Human Malaria Resistance from Systems approaches using Deep Learning
- 4.13 Additive Dose Response Models: Explicit Formulations and the Loewe Additivity Consistency Condition
- 4.14 Inference of tumor microenvironment interactions as a new therapeutic strategy to treat TNBC
- 4.15 Comprehensive multi-scale representation of disease mechanisms: the AsthmaMap example
- 4.16 System-level High-Dimensional Multi-Objective Analyses of Metabolic Tradeoffs in Biological Systems
- 4.17 Quantitative methods for detecting origins of interferons signalling sensitivity
- 4.18 Pancancer modelling predicts the context-specific impact of somatic mutations on transcriptional programs
- 4.19 A Genome-Scale Metabolic Reconstruction of Eubacterium limosum KIST612
- 4.20 Predictive Virtual Infection Modeling of Candida albicans Immune Escape in Human Blood
- 4.21 Stochastic Modeling of Gene Regulatory Networks in Escherichia coli
- 4.22 Adaptative response of fission yeast metabolism to natural genetic variation
- 4.23 Classifying oncogenes and tumor suppressors in fusion protein-protein interaction networks using a community-based Naïve Bayes approach
- 4.24 Logic modeling in quantitative systems pharmacology
- 4.25 A Quantitative Model for the Rate-Limiting Process of UGA Alternative Assignments to Stop and Selenocysteine Codons
- 4.26 The challenge of integrating multi-omic multi-factorial data to infer regulatory networks
- 5 Posters selected from poster submissions
- 5.1 BTR: training asynchronous Boolean models using single-cell expression data
- 5.2 Conceptual and computational framework for logical modelling of biological networks deregulated in diseases
- 5.3 A method for time dependent pathway networks
- 5.4 Representations of Markov processes in biological optimization problems
- 5.5 Inferring network statistics from high-dimensional time-course data
- 5.6 A systems biology approach to investigate cell fate switches in intestinal organoids
- 5.7 Hybrid multivariate modelling of drug response in human cancer cell lines
- 5.8 Structural equation modeling with latent variables of genomic information for multiple diseases
- 5.9 The problem of recombination suppression in evolution of sex chromosomes
- 5.10 Fast biological network reconstruction from high-dimensional time-course perturbation data using sparse multivariate Gaussian processes
- 5.11 Mathematical modelling of promoter occupancies in MYC-dependent gene regulation
- 5.12 E-cyanobacterium.org: A Web-based Platform for Systems Biology of Cyanobacteria
- 5.13 PITHYA: High-Performance Parameter Synthesis for Biological Models
- 5.14 Transcriptomics Driven Lipidomics (TDL) identifies the microbiome-regulated targets of ileal lipid metabolism
- 5.15 Modelling Transcriptomics Data of the Developing Enteric Nervous System
- 5.16 Comparing Ordinary Differential Equation and Rule Based Models of DARPP-32 signalling
- 5.17 FAIRDOM: supporting FAIR data and model management
- 5.18 The CD4+ T cell regulatory network mediates inflammatory responses during acute hyperinsulinemia: a simulation study
- 5.19 Prediction of stem cell pluripotency using parallel single-cell transcriptome and methylome sequencing data
- 5.20 ASSA‐PBN: a software tool for large probabilistic Boolean networks
- 5.21 A Comparative Analysis of Feature Selection Methods for Biomarker Discovery in Study of Toxicant-treated Atlantic Cod (Gadus morhua) Liver
- 5.22 Parameter fitting facility for rule-based models helps to analyse clathrin polymerization mechanisms
- 5.23 Sparse regression modeling of drug response with a localized estimation framework
- 5.24 Agent-based Modeling of Hyaluronan in Vocal Fold Wound Repair
- 5.25 Boolean Modeling of Breast Cancer Cell Lines using Logic Programming
- 5.26 Formal modelling of multi-agent interaction dynamics in biological systems
- 5.27 An FPGA-Based framework for simulation and analysis of Boolean gene regulatory networks
- 5.28 Accelerating construction of integrative kinetic models of metabolism with GRaPe
- 5.29 Spatial constraint and the interaction between lytic phages and biofilm-dwelling bacteria
- 5.30 Fast, easy, interoperable and reusable - the cobrapy infrastructure for modeling metabolic flux
- 5.31 Identifying novel enzymes for metabolic engineering using homology based sequence search
- 5.32 Deep learning models of bifurcating developmental journeys from single-cell transcriptomic data can make accurate predictions of gene perturbations
- 5.33 Sparse Regression for Network Graphs and Its Application to Gene Networks of The Brain
Modeling in uncertain times
University of Manchester, UK
Quantitative predictive modeling of biological systems depends on the availability of reliable data, ranging from network topology to kinetic parameters. In many application scenarios, this information is either not available or is incomplete and noisy. Ensemble modeling approaches allow informative model building in the face of this uncertainty, enabling the rigorous assessment of the confidence associated with individual model predictions. Even where copious parameter information is available, ensemble modeling can avoid some of the pitfalls of traditional approaches based on maximum-likelihood parameter fitting. We have recently argued that the methods of ensemble modeling are an essential component of a comprehensive new framework for model construction, analysis and management, referred to as ‘respectful modeling’. In this talk, I will discuss various challenges of implementing an ensemble modeling strategy for molecular biology, from the quantitative capture of parameter uncertainty to methods for ensuring the thermodynamic consistency of ensemble models.
 Tsigkinopoulou, Baker & Breitling (2017) Respectful modeling: addressing uncertainty in dynamic system models for molecular biology. Trends Biotechnol. 35(6):518–529. http://www.cell.com/trends/biotechnology/abstract/S0167-7799(16)30231-1
Computational modeling of promoter occupancies in MYC-dependent gene regulation
Max Delbrück Center for Molecular Medicine, Berlin, DE
MYC is a proto-oncogenic transcription factor with an enhanced expression in the majority of tumours. MYC target genes show cell type specific expression patterns despite genome wide DNA binding of MYC. As an explanation two seemingly conflicting hypotheses have been proposed: one hypothesis proposes that MYC enhances expression of all genes, while the other suggests gene-specific regulation. We have explored the hypothesis that specific gene expression profiles arise since target gene promoters differ in their affinity for MYC. We quantified cellular expression levels of MYC using Western blotting and FACS. We used RNA- and ChIP-sequencing to correlate the occupancy of target gene promoters by MYC with gene expression at different concentrations of MYC. Our mathematical modelling approach demonstrated that binding affinities for interactions of MYC with DNA and with core promoter-bound factors are sufficient to explain promoter occupancies observed in vivo. Our work can explain why tumour-specific expression levels of MYC induce specific gene expression programs that alter cellular behaviour. The comprehensive analysis of our mathematical model indicates that our mechanistic insights are valid for many human cell types.
Lorenzin F., Benary U., Baluapuri A., Walz S., Jung L.A., von Eyss B., Kisker C., Wolf J., Eilers M., Wolf E. (2016), Different promoter affinities account for specificity in MYC-dependent gene regulation, eLife 5: e15161. https://elifesciences.org/articles/15161
Benary U., Wolf E., Wolf J. (2017), Mathematical modelling of promoter occupancies in MYC-dependent gene regulation, Genomics and Computational Biology 3 (2): e54.
Oral presentations with article in Proceedings
The articles accepted in the Proceedings of ISMB/ECCB 2017 will be published in OUP Bioinformatics.
Estimation of time-varying growth, uptake and excretion rates from dynamic metabolomics data
Eugenio Cinquemani1, Valérie Laroute2, Muriel Cocaign-Bousquet3, Hidde de Jong1, and Delphine Ropers1
1INRIA Grenoble - Rhone-Alpes, FR, 2Universite de Toulouse, FR, 3INRA, FR
Motivation: Technological advances in metabolomics have made it possible to monitor the concentration of extracellular metabolites over time. From these data it is possible to compute the rates of uptake and excretion of the metabolites by a growing cell population, providing precious information on the functioning of intracellular metabolism. The computation of the rate of these exchange reactions, however, is difficult to achieve in practice for a number of reasons, notably noisy measurements, correlations between the concentration profiles of the different extracellular metabolites, and discontinuities in the profiles due to sudden changes in metabolic regime.
Results: We present a method for precisely estimating time-varying uptake and excretion rates from time-series measurements of extracellular metabolite concentrations, specifically addressing all of the above issues. The estimation problem is formulated in a regularized Bayesian framework and solved by a combination of extended Kalman filtering and smoothing. The method is shown to improve upon methods based on spline smoothing of the data. Moreover, when applied to two actual datasets, the method recovers known features of overflow metabolism in E. coli and L. lactis, and provides evidence for acetate uptake by L. lactis after glucose exhaustion. The results raise interesting perspectives for further work on rate estimation from measurements of intracellular metabolites.
popFBA: tackling intratumour heterogeneity with Flux Balance Analysis
Chiara Damiani, Marzia Di Filippo, Dario Pescini, Davide Maspero, Riccardo Colombo and Giancarlo Mauri
University of Milano-Bicocca, IT
Motivation: Intratumour heterogeneity poses many challenges to the treatment of cancer. Unfortunately, the transcriptional and metabolic information retrieved by currently available computational and experimental techniques portrays the average behaviour of intermixed and heterogeneous cell subpopulations within a given tumour. Emerging single-cell genomic analyses are nonetheless unable to characterise the interactions among cancer subpopulations. In this work, we propose popFBA, an extension to classic Flux Balance Analysis (FBA), to explore how metabolic heterogeneity and cooperation phenomena affect the overall growth of cancer cell populations.
Results: We show how clones of a metabolic network of human central carbon metabolism, sharing the same stoichiometry and capacity constraints, may follow several different metabolic paths and cooperate to maximise the growth of the total population. We also introduce a method to explore the space of possible interactions, given some constraints on plasma supply of nutrients. We illustrate how alternative nutrients in plasma supply and/or an inhomogeneous distribution of oxygen provision may affect the landscape of heterogeneous phenotypes. We finally provide a technique to identify the most proliferative cells within the heterogeneous population.
Availability: The popFBA MATLAB function and the SBML model are available at https://github.com/BIMIB-DISCo/popFBA
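For readers unfamiliar with classic Flux Balance Analysis, the optimisation problem that popFBA extends can be written as the standard linear program below. The symbols are conventional notation and do not come from the abstract: S is the stoichiometric matrix, v the vector of reaction fluxes, c a weight vector that typically selects the biomass (growth) reaction, and v^min, v^max the capacity constraints. The popFBA-specific formulation for a population of cooperating clones is not given in the abstract and is therefore not reproduced here.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}
```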
A scalable moment-closure approximation for large-scale biochemical reaction networks
Motivation: Stochastic molecular processes are a leading cause of cell-to-cell variability. Their dynamics are often described by continuous-time discrete-state Markov chains and simulated using stochastic simulation algorithms. As these stochastic simulations are computationally demanding, ordinary differential equation models for the dynamics of the statistical moments have been developed. The number of state variables of these approximating models, however, grows at least quadratically with the number of biochemical species. This limits their application to small- and medium-sized processes.
Results: In this manuscript, we present a scalable moment-closure approximation (sMA) for the simulation of statistical moments of large-scale stochastic processes. The sMA exploits the structure of the biochemical reaction network to reduce the covariance matrix. We prove that sMA yields approximating models whose number of state variables depends predominantly on local properties, i.e. the average node degree of the reaction network, instead of the overall network size. The resulting complexity reduction is assessed by studying a range of medium- and large-scale biochemical reaction networks. To evaluate the approximation accuracy and the improvement in computational efficiency, we study models for JAK2/STAT5 signalling and NFkB signalling. Our method is applicable to generic biochemical reaction networks and we provide an implementation, including an SBML interface, which renders the sMA easily accessible.
Availability: The sMA is implemented in the open-source MATLAB toolbox CERENA and is available from https://github.com/CERENADevelopers/CERENA.
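The quadratic growth mentioned in the Motivation can be made concrete with a short count. The sketch below assumes a second-order description (one mean per species plus one entry per distinct pair in the symmetric covariance matrix); the function name is hypothetical, and the reduced count achieved by the sMA is not computed because the abstract does not state its formula.

```python
def second_order_moment_state_count(n_species: int) -> int:
    """Number of ODE state variables for means plus (co)variances.

    Assumes a full second-order moment description: n means and the
    upper triangle (including the diagonal) of the n x n covariance matrix.
    """
    if n_species < 0:
        raise ValueError("n_species must be non-negative")
    means = n_species
    covariances = n_species * (n_species + 1) // 2
    return means + covariances


for n in (10, 100, 1000):
    print(n, second_order_moment_state_count(n))
```

For 1000 species the full description already needs about half a million state variables, which illustrates why an approximation whose size depends on local network properties is attractive.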
Efficient Simulation of Intrinsic, Extrinsic and External Noise in Biochemical Systems
Dennis Pischel, Kai Sundmacher and Robert J Flassig
Max Planck Institute for Dynamics of Complex Technical Systems, DE
Motivation: Biological cells operate in a noisy regime influenced by intrinsic, extrinsic, and external noise, which leads to large differences of individual cell states. Stochastic effects must be taken into account to accurately characterize biochemical kinetics. Since the exact solution of the chemical master equation, which governs the underlying stochastic process, cannot be derived for most biochemical systems, approximate methods are used to obtain a solution.
Results: In this study a method to efficiently simulate the various sources of noise simultaneously is proposed and benchmarked on several examples. The method relies on the combination of the sigma point approach to describe extrinsic and external variability and the τ-leaping algorithm to account for the stochasticity due to probabilistic reactions. The comparison of our method to extensive Monte Carlo calculations demonstrates an immense computational advantage with an acceptable loss of accuracy. Additionally, we show its application to parameter optimization problems in stochastic biochemical reaction networks, which is rarely attempted due to its huge computational burden. To give further insight, a MATLAB® script is provided that applies the proposed method to a simple toy example of gene expression.
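To illustrate the τ-leaping ingredient only, here is a minimal Python sketch for a toy gene expression model with constant production and first-order degradation. It is not the authors' MATLAB script: the function names, rate values and the clamping of negative copy numbers to zero are assumptions made for illustration, and the sigma point treatment of extrinsic and external noise is not included.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson sample with Knuth's method (adequate for small lam)."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def tau_leap_gene_expression(k_prod, k_deg, x0, tau, n_steps, rng):
    """Tau-leaping for production (rate k_prod) and degradation (rate k_deg * x).

    In each leap of length tau, the number of firings of each reaction is
    drawn from a Poisson distribution with mean propensity * tau.
    """
    x = x0
    trajectory = [x]
    for _ in range(n_steps):
        produced = poisson(k_prod * tau, rng)
        degraded = poisson(k_deg * x * tau, rng)
        # Assumption: clamp at zero so a large leap cannot yield negative counts.
        x = max(x + produced - degraded, 0)
        trajectory.append(x)
    return trajectory


rng = random.Random(1)
finals = [
    tau_leap_gene_expression(10.0, 1.0, 0, 0.01, 1000, rng)[-1]
    for _ in range(200)
]
mean_final = sum(finals) / len(finals)
print(f"mean copy number at t = 10: {mean_final:.2f}")
```

For this birth-death process the stationary mean is k_prod / k_deg, so the ensemble mean should settle near 10 for the values used above.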
Oral presentations selected from Abstracts
Explanation of drug effects using a mechanistic model automatically assembled from natural language, databases, and literature
The capacity of modern experimental methods to generate data about biological processes has surpassed the ability of existing informatics approaches to generate meaningful mechanistic explanations. Mechanistic systems biology models could potentially address this gap, but model construction remains a labor-intensive process requiring both biological knowledge and modeling expertise. As a result, modeling studies remain fairly small in scope and are disconnected from genome-scale research.
To address this problem we have developed the Integrated Network and Dynamical Reasoning Assembler (INDRA), a system that automatically assembles mechanistic models from pathway databases, literature, and expert knowledge expressed in natural language. INDRA draws on three existing natural language processing systems and uses a modular architecture to build different types of models from a variety of sources. Mechanisms are extracted from each source format and converted into Statements, a normalized representation of biological mechanisms. To identify redundancies and overlaps, Statements are sorted into a hierarchy graph that identifies which are generic (e.g., “MEK phosphorylates ERK”) and which are more specific (e.g., “MEK1 phosphorylated at serine 218 and S222 phosphorylates ERK1 at threonine 185”). The reliability of each mechanism is scored probabilistically based on the sources and frequency of extractions. Manual evaluation of the system on a corpus of papers indicated that this assembly process can reliably extract previously uncurated protein-protein interactions and post-translational modifications, and eliminates many of the errors that arise in automated model construction. A key feature of this approach is that the assembled models are not only broad in scope but also mechanistic, capturing information about sites of post-translational modification and necessary molecular context.
To evaluate the ability of INDRA to systematically generate explanations of high-throughput data, we assembled a rule-based executable model to explain a previously published dataset of the phospho-proteomic response of a melanoma cell line to 12 different drugs. Static analysis of the rule influence map provided by Kappa allowed us to identify possible mechanistic paths linking drug targets to experimentally observed effects on phospho-protein abundances. The model generated biochemically plausible explanations for 12 of the 21 largest effects in the data (57%). Notably, manual inspection revealed that several of the unexplained effects were due to a feedback mechanism from MTOR to AKT that was evidently missing from the model. We “patched” the model by describing, in natural language, a previously published negative feedback via IGF1R and IRS1. The combined curated/automated model then generated explanations for 18 of the 21 largest effects (81%). An evaluation on weaker effects showed that while the automated model explained 67 of 141 (48%) of effects overall, performance was biased towards drug targets that were well-represented in the corpus of publications used for model assembly: the model explained 74% of effects due to RAF, MEK, PKC, AKT and PI3K inhibition, but only 3% of effects due to JAK, STAT or CDK inhibition. Taken together, this study shows the potential of automatically assembled models to systematically explain high-throughput data, generating mechanistic hypotheses and identifying genuinely novel phenomena.
Who's driving? Cell cycle robustness investigated by a multi-scale framework integrating cell cycle and metabolism in budding yeast
Lucas van der Zee, Thierry D.G.A. Mondeel, Hans V. Westerhoff, and Matteo Barberis
Synthetic Systems Biology and Nuclear Organization, Swammerdam Institute for Life Sciences, Universiteit van Amsterdam, NL
Cell cycle and metabolism are coupled networks. Cell growth and division require synthesis of macromolecules which is dependent on metabolic cues. Conversely, metabolites involved in nucleotide and protein synthesis are fluctuating periodically as a function of cell cycle progression. To date, no effort has been made to integrate, and to investigate the mutual regulation of, these two systems in any organism.
Connections among cell cycle and metabolism have been recently elucidated in budding yeast. However, high-throughput and manually curated studies point at many more physical interactions between these two networks. Here we aim to investigate cell cycle robustness by generating the first multi-scale model that integrates cell cycle with metabolism, and investigating their bidirectional regulation.
Here a framework is presented integrating a Boolean cell cycle model with a constraint-based model of metabolism, incorporating mechanistic and high-throughput interactions on the bidirectional regulation. For the mechanistic interactions, directionality and effect are known. As this information is unknown for the high-throughput interactions, an informed optimization algorithm has been developed to generate models that can incorporate it iteratively. Through Boolean logic, activity of cell cycle nodes can activate or inhibit metabolic reactions. Conversely, presence or absence of a metabolic flux can promote or prevent the activity of cell cycle nodes, respectively.
The results of the informed optimization algorithm, which is agnostic to the directionality and effect of the interactions, are verified against metabolomic data. Specifically, changes in flux through 15 metabolic pathways are compared to metabolic pathway enrichment time-series. The multi-scale model predicts expected changes in the majority of the pathways, spread over amino-acid, pentose phosphate, and lipid metabolism. Furthermore, many models that differ in number and directionality of interactions robustly predict a definite set of interactions underlying the bidirectional regulation between cell cycle and metabolism. In addition, the integrative model shows a temporal export of acetate, pyruvate and alanine, qualitatively reminiscent of yeast metabolic oscillations. Altogether, our multi-scale framework is able to integrate computer models of biological networks with high-throughput data, to capture the functional connectivity among their elements that ultimately results in systems robustness.
Reversed dynamics to uncover basins of attraction of asynchronous logical models
Sébastien Fueyo1, Pedro T. Monteiro2, Aurélien Naldi3, Julien Dorier4, Élisabeth Remy5, and Claudine Chaouiya6
1Instituto Gulbenkian de Ciência / INRIA Sophia Antipolis, 2INESC-ID/IST, Universidade de Lisboa, 3IBENS (CNRS UMR 8197 - INSERM U1024), 4Vital-IT, Systems biology and medicine department, SIB Swiss Institute of Bioinformatics, 5Aix Marseille Université, CNRS, Centrale Marseille, I2M, UMR 7373, 6Instituto Gulbenkian de Ciência, PT
Logical (Boolean or multivalued) models make it possible to retrieve salient dynamical properties of regulatory and signalling networks. Model attractors are of particular relevance because they account for long-term behaviours, often associated with cellular fates. A variety of methods have been developed to identify those attractors (stable states or cyclical attractors) and/or to check their reachability from specific initial conditions. However, the problem of identifying their basins of attraction (sets of states leading to specific attractors) has not received much attention; it has been mostly addressed by exhaustive and demanding searches.
Here, we focus on asynchronous logical models, which are non-deterministic and considered biologically more realistic than synchronous, deterministic models. Evidently, exhaustive searches in asynchronous dynamics are even less tractable for large networks than in the synchronous setting, and different attractors may be reachable from the same state, following concurrent trajectories. This leads to the notion of weak and strong basins of attraction, the latter being the set of states that inevitably lead to a given attractor. We also define the interior and exterior boundaries of strong basins, which would correspond to a notion of separatrix adapted to the discrete framework.
The goal of our work is to facilitate the identification of the basins of attraction in the context of logical, asynchronous models. To do so, we propose to explore, in the reverse dynamics, the state space reachable from each model attractor. We thus start by formally defining the reverse of a logical model as the model producing the reverse asynchronous dynamics. We establish remarkable properties of reverse models and contrast the Boolean and multi-valued cases. We demonstrate the use of model reversal to identify basins of attraction and their boundaries on a Boolean model of cell-fate decision upon death receptor activation (Calzone et al., PLOS Computational Biology, 2010). Model reversal was implemented in the modelling software GINsim, and we relied on boolSim for an efficient reachability analysis. We further provide a set of scripts for reproducing the analyses presented for the aforementioned case studies.
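The idea of reaching basins by walking the dynamics backwards can be sketched on a toy example. The Python code below is an illustration only, not the GINsim or boolSim implementation: it enumerates the asynchronous state transition graph explicitly (feasible only for very small networks), reverses its edges, and assumes that all attractors are stable states. The two-node mutual inhibition model and all function names are hypothetical.

```python
from itertools import product


def successors(state, rules):
    """Asynchronous successors: update exactly one node that is called to change."""
    result = []
    for i, rule in enumerate(rules):
        target = int(rule(state))
        if target != state[i]:
            result.append(state[:i] + (target,) + state[i + 1:])
    return result


def basins(rules):
    """Return (weak, strong) basins of every stable state of a Boolean model.

    Assumption: the model has no cyclic attractors, so each state reaches
    at least one stable state.
    """
    states = list(product((0, 1), repeat=len(rules)))
    predecessors = {s: set() for s in states}  # edges of the reverse dynamics
    stable = []
    for s in states:
        succ = successors(s, rules)
        if not succ:
            stable.append(s)
        for t in succ:
            predecessors[t].add(s)

    weak = {}
    for attractor in stable:
        # Reachability from the attractor in the reverse dynamics.
        seen, stack = {attractor}, [attractor]
        while stack:
            current = stack.pop()
            for p in predecessors[current]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        weak[attractor] = seen

    strong = {}
    for attractor in stable:
        others = set().union(*(weak[b] for b in stable if b != attractor))
        # Strong basin: states that cannot reach any other attractor.
        strong[attractor] = weak[attractor] - others
    return weak, strong


# Toy model: A and B inhibit each other.
toggle = [lambda s: not s[1], lambda s: not s[0]]
weak, strong = basins(toggle)
for attractor in sorted(weak):
    print(attractor, sorted(weak[attractor]), sorted(strong[attractor]))
```

In this toy model the states (0, 0) and (1, 1) can reach both stable states, so they belong to both weak basins and to neither strong basin.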
Manatee invariants for functional analysis of signaling pathways in complex networks
Leonie Amstein, Jennifer Scheidel, Jörg Ackermann, Simone Fulda, Ina Koch
Johann Wolfgang Goethe-University Frankfurt (Main), DE
Motivation: Signaling systems like the TNFR1 (tumor necrosis factor receptor 1) pathway process environmental stimuli to mediate an adequate immune response, ranging from cell death induction to activation of gene expression. The signaling system is highly intertwined and forms a complex regulatory network. Generally, the underlying biological data are incomplete, and the kinetic parameters are often unknown or hardly accessible. Nevertheless, mathematical modeling at a semi-quantitative level may help to gain new insights into the system’s dynamic behavior. Mathematical network analysis allows for decomposition into smaller, biologically meaningful modules and helps to unravel system-wide processes. Automatic detection of complete pathways from the receptor to the cell response remains an issue for the analysis of signaling networks. Due to regulatory characteristics of signaling systems, like feedback loops, cross-talks or signal amplification, straightforward detection of all possible signal flows is often impaired.
Methods: We present an approach for the automatic enumeration of all signal transduction pathways from the receptor to the cellular response. The approach is based on elementary mode analysis expressed as transition invariants in the Petri net formalism [1, 2]. We introduce the concept of Manatee invariants to detect all complete signal flows from signal reception to cellular response. The concept of Manatee invariants is based on feasible transition invariants [3], which combine interrelated transition invariants in a specific way. We apply the concept to a small semi-quantitative Petri net model of the TNFR1-mediated NF-κB signaling pathway, using MonaLisa [4, 5]. In the network, the processes regulating the transcription factor NF-κB in particular exhibit feedback loops and amplifications, which led to transition invariants that did not cover all signal flows from the receptor to the response, but only parts of them. We demonstrate that Manatee invariants cover the whole signaling pathway without interruptions.
Results: The computation of Manatee invariants yielded the combinatorial diversity of possible signal flows. In silico knockout experiments based on Manatee invariants revealed biologically relevant results, since pathway dependencies were properly captured. Applying Manatee invariants, even for the knockout of NF-κB, correct relations to upstream and downstream signaling processes were detected. We demonstrate that Manatee invariants determine all complete signal flows from the receptor to the cellular response, thus revealing essential pathway dependencies, which could be used, for example, for in silico knockout analyses [6].
- [1] Reisig W (1985) Petri Nets: An Introduction. EATCS Monographs on Theoretical Computer Science, vol. 4. Springer.
- [2] Koch I, Reisig W, Schreiber F (Eds.) (2011) Modeling in Systems Biology: The Petri Net Approach. Springer.
- [3] Sackmann A, Heiner M, Koch I (2006) BMC Bioinformatics, 7:482, doi:10.1186/1471-2105-7-482
- [4] Einloft J, Ackermann J, Nöthen J, Koch I (2013) Bioinformatics, 29:1469-1470
- [5] Balazki P, Lindauer K, Einloft J, Ackermann J, Koch I (2015) BMC Bioinformatics, 16:215, doi:10.1186/s12859-015-0596-y
- [6] Scheidel J, Amstein L, Ackermann J, Dikic I, Koch I (2016) PLoS Comp Biol, 12(12):e1005200, doi:10.1371/journal.pcbi.1005200
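The transition invariants that the Manatee concept builds on have a simple algebraic definition: a non-zero vector x of non-negative integers, one entry per transition, with C x = 0 for the incidence matrix C of the Petri net. The Python sketch below checks this condition on a hypothetical three-transition signal chain; the net, the names and the helper function are assumptions for illustration. It does not compute Manatee invariants and is unrelated to the MonaLisa implementation.

```python
def is_transition_invariant(incidence, x):
    """Check whether x is a transition invariant of a Petri net.

    incidence[p][t] is the net token change of place p when transition t
    fires once. x must be non-negative, non-zero, and satisfy C x = 0,
    i.e. firing each transition t exactly x[t] times restores the marking.
    """
    if any(v < 0 for v in x) or not any(x):
        return False
    return all(sum(c * v for c, v in zip(row, x)) == 0 for row in incidence)


# Hypothetical chain: t_in produces Signal, t_act converts Signal into
# Active, t_out consumes Active (the "cellular response").
#            t_in  t_act  t_out
incidence = [
    [1, -1, 0],   # place Signal
    [0, 1, -1],   # place Active
]

print(is_transition_invariant(incidence, [1, 1, 1]))  # complete signal flow
print(is_transition_invariant(incidence, [1, 1, 0]))  # leaves a token in Active
```

Only the vector that fires all three transitions balances every place, which matches the intuition of a complete signal flow from input to response.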
Large Scale Mechanistic Modeling Enables Robust Prediction of Cancer Cell Drug Response
Fabian Fröhlich, Alexey Shadrin, Thomas Kessler, Christoph Wierling, Bodo Lange, Fabian Theis, and Hans Lehrach
Helmholtz Zentrum München, DE
Background: Large-scale studies like The Cancer Genome Atlas (TCGA) revealed that cancers are multifactorial diseases that vary strongly between patients. This inter-patient variability poses a challenge for clinicians. A priori it is not clear which drug or drug combination will be most beneficial for an individual.
Methods: We approach the problem of drug response prediction using a systems biology approach. We developed a generic large-scale mechanistic dynamic model covering multiple cancer-associated signaling pathways. This ordinary differential equation model can be individualized using exome and transcriptome sequencing data – carrying information about mutation status and expression levels. For statistical inference of the model parameters we implemented adjoint sensitivity analysis methods. These methods facilitate the study of large-scale models with thousands of parameters and state variables.
Results: To evaluate the proposed model-based approach, we studied drug response data from the Cancer Cell Line Encyclopedia (CCLE) for 7 drugs and 121 cell lines originating from five different tissues. We trained the model on 80% of the cell lines and predicted the response of the remaining 20%. On the test set we achieved a prediction accuracy of roughly 80%, outperforming all investigated statistical approaches. These results demonstrate the potential of large-scale mechanistic modeling for drug selection in personalized therapy.
Incorporating patient-specific molecular data into a logic model of prostate cancer
Prostate cancer is a leading cause of cancer death amongst men, but is also prone to over-treatment. Based on public data (TCGA), we build a mathematical model of pathways frequently altered in prostate cancer tumors and simulate patient-specific outcomes. This model helps to improve our understanding of the mechanisms underlying this complex disease, and to suggest optimized and personalized strategies for therapeutic interventions.
We first propose a methodology for building a curated network from both prior knowledge and experimental data. The resulting signaling network is based on the most frequently altered genes and pathways in prostate cancer. Enrichment analyses of publicly available experimental data guide the inclusion of new molecular processes to complement the network. In particular, we use a gene set quantification tool (ROMA: Representation of Module Activity) to select pathways whose activity is significantly dispersed in prostate cancer samples. Detailed protein interactions relevant in these pathways are retrieved from the literature. This search is facilitated by pypath, a Python module that gives access to different signaling databases gathered in Omnipath.
A Boolean model is then derived from the network, with outputs characterizing several cancer-related phenotypes. Using MaBoSS, a probabilistic framework based on continuous time Markov chains, we estimate the time evolution of phenotypic probabilities in specific contexts. This model provides a framework for incorporating multi-omics patient-specific molecular data. Mutations and copy number variations of individual patients are encoded as perturbations of the model, while transcriptomics and proteomics data are discretized and compared with simulated probabilities. For that purpose, different methods of discretizing quantitative data have been explored. This leads to the definition of a set of model variants specific to tumor samples. The model is then validated by correlating the simulated phenotypic outputs for individual patients with clinical data, and is used to stratify patients.
The effect of different drugs on the model can be simulated and compared to experimental observations. One of the biggest advantages of the predictions made under this framework is that they are intrinsically accompanied by a mechanistic explanation. Ultimately, the objective is to identify disease mechanisms and novel therapeutic targets, as well as produce treatment recommendations for individual patients based on a therapeutic biomarker panel.
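The continuous-time Boolean simulation described above can be illustrated with a minimal sketch. It does not use the MaBoSS software or its API; the three-node network, the node names and the single shared transition rate are hypothetical and chosen only to show how a perturbation (here, an inhibitor locked on) changes an estimated phenotype probability.

```python
import random

# Hypothetical toy network (NOT the prostate cancer model from the abstract).
# Each rule returns the value a node is driven towards in a given state.
RULES = {
    "GrowthSignal": lambda s: s["GrowthSignal"],   # input, keeps its value
    "Inhibitor": lambda s: s["Inhibitor"],         # input, keeps its value
    "Proliferation": lambda s: s["GrowthSignal"] and not s["Inhibitor"],
}

def simulate(initial, rate=1.0, t_max=10.0, rng=None):
    """One Gillespie-style trajectory: every node whose rule disagrees with
    its current value may flip, each with the same (assumed) rate."""
    rng = rng or random.Random()
    state, t = dict(initial), 0.0
    while True:
        flippable = [n for n, rule in RULES.items() if bool(rule(state)) != state[n]]
        if not flippable:
            return state                      # fixed point reached
        t += rng.expovariate(rate * len(flippable))
        if t > t_max:
            return state                      # time horizon reached first
        node = rng.choice(flippable)
        state[node] = not state[node]

def phenotype_probability(initial, node, n_runs=2000, seed=0, **kwargs):
    """Fraction of trajectories in which `node` is active at time t_max."""
    rng = random.Random(seed)
    hits = sum(simulate(initial, rng=rng, **kwargs)[node] for _ in range(n_runs))
    return hits / n_runs

wild_type = {"GrowthSignal": True, "Inhibitor": False, "Proliferation": False}
perturbed = dict(wild_type, Inhibitor=True)   # perturbation: inhibitor locked on
```

Calling `phenotype_probability(wild_type, "Proliferation", t_max=1.0)` estimates the probability at an early time point, while a long horizon approaches the stationary value; the perturbed variant never activates the phenotype in this toy network.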
Tracking and Engineering the Evolution of Organismal Fitness via Multi-Organism mRNA Translation Whole Cell Simulations
Tamir Tuller and Hadas Zur
Tel Aviv University, IL
A single mammalian cell includes on the order of 10^4 to 10^5 mRNA molecules and as many as 10^5 to 10^6 ribosomes. Large-scale simultaneous mRNA translation induces correlations between the mRNA molecules’ translation rates, as they all compete for the finite pool of available ribosomes. This has important implications for the cell's functioning and evolution. Developing a better understanding of the intricate correlations between these simultaneous processes, rather than focusing on the translation of a single isolated transcript, should help in gaining a better understanding of mRNA translation regulation, and the way elongation rates affect organismal fitness and the nucleotide composition of transcripts.
In this study, we report for the first time whole cell translation simulations of several organisms (e.g. S. cerevisiae, S. pombe, S. paradoxus), and use them to understand the genomic evolution of these organisms. The models consider all the biophysical aspects of translation (e.g. the size of the ribosomes, ribosome traffic jams, nominal initiation and elongation rates, excluded volume interactions, amongst others), and are based on measured parameters and novel estimations of nominal elongation and initiation rates from experimental data (e.g. Ribo-seq). We developed tools for comparing such a set of whole cell translation models, and for understanding the evolution of transcriptomes via directly connecting the genotype to the phenotypes (i.e. biophysical aspects of translation). The new analyses clearly enable deciphering novel aspects related to translation based constraints that could not be studied/detected with previous tools. Specifically, among others we show that in S. cerevisiae our model was able to explain 49% of the variability of the experimental (Ribo-seq) data, while elongation can explain 23% and initiation explains 26% of the variability (the results for other organisms are similar). In addition, we were able to demonstrate how mutations with decreased translation rate at the 5'end of coding regions improve organismal fitness. Furthermore, we showed that genes that relatively increase/decrease their expression in a certain evolutionary path also tend to increase/decrease the optimality of various transcript features, and thus their translation efficiency.
Finally, we also suggest a novel generic approach for improving the fitness of any host/organism by introducing silent/synonymous mutations based on our computational models. The algorithm introduces silent mutations that improve the allocation of resources (e.g. ribosomes or RNAP) in the cells via the elimination of traffic jams (e.g. during translation or transcription). As a result, more resources are available to the cell, promoting improved fitness and growth rate. For example, we show that by introducing silent mutations to 10 or 20 genes we can increase the pool of available ribosomes by 6.6% or 8.7%, respectively, without affecting the translation rate of any gene. These increases in the ribosome pool should be similar or related to the increase in the organism's growth rate and fitness. The approach can be used for improving the growth rate/fitness of any organism used in biotechnology and agriculture, thus enabling various biotechnological objectives such as heterologous gene expression that maximizes protein production while maintaining the integrity of the host, as well as other biotechnological goals.
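The effect of ribosome traffic jams on throughput can be sketched with a totally asymmetric simple exclusion process (TASEP), a standard simplified model of translation. This is not the authors' whole-cell simulation: it treats a single transcript, assumes each ribosome covers exactly one codon, and uses hypothetical relative rates.

```python
import random

def tasep_flux(hop_rates, init_rate, n_updates=200_000, seed=0):
    """Random-sequential-update TASEP on one open transcript.

    hop_rates[i] is the probability (<= 1) that a ribosome on codon i moves
    when selected; the last entry acts as the termination step.
    Returns finished proteins per sweep (one sweep = len(hop_rates) + 1 attempts).
    """
    rng = random.Random(seed)
    n = len(hop_rates)
    occupied = [False] * n
    finished = 0
    for _ in range(n_updates):
        i = rng.randrange(-1, n)              # -1 stands for an initiation attempt
        if i == -1:
            if not occupied[0] and rng.random() < init_rate:
                occupied[0] = True
        elif occupied[i] and rng.random() < hop_rates[i]:
            if i == n - 1:                    # termination frees the last codon
                occupied[i] = False
                finished += 1
            elif not occupied[i + 1]:         # exclusion: next codon must be free
                occupied[i], occupied[i + 1] = False, True
    return finished / (n_updates / (n + 1))

uniform = [1.0] * 30                          # hypothetical 30-codon transcript
bottleneck = list(uniform)
bottleneck[15] = 0.1                          # one slow codon creates a queue

flux_uniform = tasep_flux(uniform, init_rate=0.3)
flux_bottleneck = tasep_flux(bottleneck, init_rate=0.3)
```

In this toy setting the slow codon lowers the protein output and holds ribosomes in a queue upstream of it, which is the kind of jam that a synonymous change at that position would relieve.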
Poster presentations selected from submitted Abstracts
Modeling Patient Response to Hemodialysis Using System Identification Methods
Anca I. Stefan, Michelle M.Y. Wong, Gang Liu, John Hartman, Ronald L. Pisoni, Bruce M. Robinson, Victor P. Andreev
Arbor Research Collaborative for Health, USA; Visonex: Visonex Data Management, LLC, Green Bay, WI, USA.
Background: Observational studies have revealed a complex relationship between fluid overload, systolic blood pressure (SBP), and mortality in end-stage renal disease (ESRD) patients. Within the context of the human body as a biological system, we present a model of the biological subsystem that describes the regulation of SBP in response to the reduction of the patient’s fluid volume during the hemodialysis (HD) session. The purpose of this model is to determine a strategy for removing fluid that elicits a blood pressure response that does not impose additional stress and instability on the patient’s cardiovascular system.
Methods: The model was developed based on data from approximately 10,000 patients treated with HD over a 5-year period, provided by Visonex, LLC (Green Bay, WI). The data included patient demographics, outcomes (surviving, deceased, received transplant), in-session SBP recordings, and in-session rate of fluid removal (ultrafiltration rate, UFR) recordings. For prototyping our model, we selected contiguous session information (SBP and UFR) for a given patient, spanning over a period of 6 months, i.e., three sessions per week for 6 months. We modeled the effect of changes in the UFR on the patient’s blood pressure by estimating a single input single output (SISO) “black box” non-linear model (Systems Identification Toolbox, MATLAB). We chose to employ a nonlinear model because linear models, such as transfer function or autoregressive with exogenous input (ARX), do not provide a reasonable fit to the data for this particular type of problem. In the black box model, the changes in extracellular fluid volume, obtained by integrating UFR over time, represented the input signal, whereas the SBP fluctuations about the mean represented the output signal.
Results: An analysis of the black box models shows session-to-session and patient-to-patient variability. It also shows that the UFR profile can be modified to favorably influence intradialytic SBP fluctuations. Figure 1 shows a sample UFR profile (a) and SBP profile (b) that were used to identify and optimize the parameters of the model of the HD session. In Figure 1a, note that the UFR is not constant throughout the session; the values were adjusted by the HD nurse as an attempt to prevent declines in the patient’s SBP. Various UFR profiles were constructed and fed into the model until a desirable output was obtained (c, d), under the constraint that the same volume of fluid was removed. The new UFR profile (c) allows for lower amplitude fluctuations of SBP, thereby reducing circulatory stress.
Conclusions: This method has potential therapeutic applications in that it can be used to minimize acute and chronic cardiovascular complications of fluid overload by determining an optimal UFR profile based on the patient’s treatment history, including factors predicting switching from one type of SBP response to another.
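The signal preparation described in the Methods, and the equal-volume constraint from the Results, can be sketched as follows. The function names and numerical values are hypothetical; the nonlinear black-box model itself (estimated by the authors in MATLAB) is not reproduced here.

```python
def cumulative_volume(times_h, ufr_ml_per_h):
    """Trapezoidal integral of UFR over time: fluid removed so far (mL)."""
    volume = [0.0]
    for k in range(1, len(times_h)):
        dt = times_h[k] - times_h[k - 1]
        volume.append(volume[-1] + 0.5 * (ufr_ml_per_h[k] + ufr_ml_per_h[k - 1]) * dt)
    return volume

def fluctuations_about_mean(sbp_mmhg):
    """Output signal: SBP with the session mean subtracted."""
    mean = sum(sbp_mmhg) / len(sbp_mmhg)
    return [v - mean for v in sbp_mmhg]

def rescale_to_same_volume(times_h, candidate_ufr, target_volume_ml):
    """Scale a candidate UFR profile so that it removes the target volume."""
    total = cumulative_volume(times_h, candidate_ufr)[-1]
    return [u * target_volume_ml / total for u in candidate_ufr]

# Hypothetical 4-hour session sampled hourly (illustrative values only).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
ufr = [500.0, 500.0, 500.0, 500.0, 500.0]
sbp = [140.0, 135.0, 125.0, 120.0, 130.0]

model_input = cumulative_volume(times, ufr)        # change in fluid volume
model_output = fluctuations_about_mean(sbp)        # SBP fluctuations

# A front-loaded candidate profile, rescaled to remove the same total volume.
candidate = rescale_to_same_volume(times, [800.0, 600.0, 400.0, 200.0, 0.0],
                                   model_input[-1])
```

Any candidate profile prepared this way removes the same total fluid volume as the recorded session, so differences in the predicted SBP response can be attributed to the shape of the profile alone.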
Logical modeling approaches as a proxy to analyze cardiovascular disease trajectories
Amel Bekkar, Julien Dorier, Isaac Crespo, Anne Niknejad, Cristina Casal, Anne Estreicher, Alan Bridge, Ioannis Xenarios
SIB - Swiss Institute of Bioinformatics, CH
Cardiovascular diseases (CVDs) are the leading cause of mortality and morbidity in Europe and worldwide. They are complex, multifactorial, chronic pathologies with frequent comorbidities that cannot be described or explained by a reductionist view. To tackle these complex diseases we have developed a logical modeling framework that consists of three steps. The first step is expert curation of the literature into what we call a Prior Knowledge Network (PKN). The PKN is assembled from existing knowledge and experimental evidence to include the components relevant to CVDs as well as the relationships between them (inhibition or activation). Unlike databases that register and summarize facts, we have encoded the logical rules of regulation, enabling the use of the PKN for modeling and simulation. The second step simulates the cellular decision process and identifies the phenotype attained by the regulatory network. As the PKN is large (729 nodes, 3406 logical rules), manual optimization would be time consuming and simulation computationally demanding. This calls for approaches such as Optimusqual, an optimization method that uses a genetic algorithm to find the sub-graph of the PKN that best reproduces a training set built from experimental data such as gene expression data. In the final step we simulate several in silico perturbations, both known and unknown in the field of CVD. This allows us, on the one hand, to assess the pertinence of our model and, on the other hand, to make predictions and generate new testable hypotheses about driver nodes able to switch the network phenotype.
Pioneering topological methods for network-based drug-target prediction by exploiting a brain-network self-organization theory
Claudio Durán, Simone Daminelli, Josephine M. Thomas, V. Joachim Haupt, Michael Schroeder, Carlo Vittorio Cannistraci
Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden (TUD), Dresden, DE.
The bipartite network representation of the drug-target interactions (DTIs) in a biosystem enhances understanding of the drugs' multifaceted action modes, suggests therapeutic switching for approved drugs and unveils possible side effects. Since experimental testing of DTIs is costly and time consuming, computational predictors are of great aid. Here, for the first time, state-of-the-art supervised DTI predictors custom-made in network biology were compared - using standard and innovative validation frameworks - with unsupervised, purely topology-based models designed for general-purpose link prediction in bipartite networks. Surprisingly, our results show that the bipartite topology alone, if adequately exploited by means of the recently proposed local-community-paradigm (LCP) theory - initially detected in brain-network topological self-organization and afterward generalized to any complex network - is able to suggest highly reliable predictions, with performance comparable to that of state-of-the-art supervised methods that exploit additional (non-topological, for instance biochemical) drug-target interaction knowledge. Furthermore, a detailed analysis of the novel predictions revealed that each class of methods prioritizes distinct true interactions, hence combining methodologies based on diverse principles represents a promising strategy to improve drug-target discovery. To conclude, this study promotes the power of bioinspired computing, demonstrating that simple unsupervised rules inspired by principles of topological self-organization and adaptiveness arising during learning in living intelligent systems (like the brain) can efficiently match the performance of complicated algorithms based on advanced, supervised and knowledge-based engineering.
- Simone Daminelli, Josephine Maria Thomas, Claudio Durán, and Carlo Vittorio Cannistraci. Common neighbours and the local-community-paradigm for topological link prediction in bipartite networks. New Journal of Physics, 17(11):113037, 2015.
- Carlo Vittorio Cannistraci, Gregorio Alanis-Lobato, and Timothy Ravasi. From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks. Sci Rep, 3:1613, 2013.
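To illustrate what unsupervised, purely topological link prediction in a bipartite drug-target network can look like, here is a deliberately simple sketch. It is not the LCP-based indices of the cited papers: it only counts paths of length three between a drug and a candidate target, and the drug and target labels are hypothetical.

```python
from collections import defaultdict

def build_neighbours(interactions):
    """Adjacency maps of a bipartite graph given (drug, target) pairs."""
    drug_targets, target_drugs = defaultdict(set), defaultdict(set)
    for drug, target in interactions:
        drug_targets[drug].add(target)
        target_drugs[target].add(drug)
    return drug_targets, target_drugs

def three_path_score(drug, target, drug_targets, target_drugs):
    """Number of paths drug - t' - d' - target (t' != target, d' != drug)."""
    score = 0
    for t_prime in drug_targets.get(drug, ()):
        if t_prime == target:
            continue
        for d_prime in target_drugs.get(target, ()):
            if d_prime != drug and t_prime in drug_targets.get(d_prime, ()):
                score += 1
    return score

def rank_candidates(interactions):
    """Rank all unobserved drug-target pairs by decreasing score."""
    drug_targets, target_drugs = build_neighbours(interactions)
    known = set(interactions)
    candidates = [(d, t) for d in drug_targets for t in target_drugs
                  if (d, t) not in known]
    return sorted(candidates,
                  key=lambda p: (-three_path_score(p[0], p[1], drug_targets, target_drugs), p))

# Hypothetical toy network.
known_dtis = [("d1", "t1"), ("d2", "t1"), ("d2", "t2"), ("d3", "t3")]
ranking = rank_candidates(known_dtis)
```

Here `d1` and `t2` are connected through `t1` and `d2`, so the pair (`d1`, `t2`) is ranked first; no biochemical information about the drugs or targets is used.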
Predictive Modelling of Behavior through Gene Expression over Multiple Generations
Andrea Constantinof, Vasilis G Moisiadis, Stephen G Matthews
University of Toronto, CA
Introduction: Prenatal exposure to excess glucocorticoids increases risk for neurodevelopmental disorders. We have demonstrated that antenatal synthetic glucocorticoids (sGC) program gene transcription in the prefrontal cortex (PFC) with strongest effects in females. Understanding the relationship between gene expression and phenotype provides greater insight into the molecular mechanisms which are being affected by prenatal sGC exposure, and the inheritance of these mechanisms. Here, we hypothesized that predictive modelling can be used to understand the relationship between gene expression and behavior.
Methods: Pregnant guinea pigs received 3 courses of betamethasone (Beta; 1mg/kg) or saline in late gestation. F1 and F2 male offspring were mated with non-experimental females to generate F2 and F3 offspring. Total locomotor activity in open-field (OFA) was measured in female offspring on postnatal day 24 and brains collected at day 40. The PFC was micro-punched from F1(C; n=4, Beta; n=4), F2(C; n=4, Beta; n=4), and F3(C; n=4, Beta; n=4) and submitted for RNA-seq. RNA-seq results were analyzed using standard bioinformatics. Hierarchical clustering was carried out on the OFA scores and normalized expression profiles of genes that were significantly differentially expressed across all 3 generations of animals. Multiple regression integrated gene expression profiles to predict OFA, and linear regression determined the correlation of predicted and observed OFA. The model was validated using leave-one-out cross validation, followed by the application of the model to a novel qRT-PCR data set. The model was trained on qRT-PCR data from samples that had been originally included in the RNA sequencing, and then tested on qRT-PCR data from additional samples (n=12).
Results: F2 animals cluster separately and distinctly from F1 and F3 animals. Multivariate linear regression demonstrated that OFA is significantly associated with expression of B3Galt2 (glycosyltransferase), Garnl3 (G-protein modulator), Nrip3 (transcription cofactor) & Gabra3 (GABA receptor) in F1 and F3 animal RNA-seq data (R²=0.72, p=0.0008). Cross validation revealed a significant correlation between predicted and observed OFA (R²=0.46, p=0.0003). The model was replicated using the Cq values from qRT-PCR of the F1 and F3 training data (samples originally included in the RNA-seq data), demonstrating a significant relationship between Cqs and behavior (R²=0.70, p=0.01). Lastly, the model generalized to the Cqs of test data, further demonstrating a significant correlation between gene expression and behavior in a novel data set (R²=0.38, p=0.03).
Conclusions: This is the first evidence of a correlation between stress-activated locomotor behavior and gene expression in the PFC following prenatal sGC over multiple generations. Interestingly, this association only occurred in F1 and F3 animals, as F2 animals displayed a distinct expression profile. These data provide insight into the heritable changes in gene expression affected by prenatal sGC exposure and how these expression changes may relate to behavior. Further, these data have identified key genes for future investigation as drug and intervention targets.
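The validation scheme in the Methods (multiple regression, leave-one-out cross validation, then the correlation of predicted with observed values) can be sketched in plain Python. The data below are synthetic and exactly linear, chosen only so that the behaviour of the code is easy to check; they are not the study's expression or OFA measurements.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def loocv_predictions(X, y):
    """Predict each sample from a model fitted to all other samples."""
    return [predict(fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:]), X[k])
            for k in range(len(y))]

def r_squared(a, b):
    """Squared Pearson correlation between two sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov * cov / (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

# Synthetic example: two "genes", response = 1 + 2*g1 - g2 (no noise).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
y = [1.0 + 2.0 * g1 - g2 for g1, g2 in X]
predicted = loocv_predictions(X, y)
```

With noisy real data the cross-validated R² is typically lower than the R² of the model fitted to all samples, as in the results reported above (0.46 versus 0.72).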
Context Dependent Prediction of Protein Complexes
Simone Rizzetto, Petros Moyseos, Bianca Baldacci, Corrado Priami, Attila Csikász-Nagy
Microsoft Research; University of Trento, IT; School of Medical Sciences, UNSW, AU; Stanford University, USA; King's College London, London, UK; Pázmány Péter Catholic University, Budapest, HU
Multiple copies of each protein are present in cells and some of these could be involved in multiple complexes; identifying the compositions and abundances of all possible protein complexes is therefore a challenging task. We introduce an integrative, simulation-based computational approach that enables us to predict protein complexes together with their abundances from existing data sources on protein-protein and domain-domain interactions and protein abundances. The simulations yield protein complex compositions consistent with manually curated data and can also predict the abundances of various alternative forms of the complexes. The updated version of the tool incorporates data on protein localization and tissue-specific protein abundances to improve and enable a wider range of predictions. The tool will shortly be available for the community to test various perturbations on the complexome. As an example we show how perturbations by drugs can influence the composition and abundance of protein complexes.
 Rizzetto S, Priami C, Csikász-Nagy A (2015) Qualitative and Quantitative Protein Complex Prediction Through Proteome-Wide Simulations. PLoS Comput Biol 11(10): e1004424
Siderophore-drug conjugates and resistance to bacterial pathogen
Kalyani Dhusia, Pramod W. Ramteke
Sam Higginbottom University of Agriculture, Technology and Sciences, IN
A comparative study of siderophore biosynthesis pathways in pathogens provides potential targets for antibiotics and host drug delivery as part of a computationally feasible microbial therapy. Iron acquisition using siderophores is an essential and well-established mechanism in all microorganisms, and microbial infections are known to cause great harm to both plants and animals. The rapid development of antibiotic resistance in bacterial as well as fungal pathogens has brought us to a point where the traditional way of obstructing pathogens with single or multiple antibiotics, chemical inhibitors or drugs must be abandoned. The Trojan horse strategy is an answer to this imperative call: antibiotics are sneaked into the pathogenic cell via the siderophore receptors at the cell and outer membranes. Once inside, the antibiotic generates a black hole scenario within the opportunistic pathogen via iron scarcity. For pathogens whose siderophores are not suited to smuggling drugs because of their complex conformation and stiff valence bonds, there is another approach. By means of the siderophore biosynthesis pathways, potential targets for inhibiting these siderophores in pathogenic bacteria can be identified, and pathogenic virulence thereby controlled. Methods to design artificial exogenous siderophores that would outcompete the pathogen's own siderophores for uptake are also covered in this review. These manipulated siderophores would enter the pathogenic cell like any other siderophore but would not release iron, causing iron deficiency and thus controlling the pathogen. The aim of this work is to offer strategies to overcome microbial infections/pathogens using siderophores.
Keywords: Iron uptake; Siderophores; Antibiotic resistance; Trojan horse; Pathogen control
The JSBML project: a fully featured Java API for working with systems biological models
Nicolas Rodriguez, Thomas M. Hamm, Roman Schulte, Leandro Watanabe, Ibrahim Yusef Vazirabad, Victor Kofia, Chris J. Myers, Nicolas Le Novère, Michael Hucka, Andreas Dräger
The Babraham Institute, Cambridge, UK; Center for Bioinformatics Tübingen (ZBIT), Applied Bioinformatics Group, University of Tübingen, Tübingen, DE; Department of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT USA; Marquette University, Milwaukee, WI, USA; University of Toronto, Toronto, ON, CA; The California Institute of Technology, Pasadena, CA, USA
Background: SBML is the most widely used data format to encode and exchange models in systems biology. The open-source JSBML project was launched in 2009 as an international collaboration with the aim of providing a feature-rich, pure Java™ implementation for reading, manipulating and writing SBML files.
Results: The JSBML project has matured into a stable, actively developed, and well-documented software project with a large number of contributors around the world. A growing number of applications that use JSBML as their back-end for data manipulation are now available. These cover diverse use cases, such as model building and graphical display, constraint-based modeling, dynamic simulation, model annotation, and many more. JSBML supports all levels, versions, and releases of SBML and provides numerous utility functions that facilitate working with this standard. JSBML also integrates well with other Java libraries for community standards, such as SBGN or the COMBINE archive.
Discussion: The JSBML team actively maintains and updates the project. JSBML is used in student education and in numerous research projects. Major model databases, such as BioModels or BiGG Models, use JSBML-based tools for their curation pipelines. JSBML is also regularly the subject of international student coding events.
Availability: Source code, binaries and documentation for JSBML can be freely obtained under the terms of the LGPL 2.1 from the website http://sbml.org/Software/JSBML/ and on GitHub https://github.com/sbmlteam/jsbml/. The users’ guide at http://sbml.org/Software/JSBML/docs/ provides further information about using JSBML.
- Dräger, A., Rodriguez, N., Dumousseau, M., Dörr, A., Wrzodek, C., Le Novère, N., Zell, A., and Hucka, M. (2011). JSBML: a flexible Java library for working with SBML. Bioinformatics, doi:10.1093/bioinformatics/btr361.
- Rodriguez, N., Thomas, A., Watanabe, L., Vazirabad, I. Y., Kofia, V., Gómez, H. F., Mittag, F., Matthes, J., Rudolph, J. D., Wrzodek, F., Netz, E., Diamantikos, A., Eichner, J., Keller, R., Wrzodek, C., Fröhlich, S., Lewis, N. E., Myers, C. J., Le Novère, N., Palsson, B. Ø., Hucka, M., and Dräger, A. (2015). JSBML 1.0: providing a smorgasbord of options to encode systems biology models. Bioinformatics, doi:10.1093/bioinformatics/btv341.
A Bayesian Approach for Estimating Hidden Variables as well as Missing and Wrong Molecular Interactions in ODE Based Mathematical Models
Benjamin Engelhardt, Maik Kschischo, Holger Fröhlich
UCB Biosciences GmbH, University of Bonn, DE
Ordinary differential equations (ODEs) are a popular approach to quantitatively model molecular networks based on biological knowledge. However, such knowledge is typically limited. Wrongly modeled biological mechanisms, as well as relevant external influence factors that are not included in the model, are likely to manifest as major discrepancies between model predictions and experimental data. Finding the exact reasons for such observed discrepancies can be quite challenging in practice. To address this issue we suggest a Bayesian approach to estimate hidden influences in ODE-based models. The method can distinguish between exogenous and endogenous hidden influences. Thus, we can detect wrongly specified as well as missing molecular interactions in the model. We demonstrate the performance of our Bayesian Dynamic Elastic-Net with several ordinary differential equation models from the literature, such as human JAK-STAT signaling, information processing at the erythropoietin receptor, isomerization of liquid α-pinene, G-protein cycling in yeast and UV-B triggered signaling in plants. Moreover, we investigate a set of commonly known network motifs and a gene-regulatory network. Altogether, our method supports the modeler in an algorithmic manner in identifying possible sources of error in ODE-based models on the basis of experimental data.
Modeling the metabolic interactions of an algal-bacterial mutualism enhancing lipid production for biofuels
Marc Griesemer, Miriam Windler, Jeff Kimbrel, Patrik D’haeseleer, Alfred Spormann, Xavier Mayali, Ali Navid
Lawrence Livermore National Laboratory, USA; Stanford University, Stanford, USA
Microbial organisms adapt and grow in a dynamic and diverse set of environments, all while interacting with other species. In fact, many organisms owe their versatile set of metabolic capabilities to the community interactions they have with others. We are investigating the metabolic interaction of the green alga Chlamydomonas reinhardtii with its associated bacteria to understand the effect on algal biofuel production. It has been observed that the rate of lipid production in a Chlamydomonas reinhardtii co-culture with an Arthrobacter strain is improved compared to an axenic algal culture, suggesting the presence of commensalism between these two species. However, the metabolic interactions in the co-culture are not well understood. The genome of the Arthrobacter strain has recently been sequenced, and we have annotated it and reconstructed the first draft genome-scale model (GSM), comprising 1115 genes, 1539 reactions, and 1485 metabolites. We are coupling it with a well-curated GSM of C. reinhardtii in a community-based model to explore the influence of the bacteria on algal lipid production. We seek to investigate whether metabolic exchanges alone are sufficient to explain the enhanced lipid production. Initial experimental results suggest that this algal-bacterial interaction does enhance lipid production, but we seek to quantify the extent of the effects through modeling using dynamic flux balance analysis (dFBA).
Teal Guidici, Charles Burant, N. Lynn Henry, George Michailidis
University of Michigan, USA; N. Lynn Henry, University of Utah; George Michailidis, University of Florida
Identifying and characterizing patterns of association between variables is a common aim in biology today. Studying these associations has played a crucial role in understanding a wide variety of biological phenomena, such as the dynamics of human disease, transcriptional changes associated with aging, or condition-specific alterations to metabolic pathways.
In a statistical framework, these associations between variables are commonly described in terms of precision matrices (which encode conditional associations) or covariance matrices (which capture marginal associations). In the latter case, most attention in the statistical literature has been placed on testing and estimation of only two covariance matrices, or two correlation matrices.
We present a factor analysis based method to model the structure of related covariance matrices. Our method captures both common structural elements across all conditions and condition specific differences in associations between variables. The approach is extendable to a wide range of datasets, including those that have more than two conditions, or have more complicated experimental design (such as having two experimental design factors, each with multiple levels). Additionally, our method allows us to use all available data to estimate the common structure and thereby gain the advantage of a larger sample size.
We will detail the theoretical framework for this model, briefly discuss simulation results and present an in depth example on a metabolomics dataset from breast cancer patients treated with aromatase inhibitors (AIs). Half of the patients in our dataset are unable to continue treatment with AIs for more than 6 months, due to side effects. These patients exhibit no mean differences in their metabolomic profiles from those who can continue treatment; however, subtle changes in the patterns of association between metabolites do exist between patient groups. Our method captures the similarities and differences in the patterns of association between metabolites across these patient groups, and provides a data driven way to group the metabolites into sets, thus allowing us to characterize the differences between conditions in a biologically meaningful way.
Tuning the robust control of cell cycle transitions by twist of kinases and phosphatases
Rosa Hernansaiz-Ballesteros, Luca Cardelli, Attila Csikász-Nagy
King's College London, UK; Microsoft Research & University of Oxford, UK; Pázmány Péter Catholic University, Faculty of Information Technology and Bionics, Budapest, Hungary
Biological networks that control key processes work as all-or-none decision-making systems. In order to make a decision, many networks rely on kinases and phosphatases. One example is the cell cycle regulatory network. In interphase, most phosphatases of the regulatory network will be active, keeping kinases inactive by dephosphorylation; while during mitosis, most kinases will be active, keeping phosphatases inactive by phosphorylation. Thus, as a rule, kinases will be activated by phosphorylation, and phosphatases will be activated by dephosphorylation. However, there are two players in the core of this regulatory network that behave against the rule: Cdc25 and Wee1. Cdc25 is a phosphatase that is activated by phosphorylation, and Wee1 is a kinase that is activated by dephosphorylation. What are the advantages of having these main players regulated upside down? We investigate the impact of this difference in the core of the network on the dynamical behaviour of the transition.
Feature Engineering in Prediction of Human Malaria Resistance from Systems approaches using Deep Learning
Ali Mohamed Ali Kishk, Katrina A. Button-Simons
H3ABioNet Egyptian node, Faculty of Agriculture, Zagazig University, EG; Eck Institute for Global Health, Department of Biological Sciences, University of Notre Dame, IN
Introduction: Small-data Systems Biology can yield new insights that are not usually revealed by systematic integration of big data. Feature engineering involves filtering and selecting the most relevant features in Machine Learning and can be applied in any discipline. Feature engineering can be broadly classified into knowledge-based and content-based techniques. Knowledge-based feature engineering needs expert opinion to prioritize features according to the problem. Genomics data suffer from a high number of features per sample, usually hundreds of gene measurements per sample, which makes applying Machine Learning to genomic data a challenge. We will highlight some results on how integrating Systems approaches in the analysis of small-scale datasets can help with feature selection in genomic data. Keywords: Feature Engineering, Malaria Resistance, Phylogenetic Profiling, Deep Learning
Methods: Two Systems approaches were used independently of the microarray data to find the most important features. The Systems Biology approach utilizes two of the 1.3 million gene lists from the Library of Integrated Network-Based Cellular Signatures (LINCS), which represents the biggest drug perturbation database. This approach was followed by a bioinformatics process called Phylogenetic Profiling to infer a list of orthologous genes from human to malaria. The Systems Chemistry approach (pharmacophore search) tries to find proteins that might bind to a drug, and thus might identify possible physical targets for artemisinin. The genes resulting from the Systems Chemistry approach were subjected to further downstream analysis by selecting only genes found in the Human Malaria PPI. In addition to the two Systems approaches, genes with a high number of mutations in P. falciparum in Southeast Asian populations were used. After feature extraction, 370 in vivo transcriptomic profiles of Plasmodium falciparum from Mok et al. and the corresponding artemisinin drug resistance data from the Southeast Asian population were used for modelling.
Results: In total, 237 genes were identified by the two Systems approaches and the set of highly variant genes, and were used as features for constructing a deep neural network from the Mok et al. data. A neural network model with five hidden layers was chosen, with an R² of 0.605 for the cross validation and 0.458 for the test dataset. Knowledge-based feature engineering through network analysis might solve the features-per-sample challenge in applying Machine Learning to genomic data. Combining several Systems approaches might also increase the power of the selected features.
References: Mok, S., et al. "Population transcriptomics of human malaria parasites reveals the mechanism of artemisinin resistance." Science 347.6220 (2015): 431-435.
Additive Dose Response Models: Explicit Formulations and the Loewe Additivity Consistency Condition
Simone Lederer, Tjeerd Dijkstra, Tom Heskes
Radboud University, NL; Max Planck Institute for Developmental Biology, DE
Introduction: High-throughput techniques allow for massive screening of drug combinations. To find compound combinations that exhibit an interaction effect, one filters for the most promising combinations by comparing the measured response to the response expected without interaction. The larger the deviation from such a null reference model, the larger the interaction effect. Over the past century, many null reference models have been introduced, compared, and often found to be insufficient [1, 2]. The Loewe Additivity model [3] is one of the few that survived the critiques.
Loewe Additivity: Loewe Additivity is based on the assumption that no compound should interact with itself. It was originally defined in the form of the General Isobole (GI) equation, which is an implicit formulation of the response surface. For the model to be consistent, the individual compound responses have to be restricted. We call this restriction the Loewe Additivity Consistency Condition (LACC). This condition requires that the doses yielding the same effect be linearly related or, equivalently, that they be related by a shift on the log scale. Despite the potential inconsistencies, the underlying idea of the model is very simple: linear combinations of both compounds that yield a fixed effect form the so-called isoboles.
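The relationship between the implicit GI equation and the consistency condition can be illustrated with a small numerical sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the single-compound responses are modelled as Hill curves falling from 1 to 0, the GI equation is written as x1/X1(y) + x2/X2(y) = 1 (where Xi(y) is the dose of compound i alone that yields effect y), and the function names are hypothetical. When both curves share the same slope, iso-effective doses are proportional, so the condition holds and the implicit solution collapses to a closed form.

```python
def hill(x, ec50, slope):
    """Assumed single-compound response: decreases from 1 (no dose) towards 0."""
    return 1.0 / (1.0 + (x / ec50) ** slope)


def inverse_hill(y, ec50, slope):
    """Dose of a single compound that yields effect y, for 0 < y < 1."""
    return ec50 * ((1.0 - y) / y) ** (1.0 / slope)


def loewe_implicit(x1, x2, p1, p2, tol=1e-12):
    """Solve x1/X1(y) + x2/X2(y) = 1 for y by bisection.

    p1 and p2 are (ec50, slope) tuples. The left-hand side increases
    monotonically with y, so a sign change brackets the unique root.
    """
    if x1 == 0 and x2 == 0:
        return 1.0

    def g(y):
        return x1 / inverse_hill(y, *p1) + x2 / inverse_hill(y, *p2) - 1.0

    lo, hi = 1e-15, 1.0 - 1e-15
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def loewe_explicit_equal_slopes(x1, x2, p1, p2):
    """Closed form valid only when both slopes are equal (consistency holds):
    dose x2 of compound 2 is worth x2 * ec50_1 / ec50_2 of compound 1."""
    return hill(x1 + x2 * p1[0] / p2[0], *p1)


if __name__ == "__main__":
    p1, p2 = (1.0, 2.0), (4.0, 2.0)  # same slope, different potency
    print(loewe_implicit(0.5, 1.0, p1, p2))
    print(loewe_explicit_equal_slopes(0.5, 1.0, p1, p2))
```

With unequal slopes the closed form above no longer applies, which is the situation in which the choice of formulation starts to matter.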
Contributions: We formally derive explicit and implicit formulations of the Loewe Additivity model. Moreover, we show that these formulations are equivalent given that the LACC holds, and are negligibly different otherwise. Neither the explicit formulation nor the LACC has yet been studied in its own right. The LACC is violated in a significant number of cases. The choice of the GI model therefore becomes arbitrary. We show this by analyzing two drug screening datasets that are supplied with a categorization into the three synergy cases: synergistic, non-interactive and antagonistic [4, 5]. On the non-interactive cases of both datasets, we conduct a mean-squared error analysis against the theoretical null reference models. We demonstrate that the explicit formulation of the null reference model leads to smaller errors than the implicit one. Further, we show that its computation is faster by a factor of 17.
Discussion: To our knowledge, we are the first to provide a mathematical formulation of the Loewe Additivity model, develop a theoretical background and derive the model’s consistency condition. We show, based on the two datasets at hand, that the LACC is statistically significantly violated in practice.
- W. Greco, et al. The Search for Synergy: A Critical Review from A Response Surface Perspective. Pharmacol. Rev., 47(2):331–385, 1995.
- N. Geary. Understanding Synergy. Am. J. Physiol. Endocrinol. Metab., 304(3):E237–E253, 2013.
- S. Loewe. Die quantitativen Probleme der Pharmakologie. Ergebnisse der Physiol., 27(1):47–187, 1928.
- B. Yadav, et al. Searching for Drug Synergy in Complex Dose-Response Landscapes Using an Interaction Potency Model. Comput. Struct. Biotechnol. J., 13:504–513, 2015.
- M. Cokol, et al. Systematic exploration of synergistic drug pairs. Mol. Syst. Biol., 7(544), 2011.
Inference of tumor microenvironment interactions as a new therapeutic strategy to treat TNBC
Venkata SK Manem, George-Alexandru Adam, Tina Gruosso, Nicholas Bertos, Morag Park, Benjamin Haibe-Kains
Princess Margaret Cancer Center, University Health Network, CA; The Goodman Cancer Centre, McGill University, Montreal, Quebec, CA; University of Toronto, Toronto, CA
Background: Triple negative breast cancer (TNBC), lacking targeted therapy, is the most aggressive subtype of breast cancer and a major cause of mortality in women. Compelling evidence from biological and clinical studies shows that tumor cells recruit and modify neighboring cells from the tumor microenvironment (TME), or stroma [1]. This positions the TME as an important therapeutic target with a key role in determining disease outcome. We aim to identify the key players that participate in TME-tumor crosstalk and find actionable targets in the tumor epi-stroma interaction network.
Methods: We built a genome-wide network by estimating pairwise co-expression interactions between laser-microdissected matched tumor epithelium and tumor stroma profiles (38 samples each) along with normal epithelium and stroma profiles (34 samples each). A statistical framework was developed to compare the large-scale co-expression networks inferred from tumor and normal tissues. A novel community detection algorithm was then applied to extract communities and to identify actionable targets in the differential network. We next leveraged our large compendium of pharmacogenomics data, PharmacoGx [2], to identify drugs that target these key interactions. In addition, we developed a user-friendly web-based application (CrosstalkNet) to efficiently visualize, mine and interpret these large co-expression networks representing the crosstalk occurring between the tumor and its microenvironment.
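As a rough illustration of the kind of computation described in the Methods, the sketch below builds an epithelium-stroma co-expression network per condition and subtracts the normal network from the tumor one. The abstract does not specify the co-expression measure or the statistical comparison, so Pearson correlation and a plain difference are assumptions, and the gene names, data and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def crosstalk_edges(epithelium, stroma):
    """Correlate every epithelial gene with every stromal gene across
    matched samples. An edge (g, g) is a self-loop: the same gene
    coordinately expressed in both compartments."""
    return {
        (g, h): pearson(epithelium[g], stroma[h])
        for g in epithelium
        for h in stroma
    }


def differential_network(tumor_edges, normal_edges):
    """Edge-wise difference between tumor and normal co-expression."""
    return {
        edge: tumor_edges[edge] - normal_edges[edge]
        for edge in tumor_edges
        if edge in normal_edges
    }


if __name__ == "__main__":
    # Hypothetical toy profiles: one list entry per matched sample.
    tumor = crosstalk_edges(
        {"GENE_A": [1, 2, 3, 4]},
        {"GENE_A": [2, 4, 6, 8], "GENE_B": [4, 3, 2, 1]},
    )
    normal = crosstalk_edges(
        {"GENE_A": [1, 2, 3, 4]},
        {"GENE_A": [8, 6, 4, 2], "GENE_B": [1, 3, 2, 4]},
    )
    for edge, delta in differential_network(tumor, normal).items():
        print(edge, round(delta, 3))
```

The tumor and normal groups may have different sample counts, since each correlation is computed within one condition only.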
Results: We find self-loops, defined as genes coordinately expressed in epithelium and stroma, to be significantly enriched in the tumor and differential networks. In addition, we find immune and metabolic pathways to be co-expressed in the epithelial and stromal compartments in the tumor network. Moreover, community detection in the tumor-specific network reveals significant enrichment of key cell cycle and immune pathways. Using PharmacoGx, we find that the compound fluphenazine potentially inhibits the cell cycle pathway in the tumor-specific interaction network.
Clinical Impact: Our computational platform, along with its web-application, will benefit the pharmacological and biomedical communities by providing new insights into the biology underlying the TME and new potential drug targets in TNBC. Upon successful preclinical validation, our drug candidates will provide novel therapeutic strategies in TNBC.
- Mueller MM, et al. Friends or foes — bipolar effects of the tumour stroma in cancer. Nat Rev Cancer. 2004;4: 839–849.
- Smirnov P, et al. PharmacoGx: an R package for analysis of large pharmacogenomic datasets. Bioinformatics. 2015;32: 1244–1246.
Comprehensive multi-scale representation of disease mechanisms: the AsthmaMap example
Alexander Mazein, Marek Ostaszewski, Inna Kuperstein, Mansoor Saqi, Bertrand De Meulder, Irina Balaur, Johann Pellet, Piotr Gawron, Stephan Gebel, Emmanuel Barillot, Andrei Zinovyev, Rudi Balling, Reinhard Schneider, Charles Auffray
European Institute for Systems Biology and Medicine, FR; Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, LU; Institut Curie, Paris, FR; INSERM, U900, Paris, FR; Mines ParisTech, Fontainebleau, FR
MOTIVATION Large amounts of high-throughput data are becoming available in the effort to better understand complex diseases. Despite the availability of various approaches, tools for disease-specific functional analysis are greatly underdeveloped. The direct approach to solving this problem is to develop highly accurate, comprehensive computerized representations of disease mechanisms at the level of cellular and molecular interactions (Fujita et al., 2013, PMID 23832570; Kuperstein et al., 2015, PMID 26192618; Mizuno et al., 2016, PMID 26849355).
METHODS Recent advances in systems biology have made it possible to represent biological processes unambiguously and consistently in standard formats that are both human- and machine-readable (Le Novère, 2015, PMID 25645874). Disease-specific representations are developed in CellDesigner (www.celldesigner.org) following the Systems Biology Graphical Notation (SBGN, www.sbgn.org). The involvement of domain experts from different groups ensures that different points of view are considered and that all the disease hallmarks are covered and adequately represented.
RESULTS We present the concept of disease maps as a large-scale community effort. We describe the AsthmaMap example, a conceptual model of asthma mechanisms that includes 3 interconnected layers of granularity with 22 cell types and more than 2000 proteins. The intermediate and detailed layers are developed in the SBGN Activity Flow and Process Description languages, respectively. This approach is designed to be used for data visualisation and interpretation at the level of signalling pathways and cell-to-cell communication. While being complementary to generic pathway enrichment tools (g-Profiler, Ingenuity Pathway Analysis and MetaCore), a disease map focuses on integrating information into a single flexible, hierarchically organised network, thus enabling advanced data analysis (e.g. via NaviCell, MINERVA) and making it possible to create dynamic predictive computational models.
CONCLUSION To progress with this approach, we propose building on the best practices and lessons learned from previous projects and applying shared standards, tools and protocols to generate high-quality representations and enable the exchange of reusable pathway modules (e.g. inflammation). We envision that this strategy will facilitate powerful advances in systems medicine for understanding disease mechanisms, cross-disease comparison, finding disease comorbidities, suggesting drug repositioning, generating new hypotheses and, after careful validation, redefining disease ontologies based on their endotypes (confirmed molecular mechanisms).
ACKNOWLEDGEMENTS Funded by IMI (U-BIOPRED n°115010, eTRIKS n°115446).
System-level High-Dimensional Multi-Objective Analyses of Metabolic Tradeoffs in Biological Systems
Ali Navid, Marc Griesemer, Yongqin Jiao, Jennifer Pett-Ridge
Lawrence Livermore National Laboratory, USA
Evolutionary pressures such as competition for limited resources, safeguarding against predators, and adapting to environmental perturbations necessitate that living organisms balance tradeoffs among a number of important biological tasks (or objectives) in a Pareto optimal manner. A Pareto optimal situation is one where any improvement in one objective will diminish the ability to achieve another objective [1]. The delicate balance between critical biological objectives under different environmental scenarios is the primary basis for the variety of biological phenotypes we observe every day. A quantitative system-level understanding of metabolic tradeoffs would be invaluable for predicting outcomes that would result from natural or man-made changes to biological systems of various sizes and complexities. Gaining this type of understanding requires genome-scale Multi-Objective Flux Analyses (MOFA) of systems.
Genome-scale models of metabolism that are solved using constraint-based methods such as Flux Balance Analysis (FBA) [2] are key tools for examining the metabolic capabilities and overall robustness of biosystems. MOFA expands upon FBA (which optimizes a sole biological objective) and quantitatively maps the n-dimensional Pareto front for the system of interest, where n is the number of examined objectives. To date, the largest number of objectives that have been simultaneously examined during MOFA has not exceeded five. We have developed two computational codes for high-dimensional examination of biological trade-offs. The first code is a Matlab-based extension for the popular COBRA toolbox which allows MOFA for up to 10 objectives on a single CPU. The second is a high-performance computing based program that allows high-dimensional (HD, n>>10) MOFA.
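The definition of Pareto optimality used above can be made concrete with a minimal sketch that filters a set of candidate solutions down to the non-dominated ones. This is not the MOFA code described in the abstract: it assumes that every objective is to be maximized and that candidate objective vectors (for example, hypothetical growth and ATP yields of sampled flux distributions) have already been computed elsewhere.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(
        x > y for x, y in zip(a, b)
    )


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (growth, ATP yield) pairs, scaled to [0, 1].
    candidates = [
        (1.0, 0.0),
        (0.8, 0.5),
        (0.5, 0.5),  # dominated by (0.8, 0.5)
        (0.4, 0.9),
        (0.0, 1.0),
    ]
    for point in pareto_front(candidates):
        print(point)
```

This pairwise filter scales quadratically with the number of candidates and works for any number of objectives, but it only selects among given points; mapping the front of a genome-scale model additionally requires solving an optimization problem for each point.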
We have used MOFA to rank the importance of different objectives in biological systems under different environmental conditions. One of our analyses involved examining photoheterotrophic metabolism in the model organism Rhodopseudomonas palustris (RP). Nearly all FBA studies proceed based on the assumption that maximum production of biomass is the sole objective of biological systems. However, it was recently shown that a Pareto-optimal combination of growth, ATP production, and nutrient allocation controls the behavior of most microbes [3]. While our MOFA of RP identified the same three objectives as the primary drivers of cellular behavior, our findings show that carbon efficiency, more so than growth, influences the metabolic mode of the system. We also find that diversion of cellular resources away from these three objectives toward engineered objectives such as maximum production of biofuels drastically hinders cellular growth. Overall, our newly developed computational tools for conducting MOFA on complex biological systems drastically expand our ability to assess the capabilities of a system and improve our predictions about the outcomes of our engineering efforts. Our HD MOFA program is a critical tool for examining interactions in complex biological systems that are composed of a large number of taxa, such as algal-bacterial interactions in biofuel-producing systems or interactions between commensal microflora and pathogenic bacteria.
- PARETO, V. Manual of Political Economy, 1971.
- ORTH, J. D.; THIELE, I.; PALSSON, B. O. What is flux balance analysis? Nat Biotechnol, 28(3), 245, 2010.
- SCHUETZ, R. et al. Multidimensional optimality of microbial metabolism. Science, 336(6081), 601, 2012.
Quantitative methods for detecting origins of interferons signalling sensitivity
Karol Nienałtowski, Katarzyna Andryka, Karolina Zakrzewska, Tomasz Jetka, Michał Komorowski
Institute of Fundamental Technological Research, Polish Academy of Sciences, Warsaw, PL
Interferon (IFN) signalling is a key mechanism for coordinating antiviral, anti-proliferative and immunomodulatory effects (1). Many molecular details of IFN signalling pathways are known, yet how information about complex mixtures of IFNs is processed and translated into distinct cellular responses remains elusive (5,6). A good illustration is the sensitising effect of IFN type-I. Although this phenomenon is well known, its impact on signalling fidelity and the biochemical mechanisms that lead to these changes have not been characterised so far. Our experimental studies of IFN signalling in mouse embryonic fibroblasts have shown that prior exposure to IFN type-I modifies the cellular response to IFN type-II stimulation. Specifically, information-theoretic analysis indicates a higher sensitivity of pre-stimulated cells to the presence of IFN type-II in the intercellular environment. However, due to the complexity of signalling networks, the origins of this mechanism cannot be identified with experimental methods alone (7). Here we propose integrating high-throughput experimental single-cell measurements with stochastic modelling in order to better understand the interplay between IFN type-I and type-II signalling. Our solution is based on analysing intrinsic and extrinsic sources of heterogeneous cellular responses using the unscented transformation and Sequential Monte Carlo methods (8,9). We have shown that the increased sensitivity of pre-stimulated cells originates from changes in the initial cellular concentration of signal transducer and activator of transcription (STAT) proteins. Deciphering the mechanisms that lead to a more sensitive cellular response informs further research on novel therapeutic and diagnostic strategies to utilise the clinical potential of IFNs (5).
- R.J.Critchley-Thorne et al. (2009) Impaired interferon signalling is a common immune defect in human cancer, Proceedings of the National Academy of Sciences, 106:9010–5.
- D.S.Aaronson (2002) A Road Map for Those Who Don't Know JAK-STAT, Science, 296:1653–5.
- I.M.Kerr et al. (1994) Jak-STAT pathways and transcriptional activation in response to IFNs and other extracellular, Science, 264:1415–21.
- B.S.Parker et al. (2016) Antitumour actions of interferons: implications for cancer therapy, Nat Rev Cancer, 16(3):131-144.
- L.Zitvogel et al. (2015) Type I-interferons in anticancer immunity, Nature Rev Immunol, 7:405-414.
- B.N.Kholodenko (2006) Cell-signalling dynamics in time and space, Nat Rev Mol Cell Biol, 7:165–76.
- T.Toni, B.Tidor (2013) Combined Model of Intrinsic and Extrinsic Variability for Computational Network Design with Application to Synthetic Biology, PLoS Comput Biol, 9:e1002960.
- S.Filippi et al. (2016) Robustness of mek-erk dynamics and origins of cell-to-cell variability in mapk signaling, Cell reports, 15:2524-2535.
Pancancer modelling predicts the context-specific impact of somatic mutations on transcriptional programs
Hatice Ulku Osmanbeyoglu, Eneda Toska, Carmen Chan, José Baselga, Christina S. Leslie
Memorial Sloan Kettering Cancer Center, USA
Pancancer studies have identified many genes that are frequently somatically altered across multiple tumour types, suggesting that pathway-targeted therapies can be deployed across diverse cancers. However, the same ‘actionable mutation’ impacts distinct context-specific gene regulatory programs and signalling networks, and interacts with different genetic backgrounds of co-occurring alterations, in different cancers. Here we apply a computational strategy for integrating parallel (phospho)proteomic and mRNA sequencing data across 12 TCGA tumour data sets to interpret the context-specific impact of somatic alterations in terms of functional signatures such as (phospho)protein and transcription factor (TF) activities. Our analysis predicts distinct dysregulated transcriptional regulators downstream of somatic alterations in different cancers, and we validate the context-specific differential activity of TFs associated with mutant PIK3CA in isogenic cancer cell line models. These results have implications for the pancancer use of targeted drugs and potentially for the design of combination therapies.
A Genome-Scale Metabolic Reconstruction of Eubacterium limosum KIST612
Gebze Technical University, Department of Bioengineering, 41400 Kocaeli, TR
Eubacterium limosum is an acetogenic bacterium of growing importance in both medical sciences and biofuel research. E. limosum was isolated from an anaerobic digester and is part of the gut microbiota, which plays an important role in human health. From the bioengineering perspective, E. limosum can utilize Synthetic Gas (SynGas: H2, CO2 and CO) and can therefore produce biomass, acetate, ethanol and other biomolecules of commercial value from cheap carbon sources. Characterization and optimization of its growth under a variety of growth conditions is of interest both for a better understanding of the gut microbiota and for bioengineering applications.
Genome-scale metabolic reconstructions (GENREs) of organisms are often used for systematic analysis of their metabolic activities. However, manual generation of such models is time-consuming, so algorithms have been developed to automate the procedure. The genome sequence of the organism is used to identify the enzymes it encodes, usually via homology to the genomes of better-studied organisms; the reactions potentially catalysed by these enzymes are then listed; finally, the resulting metabolic network forms the basis of the reconstruction. One such reconstruction of the E. limosum KIST612 strain was generated by the ModelSEED/KBase/rBioNet algorithms and validated for gut microbiota conditions (1). However, the model lacks the reactions that describe utilization of CO, a major carbon source of E. limosum. Therefore, the existing model fails to represent the metabolism under some of the growth conditions of interest.
To address the limited capabilities of the semi-automatically generated metabolic network of E. limosum, I undertook the task of generating a model that accurately represents the metabolism under a variety of synthetic growth conditions. Using the workflow in Figure 1, the network was corrected, refined and validated based on the genome databases and the growth characteristics of E. limosum on pure or mixed carbon sources such as CO/acetate/methanol and SynGas.
- Magnúsdóttir, S. et al., 2017, Nature Biotechnology 35, 81–89
Predictive Virtual Infection Modeling of Candida albicans Immune Escape in Human Blood
Maria T. E. Prauße, T. Lehnert, S. Timme, K. Hünniger, O. Kurzai, M.T. Figge
Applied Systems Biology - Hans-Knöll-Institute, Jena, DE; Friedrich-Schiller-Universität Jena, DE
Fundamental research in immunology is an active scientific field that nowadays connects clinical findings with theoretical approaches. Immune reactions are such complex processes that, despite joint efforts, many problems remain elusive for a long time. An example is the mechanism whereby fungal cells escape from phagocytosis and killing by immune cells, which was detected in previously published whole-blood infection assays. Unfortunately, such processes are difficult to investigate experimentally, because whole blood is a complex system. Nevertheless, such problems need to be addressed, since fungal infections contribute substantially to the increasing number of sepsis cases each year. A spontaneous immune escape mechanism from phagocytosis and killing was implemented in the non-spatial state-based model (SBM) of whole-blood infections originally created by Hünniger and Lehnert [1], which was then further developed by Lehnert and Timme [2]. Here we study an alteration of the immune escape mechanism, i.e. a change from a spontaneous mechanism to a type of process that depends directly on polymorphonuclear neutrophils (PMN). This is motivated by the fact that PMN are highly abundant in blood and are able to release effector molecules when triggered by fungal pathogens. Since we found that both types of immune escape mechanism fit the data from the whole-blood infection assays, neither of them could be rejected. Therefore, we investigated the model behavior under neutropenic conditions, i.e. for reduced counts of functional PMN in human blood. We were able to predict the possible effect of the two immune escape mechanisms in a whole-blood model infected with C. albicans cells. The SBM suggests that significant differences in the simulation results are due to the varied contribution of PMN under neutropenic conditions. In the future, this prediction may be verified in laboratory experiments.
- Hünniger K, Lehnert T, Bieber K, Martin R, Figge MT, Kurzai O (2014) A virtual infection model quantifies innate effector mechanisms and Candida albicans immune escape in human blood. PLoS Comput Biol 10(2), e1003479.
- Lehnert T, Timme S, Pollmächer J, Hünniger K, Kurzai O, Figge MT (2015) Bottom-up modeling approach for the quantitative estimation of parameters in pathogen-host interactions. Frontiers in Microbiology 6(608).
Stochastic Modeling of Gene Regulatory Networks in Escherichia coli
Rodrigo Santibáñez, Daniel Garrido, Tomás Pérez-Acle, Alberto J.M. Martin
Fundación Ciencia & Vida, Santiago, CL; Universidad Católica de Chile, Santiago, CL; Universidad de Valparaíso, CL
Synthetic Biology has the ultimate objective of designing cells with predictable responses. Our ability to develop modified and synthetic organisms tailored to chemical production is fostered by our ability to recombine DNA with error-free protocols. However, our current capacity to model how cells work lags far behind our synthesis and analysis tools, which makes it difficult to predict desired cell responses. Computational modeling has nevertheless had a prominent impact on Synthetic Biology, where the manipulation of biological systems is cost-intensive and computational resources can support experimental procedures. Traditionally, Ordinary Differential Equations (ODEs) have been employed to model biological systems, but their assumptions are not realistic. In particular, it has long been known that natural processes are stochastic, discrete and structurally complex, properties that systems of differential equations struggle to capture. Even if noise is considered, modelers still have to make assumptions about how cell components traveling between compartments affect physically separated processes, how they bind each other, and how they give rise to behaviors that resemble cooperation and competition.
To strengthen the connection between modeling and designing organisms, we present a rule-based model simulated using Gillespie’s Stochastic Simulation Algorithm. Under this approach, rules are macroscopic chemical reactions between entities that recapitulate one or several patterns necessary for a transformation. The rate associated with each rule represents how often a reaction fires in a given time. Notably, our laboratory has developed software called PISKaS that enables explicit compartmentalized modeling in the Kappa language. We modeled two gene regulatory networks of E. coli: the core network that regulates transcription, and the replication of the ColE1 plasmid. The mean and variance of selected variables were analyzed in these examples, which were simulated with arbitrary rates; surprisingly, their properties are nevertheless in close agreement with experimental data. Specifically, when the core transcription network reached pseudo-equilibrium, it predicted close to 20% free RNA polymerase holoenzyme, relatively near the 30% reported for exponentially growing E. coli. Similarly, plasmid replication controlled by negative feedback showed saturation dynamics, producing tens or hundreds of copies, depending strongly on the rate of interaction between its non-coding RNAs.
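The stochastic simulation step can be sketched in a few lines with Gillespie's direct method. This is a toy illustration only: it does not use PISKaS or the Kappa language, it has no compartments or rule patterns, and the one-species plasmid model with saturating replication and first-order dilution, together with all rate constants and names, is a hypothetical stand-in for the models described above.

```python
import random


def pick_reaction(rates, threshold):
    """Index of the first reaction whose cumulative rate exceeds threshold.
    Falls back to the last reaction with a positive rate if floating-point
    rounding leaves the threshold just above the final cumulative sum."""
    acc = 0.0
    chosen = None
    for i, rate in enumerate(rates):
        if rate <= 0.0:
            continue
        acc += rate
        chosen = i
        if threshold < acc:
            break
    return chosen


def gillespie(propensities, changes, state, t_end, rng):
    """Direct-method SSA. propensities[i](state) is the firing rate of
    reaction i, and changes[i] maps species to their count change."""
    t = 0.0
    state = dict(state)
    trajectory = [(t, dict(state))]
    while True:
        rates = [a(state) for a in propensities]
        total = sum(rates)
        if total <= 0.0:
            break  # nothing can fire any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        i = pick_reaction(rates, rng.random() * total)
        for species, delta in changes[i].items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


# Hypothetical toy model: replication saturates as copy number rises
# (a crude stand-in for negative feedback); dilution is first order.
K_REP, K_HALF, K_DIL = 1.0, 5.0, 0.1
PROPENSITIES = [
    lambda s: K_REP * s["plasmid"] / (1.0 + s["plasmid"] / K_HALF),
    lambda s: K_DIL * s["plasmid"],
]
CHANGES = [{"plasmid": +1}, {"plasmid": -1}]


def run(seed, t_end=50.0):
    rng = random.Random(seed)
    return gillespie(PROPENSITIES, CHANGES, {"plasmid": 5}, t_end, rng)


if __name__ == "__main__":
    final_time, final_state = run(seed=1)[-1]
    print(final_time, final_state)
```

Repeating `run` over many seeds and collecting the final copy numbers gives the kind of mean and variance estimates mentioned in the text.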
We are aware of the limitations of our example models. We considered cells in a pseudo-stationary state, thereby disregarding the need to model metabolism, translation and protein degradation or dilution, although these processes could be incorporated efficiently in successive refinements. Importantly, modeling metabolism and linking it to transcription and translation would enable a more reliable prediction of phenotype emergence. To this end, a Gene Regulatory Network (GRN), a Genome-Scale Metabolic Model (GSMM) and (optionally) a protein-protein and RNA-protein interaction network will serve as inputs to write draft models. We sought to automatically write a genome-scale model of replication and gene expression coupled to metabolism. For instance, we simulated a combined metabolism and gene expression model that resembles the published central metabolism of E. coli (MODEL1505110000), employing the RegulonDB GRNs and the iJO1366 GSMM, which resulted in dynamics comparable to those of the ODE model.
Adaptative response of fission yeast metabolism to natural genetic variation
Maria Sorokina, Andreas Beyer
CECAD, University of Cologne, DE
Eukaryotic metabolic networks exhibit a high degree of redundancy at the level of individual enzymes and entire pathways. This redundancy increases the regulatory potential and it confers robustness to external and internal perturbations. Enzyme-level redundancy is at least in part mediated through enzyme promiscuity, i.e. an enzyme’s ability to catalyze multiple reactions on multiple compounds. Although these phenomena are well described for specific examples, we lack systematic understanding of metabolic plasticity and of its adaptation to genetic and environmental perturbations. In this presentation, I will introduce how the combination of metabolic networks, chemoinformatics methods and statistical genetics reveals how metabolic redundancy ‘buffers’ genetic variability under diverse conditions.
We developed new computational approaches to investigate the impact of natural genetic variation on single-enzyme and network-level redundancy. This new framework uses reaction molecular signatures (RMS; Carbonell et al. 2012), which we previously established as a tool to formalize enzyme promiscuity in a network context (Sorokina et al. 2015). RMS were used to group reactions and enzymes based on the chemical modifications that they execute on their substrates. This approach is particularly powerful for detecting and formally describing enzyme promiscuity and (partial) redundancy of metabolic pathways. In order to understand how metabolic redundancy mediates resistance to inter-individual genetic variability, we used quantitative trait locus (QTL) analysis. This method was applied to multi-layer omics data (transcriptome and proteome) from fission yeast (Schizosaccharomyces pombe). By linking the RMS-based framework with QTL analysis, we identified genetic variants segregating in this fission yeast cross that trigger specific adaptive changes in the metabolic networks.
We found that enzymes that are able to catalyze the same RMS tend to mutually ‘back up’ each other in response to oxidative stress. Alleles at the QTL associated with these enzymes affected whether or not the backup functionality was properly utilized under oxidative stress. Reduced capacity for backup was in turn correlated with reduced cellular fitness, measured as yeast cell growth in liquid medium. Thus, these results suggest that metabolic redundancy is important for the response to stress and that this redundancy might be specifically affected by genomic variability.
This work sets the foundation for better understanding the requirements and limitations of metabolic networks to cope with natural genetic variability. Tools and models we created will be further used to better understand inter-individual variability in disease susceptibility and drug response.
Classifying oncogenes and tumor suppressors in fusion protein-protein interaction networks using a community-based Naïve Bayes approach
Somnath Tagore, Milana Frenkel-Morgenstern
BAR-ILAN University, IL
Since the first cancer genome sequencing in 2008, various large-scale studies have analyzed multiple tumor types. These analyses generated catalogs of tumor-specific mutations and identified particular genes that act as either oncogenes or tumor suppressors. Nevertheless, a general approach for characterizing cancer genes as oncogenes or tumor suppressors is still unavailable. Our approach classifies oncogenes and tumor suppressors with a Naïve Bayes classifier, using the fusion breakpoints from our previously developed ChiTaRS-3.1 database (1) and the protein-protein interaction networks of our publicly available ChiPPI server (1). As a training set, we consider the protein-protein interaction networks of 150 fusions and their 300 parental proteins in leukemia, lymphomas, sarcomas and solid tumors. We modularized the protein-protein interactions to identify communities using a network-based parameter called the ‘community attachment score’, composed of the degree, the network diameter and the average degree of the community. The score indicates whether a protein attaches to an existing community by creating a new link. We hypothesize that proteins that appear in the same community are likely to have similar molecular functions. Proteins that do not belong to either of these categories, or for which no information is available, are categorized as unknown proteins. To compute the community attachment score, we also calculate well-known network parameters from the protein-protein interaction data, namely the interaction score (using our ChiPPI method), the network degree, the clustering coefficient and the betweenness centrality. We found that communities are over-represented in leukemia and lymphomas (70%) in comparison to sarcomas (30%) and solid tumors (10%), due to the presence of more connected clusters.
Further, in sarcomas, because the communities are less compact, open links are over-represented, resulting in more interacting proteins that tend to attach to these communities. The situation in solid tumors is more interesting: the number of communities is significantly reduced, giving rise to ubiquitous proteins. We used the receiver operating characteristic (ROC) curve to test the performance of our method and obtained best results of 0.9 and 0.89 for oncogenes and tumor suppressors, respectively. Known oncogenes, including TRAF7 and ALK, that are missed by previously published tools at the p-value < 0.05 cutoff are easily detected by our method. The Naïve Bayes classifier was found to have an accuracy of 90.6%. Taking these results together, we conclude that our method may be useful for classifying proteins with unknown functions, not only as oncogenes and tumor suppressors but also with respect to mutations in specific cancer sub-types. References:
1. Gorohovski, A., et al. (2017) ChiTaRS-3.1-the enhanced chimeric transcripts and RNA-seq database matched with protein-protein interactions. Nucleic Acids Res. 45(D1): D790-D795.
Logic modeling in quantitative systems pharmacology
Pauline Traynard, Luis Tobalina, Federica Eduati, Laurence Calzone, Julio Saez-Rodriguez
Institut Curie, Paris, France; RWTH Aachen University, Aachen, DE; EMBL-EBI, Hinxton, UK
The structure and functioning of signaling networks are complex, and these networks are deregulated differently in different contexts in non-trivial ways. To ensure that drug treatments are effective, a good knowledge of these complex interactions and of how patient mutations affect cell fate is necessary. Among modeling techniques, logic modeling has proven to be very versatile and able to provide useful biological insights. Here, we show how to build a logic model from literature and experimental data and how to analyze the resulting model to obtain insights relevant to systems pharmacology, using a prostate cancer example that involves some of the key phosphorylation pathways of this malignancy. We use data describing the phosphorylation response of key proteins in prostate cancer cell lines to the addition of several ligands and inhibitors (Lescarbeau & Kaplan 2014). The steps include building the signaling network, improving it using available data, and simulating and analyzing it to obtain useful biological insights and predictions. Our workflow uses the free tools Omnipath (Türei et al. 2016) to build the signaling network from literature, CellNOpt (Terfve et al. 2012) to fit the model to experimental data, MaBoSS (Stoll et al. 2012) to simulate and predict treatment response, and Cytoscape (Shannon et al. 2003) for visualization. Logic modeling, as implemented in this pipeline, can be a useful approach to understand the deregulation of signal transduction in disease and to characterize a drug’s mode of action.
A Quantitative Model for the Rate-Limiting Process of UGA Alternative Assignments to Stop and Selenocysteine Codons
Yen-Fu Chen, Hsiu-Chuan Lin, Kai-Neng Chuang, Chih-Hsu Lin, Chen-Hsiang Yeang
Academia Sinica, CN
Ambiguity in genetic codes exists in cases where certain stop codons are alternatively used to encode non-canonical amino acids. In selenoprotein transcripts, the UGA codon may represent either a selenocysteine (Sec) codon or a translation termination signal, resulting in the expression of full-length (PL) and truncated (PS) selenoproteins, respectively. Translating UGA to Sec requires selenium and specialized Sec incorporation machinery, such as the charged Sec-specific tRNAs (Sec-tRNASec) and the interaction between the SECIS element and the SBP2 protein. In contrast, translation termination takes place when UGA is bound by a release factor, which triggers the release of the translated protein from the ribosome. How these factors quantitatively affect the alternative assignments of UGA has not been fully investigated. We developed a model simulating the UGA decoding process. Our model is based on the following assumptions: (1) synthesis and degradation reactions of both PL and PS follow first-order kinetics; (2) PS possesses a considerably shorter half-life than PL; (3) Sec-tRNASec and release factors compete for a UGA site; (4) the total amount of selenoprotein mRNA is distributed among the transcripts participating in the translation of PL, those participating in the translation of PS, and free molecules; (5) Sec-tRNASec abundance is limited by the concentration of selenium and Sec-specific tRNA (tRNASec) precursors. We demonstrated that this model captures two prominent characteristics observed in experimental data. First, UGA-to-Sec decoding increases with elevated selenium availability but saturates under high selenium supply. Second, the efficiency of Sec incorporation decreases with increasing selenoprotein synthesis, and increases with elevated selenium concentration but saturates under high selenium supply. Figure 1 displays predicted Sec incorporation efficiencies under four different constraints with various mRNA quantities and selenium concentrations.
Models with the mRNA constraint (assumption 4) exhibit saturated protein abundance with increased selenium supply. Models with the tRNA constraint (assumption 5) show Sec incorporation efficiency that increases with reduced selenoprotein synthesis and saturates with abundant selenium supply. Thus, both constraints are required to capture the observed characteristics. We further developed an algorithm that estimates model parameters by fitting experimental data on selenoprotein synthesis and abundance. Multiple selenoproteins exist in animal proteomes, and their differences in Sec incorporation efficiency lead to a “selenoprotein hierarchy” under selenium deficiency: proteins with higher Sec incorporation efficiency are synthesized more rapidly. It is well known that hierarchical selenoprotein expression depends on the SECIS-SBP2 interaction, but whether this interaction is the sole determinant of the selenoprotein hierarchy remains unclear. We measured the expression of four selenoprotein constructs and estimated their model parameters. Their inferred Sec incorporation efficiencies did not correlate well with their SECIS-SBP2 binding affinities, suggesting the existence of additional factors that determine the hierarchy of selenoprotein synthesis under selenium deficiency. This model provides a framework to systematically study the interplay of factors affecting the dual definitions of a genetic codon.
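The competition between Sec-tRNASec and release factors (assumption 3) and the limited Sec-tRNASec supply (assumption 5) can be illustrated with a minimal sketch. The functional forms and every parameter value below are illustrative assumptions, not the authors' model: the charged tRNA pool is taken to saturate in selenium and to be shared among translating mRNAs, and the Sec incorporation efficiency is the fraction of UGA encounters won by Sec-tRNASec.

```python
def charged_trna(selenium, trna_total=1.0, k_se=1.0):
    """Sec-tRNASec level, saturating in selenium (hypothetical saturation form)."""
    return trna_total * selenium / (k_se + selenium)


def sec_efficiency(selenium, mrna, k_sec=5.0, k_rf=1.0, release_factor=1.0):
    """Fraction of UGA encounters decoded as Sec rather than as a stop."""
    # Assumed dilution: the charged tRNA pool is shared among translating mRNAs.
    trna_per_mrna = charged_trna(selenium) / (1.0 + mrna)
    sec_rate = k_sec * trna_per_mrna      # Sec-tRNASec reaches the UGA site
    stop_rate = k_rf * release_factor     # release factor reaches the UGA site
    return sec_rate / (sec_rate + stop_rate)


if __name__ == "__main__":
    for se in (0.1, 1.0, 10.0, 100.0):
        row = [round(sec_efficiency(se, m), 3) for m in (0.5, 2.0, 8.0)]
        print(f"selenium={se:>6}: efficiency at mRNA 0.5, 2, 8 -> {row}")
```

Under these assumptions the efficiency rises with selenium, levels off at high selenium, and falls as more mRNA competes for the same tRNA pool, which is the qualitative behaviour described in the abstract.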
The challenge of integrating multi-omic multi-factorial data to infer regulatory networks
Sonia Tarazona, Mónica Clemente-Císcar, Rafael Hernández-de-Diego, David Gómez-Cabrero, Pedro Furió-Tarí, Carlos Martínez-Mira, Ana Conesa
Centro de Investigación Príncipe Felipe, Valencia, ES; Universitat Politècnica de València, ES; University of Florida, Gainesville, USA; Karolinska Institutet, Stockholm, SE; Karolinska University Hospital, Stockholm, Sweden; Science for Life Laboratory, Solna, Sweden; King’s College London Dental Institute, London, UK
If the integration of multi-omic data to model the regulation of a biological system is challenging per se, it is even more complicated for multi-factorial designs in which omic data are generated under different experimental conditions, time series, etc. Careful planning of every stage of this process, together with appropriate statistical tools to perform the integration, is key to making the most of the data so that we can model the regulatory mechanisms of the system.
After the STATegra European project experience, we can offer interesting insights on these topics. We propose a road map for integrative analysis that covers the different aspects of integrative studies: the design of the experiment, the data pre-processing and variable selection, the omic features matching, the visualization, the integration methodologies themselves, and the validation of the results. We discuss the different issues that must be addressed from a general point of view, but also present novel tools that may help researchers to overcome them. Some of these tools are:
STATegra Experiment Management System (EMS). In a multi-omic (multi-factorial) experiment, many biological samples are manipulated and complex pipelines are followed to obtain the final processed data, producing different outcomes. Moreover, several labs or researchers are usually involved. The STATegra EMS software was implemented to organize all this information: it annotates the generated samples, describes the protocols applied at each step, and indicates the location of the outcome, so that everyone in the project can easily get information about each of the different files.
MultiPower. Another important issue in the experimental design is deciding the number of replicates to generate per omic and experimental condition. The MultiPower method estimates the optimum number of replicates so that a minimum power to detect changes is assured for the statistical methods while the cost of the experiment is minimized.
RGmatch. Some integration strategies require that omic features be matched prior to the statistical analysis: microRNAs or transcription factors with their target genes, accessible chromatin regions with their closest genes, etc. The RGmatch algorithm was designed to associate genomic regions with the closest genes; it also returns the area of the gene that the region overlaps (TSS, exon, intron, promoter, etc.), the distance to the gene and other useful information.
Paintomics. Visualizing multi-omics data on biological pathways helps in understanding pathway behavior and regulation. The Paintomics web tool provides a useful and clear graphical representation of multi-omic multi-factorial data on KEGG pathways, as well as other analyses such as pathway enrichment or pathway interaction.
MORE. The MultiOmics Regulation (MORE) method computes the statistical evidence that potential gene-regulator associations are active in a given multi-omic experiment. It is based on generalized linear models and automatically generates the model equation for each gene, including its potential regulators. Variable selection procedures were implemented at different levels of the MORE analysis to ensure that the models have enough power to detect the significant regulations. It also includes functionalities to manage and visualize the results.
MOSim. The MultiOmics Simulator generates multi-factorial quantitative data for a number of omics that can be used to assess the performance of any integrative method.
We illustrate the use of these tools on the STATegra collection of omics time series data from a B-cell differentiation process in mouse under control and Ikaros-induction conditions: RNA-seq, microRNA-seq, RRBS-seq and DNase-seq. The integrative analysis revealed that RNA-seq was the most robust technique. The MORE models for the differentially expressed genes allowed for the reconstruction of a global regulatory network showing that transcriptional regulation was more prevalent than post-transcriptional regulation.
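The region-to-gene matching step described for RGmatch can be sketched as follows. This is a deliberately simplified illustration, not the RGmatch algorithm itself: it assumes a single chromosome, ignores strand and gene sub-areas (TSS, exon, intron, promoter), and the `Gene` type and function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int  # first base of the gene body
    end: int    # last base of the gene body


def distance(region: Tuple[int, int], gene: Gene) -> int:
    """Gap in bases between a region and a gene; 0 if they overlap."""
    r_start, r_end = region
    if r_end < gene.start:
        return gene.start - r_end
    if r_start > gene.end:
        return r_start - gene.end
    return 0


def closest_gene(region: Tuple[int, int], genes: List[Gene]) -> Tuple[str, int]:
    """Return the name of the nearest gene and its distance to the region."""
    best = min(genes, key=lambda g: distance(region, g))
    return best.name, distance(region, best)


if __name__ == "__main__":
    genes = [Gene("geneA", 100, 200), Gene("geneB", 500, 900)]
    for region in [(250, 300), (350, 360), (450, 520)]:
        print(region, "->", closest_gene(region, genes))
```

A real matcher would additionally report which part of the gene the region overlaps, which is the extra information the abstract attributes to RGmatch.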
Posters selected from poster submissions
BTR: training asynchronous Boolean models using single-cell expression data
Chee Yee Lim, Huange Wang, Steven Woodhouse, Nir Piterman, Lorenz Wernisch, Jasmin Fisher and Berthold Gottgens
University of Cambridge, UK
Rapid technological innovation for the generation of single-cell genomics data presents new challenges and opportunities for bioinformatics analysis. One such area lies in the development of new ways to train gene regulatory networks. Single-cell expression profiling techniques allow the expression states of hundreds of cells to be profiled, but these expression states are typically noisier due to the presence of technical artefacts such as drop-outs. While many algorithms exist to infer a gene regulatory network, very few of them are able to harness the extra expression states present in single-cell expression data without being adversely affected by the substantial technical noise.
Here we introduce BTR, an algorithm for training asynchronous Boolean models with single-cell expression data using a novel Boolean state space scoring function. BTR is capable of refining existing Boolean models and reconstructing new Boolean models by improving the match between model prediction and expression data. We demonstrate that the Boolean scoring function performed favourably against the BIC scoring function for Bayesian networks. In addition, we show that BTR outperforms many other network inference algorithms in both bulk and single-cell synthetic expression data. Lastly, we introduce two case studies, in which we use BTR to improve published Boolean models in order to generate potentially new biological insights.
BTR provides a novel way to refine or reconstruct Boolean models using single-cell expression data. A Boolean model is particularly useful for network reconstruction using single-cell data because it is more robust to the effect of drop-outs. In addition, because BTR does not assume any relationship in the expression states among cells, it is useful for reconstructing a gene regulatory network with as few assumptions as possible. Given the simplicity of Boolean models and the rapid adoption of single-cell genomics by biologists, BTR has the potential to make an impact across many fields of biomedical research.
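To make the notion of an asynchronous Boolean state space concrete, the sketch below enumerates the states reachable from an initial state when one gene is updated at a time, and compares them with binarised cells. The two-gene network, the function names and the distance-based score are hypothetical illustrations; the score in particular is a simple stand-in and not the BTR scoring function.

```python
from collections import deque

# Toy two-gene model (hypothetical): A and B inhibit each other.
RULES = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
}
GENES = sorted(RULES)


def successors(state):
    """Asynchronous update: each successor changes exactly one gene."""
    as_dict = dict(zip(GENES, state))
    result = set()
    for i, gene in enumerate(GENES):
        new_value = bool(RULES[gene](as_dict))
        if new_value != state[i]:
            result.add(state[:i] + (new_value,) + state[i + 1:])
    return result


def reachable(initial):
    """Breadth-first enumeration of the states reachable from `initial`."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def mean_min_hamming(cells, states):
    """Average distance from each binarised cell to its nearest model state."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return sum(min(hamming(c, s) for s in states) for c in cells) / len(cells)


if __name__ == "__main__":
    space = reachable((True, True))
    print(sorted(space))
    print(mean_min_hamming([(False, False), (True, False)], space))
```

A lower score means the observed cells lie closer to the model's state space, which is the general idea behind comparing model predictions with expression data.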
Conceptual and computational framework for logical modelling of biological networks deregulated in diseases
Arnau Montagud, Pauline Traynard, Loredana Martignetti, Eric Bonnet, Emmanuel Barillot, Andrei Zinovyev and Laurence Calzone
Institut Curie, Paris, FR; CEA, FR
Mathematical models can serve as a tool to formalize biological knowledge from diverse sources, to investigate biological questions in a formal way, to test experimental hypotheses, to predict the effect of perturbations and to identify underlying mechanisms. Using a model of initiation of the metastatic process as a transversal example, we present a pipeline of computational tools that performs a series of analyses to explore a logical model’s properties, notably its ability to predict biological outcomes. We start by analysing the structure of the network of interactions. Next, we explain how to translate this network into a mathematical object, specifically a logical model, and how robustness analyses can be applied to it. We explore the visualization of stable states as a tool to better understand the biology behind cell fate decisions. Using the tools presented here, we explain how to assign a probability to each solution of the model and how to identify genetic interactions using the mutant phenotype probabilities. Finally, the model is connected to relevant experimental data: we present how data analyses can direct the construction of the network and how the solutions of a mathematical model can be compared with experimental data, with a particular focus on high-throughput data in cancer biology.
A step-by-step tutorial is provided as supplementary material, and all models, tools and scripts are provided on an accompanying website: https://github.com/sysbio-curie/Logical_modelling_pipeline.
A method for time dependent pathway networks
Shinuk Kim and Hyunsik Kang
Sangmyung University, KR; Sungkyunkwan University, KR
In this work, we demonstrate mathematical methods for inferring dynamic pathway interactions by converting static datasets into dynamic datasets using cancer patients’ clinical information. One approach is using survival time–based dynamic datasets, and the other is using grade- and stage-based dynamic datasets. Based on cancer grades and stages, we generated 6 dynamic levels and obtained two pairs of significant pathways out of 12 enriched pathways via GSEA. One pair of the pathways included LEISHMANIA INFECTION and ALLOGRAFT REJECTION (correlation coefficient = 0.89), and HLA-DMB, HLA-DOA, HLA-DOB and IFNG were identified as common genes in the pathways. The other pair included SPLICEOSOME and PRIMARY IMMUNODEFICIENCY (correlation coefficient = 0.94), and no common genes were identified in the pathways.
This work has been supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2015R1D1A1A01060287) and by the Samsung research fund of Sungkyunkwan University 2016.
Representations of Markov processes in biological optimization problems
Małgorzata Wnętrzak, Paweł Błażej, Małgorzata Grabińska and Paweł Mackiewicz
Faculty of Biotechnology, University of Wrocław, PL
Continuous-time, homogeneous and stationary Markov processes are commonly used to describe biological phenomena, such as biochemical reactions or DNA mutations. These phenomena are often treated as optimization problems whose potential solutions constitute a search space. The proper representation of the search space is essential for obtaining solutions of the best quality. It is especially important when the potential solutions are complex or unusual. Hence, we described three representations of search spaces consisting of Markov processes. The structures of potential solutions are based on substitution-rate matrix models: (1) the generalized time-reversible (GTR) model with a fixed stationary distribution and a similar speed of convergence to stationarity; (2) the GTR model without the latter assumption; and (3) the general unrestricted (UNREST) model. Moreover, we proposed several evolutionary operators that can be used to solve optimization problems by means of an evolutionary algorithm. We also gave a formula for the fitness function that measures the quality of potential solutions in the problem of optimality of mutational pressure on coding sequences. The results show that mutational pressures are optimized, to a certain degree, with respect to minimizing the costs of amino acid substitutions in proteins. The proposed representations and operators can be used in problems that are described by transition probability matrices.
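As an illustration of how a GTR-type candidate solution can be represented, the sketch below builds a rate matrix from symmetric exchangeabilities and a stationary distribution using the standard GTR parameterisation, and checks time reversibility. The numerical values and function names are hypothetical, and the code does not reproduce the authors' evolutionary operators or fitness function.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def gtr_rate_matrix(exchange, pi):
    """Build Q with q_ij = r_ij * pi_j for i != j and rows summing to zero."""
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for (i, j), r in exchange.items():
        q[i][j] = r * pi[j]
        q[j][i] = r * pi[i]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def is_time_reversible(q, pi, tol=1e-12):
    """Detailed balance: pi_i * q_ij == pi_j * q_ji for every pair."""
    pairs = combinations(range(len(pi)), 2)
    return all(abs(pi[i] * q[i][j] - pi[j] * q[j][i]) <= tol for i, j in pairs)


def stationary_residual(q, pi):
    """Largest absolute entry of pi * Q; zero when pi is stationary."""
    n = len(pi)
    return max(abs(sum(pi[i] * q[i][j] for i in range(n))) for j in range(n))


if __name__ == "__main__":
    pi = [0.1, 0.2, 0.3, 0.4]  # hypothetical stationary distribution
    exchange = {(0, 1): 1.0, (0, 2): 2.0, (0, 3): 1.0,
                (1, 2): 1.0, (1, 3): 2.0, (2, 3): 1.0}
    q = gtr_rate_matrix(exchange, pi)
    print(is_time_reversible(q, pi), stationary_residual(q, pi))
```

In the UNREST model all twelve off-diagonal rates are free, so a matrix of that kind would in general fail the detailed-balance check above.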
Inferring network statistics from high-dimensional time-course data
Dominik Linzner, Heinz Koeppl
TU Darmstadt, DE
Reconstructing networks from current high-dimensional datasets is a notoriously ill-posed problem. For this reason, we focus on more general statistical properties, such as the degree distribution of interacting neighbors in complex networks spanned, for example, by genes or proteins. Surprisingly, while plenty of work has been dedicated to this direction in the static case, dynamical approaches are still in their infancy.
A recent method from statistical physics previously applied to the dynamics of spin glasses, the so-called extended Plefka expansion, allows for high-order mean-field approximations of interacting dynamical models by non-interacting ones (1). We study how well this method describes the dynamics of large random but fully observed networks drawn from different distributions. For the more realistic scenario of undersampling, in which only parts of the network have been observed, the Plefka expansion provides a natural framework. Based on this, we propose a new approach that analyzes individual node trajectories, e.g. individual protein concentrations, to infer statistical properties of the latent network structure, such as the sparsity of connections. In detail, we observe that the local dynamics are influenced by colored noise, which is determined by these statistical properties. Our dynamical model can handle high-dimensional time-course data in which only a few entities have been adequately observed. We test our method in this realistic scenario on synthetic data and on experimental MS data from perturbation experiments on two prostate cancer cell lines.
1. Bravi, B. et al. (2016). Journal of Physics A: Mathematical and Theoretical, 49(19), 194003.
A systems biology approach to investigate cell fate switches in intestinal organoids
Rik Lindeboom, Lisa van Voorthuijsen and Michiel Vermeulen
Radboud Institute for Molecular Life Sciences, NL
Intestinal organoid cultures recently emerged as an ideal in vitro model system to study adult stem cell maintenance and differentiation in a controlled manner. We are investigating the molecular mechanisms that drive cell fate changes in mouse intestinal organoids by generating stem cell enriched and stem cell depleted organoid cultures using small-molecule driven perturbations. Analyses of the transcriptome and the proteome of these organoids revealed that, besides the expected dynamics of intestinal stem cell and differentiation markers, hundreds of additional genes are differentially expressed during adult intestinal stem cell differentiation. Strikingly, we observed post-transcriptionally regulated transcription factor module switching in stem cell enriched versus stem cell depleted organoid cultures. Furthermore, probing the epigenetic landscapes of intestinal (stem) cells using ChIP-sequencing and ATAC-sequencing revealed a large number of cell-type specific regulatory elements. Finally, by using an integrative systems biology approach, we aim to uncover all layers of gene expression regulation in perturbed organoid cultures. These layers range from transcription factor binding to feedback from the metabolome and uncover the regulatory networks that define the remarkable cellular plasticity of the mouse intestinal epithelium.
Hybrid multivariate modelling of drug response in human cancer cell lines
Joint Research Center for Computational Biomedicine, Aachen, DE
Although long among the leading causes for mortality, cancer remains a challenging and only partially understood group of diseases with a virtually unparalleled heterogeneity and complexity that impede the identification of causes and regulatory factors contributing to disease progression and clinical outcome. In recent years, several research teams have established large-scale data sets of human cancer cell lines spanning increasing numbers of different tissues of origin and tumor types in order to advance the study and understanding of the drivers of cellular responsiveness to anti-cancer drugs. We propose an approach that aims to stratify a diverse panel of human cancer cell lines based on a set of multivariate, heterogeneous confounders like the mutational status of key genes, tissue and cancer characteristics. Cells within these defined subgroups display an increased homogeneity that facilitates the modeling of cell sensitivity to anti-cancer drugs via genomic features computed from basal gene expression data. The nonlinear models trained and tested on these stratified cell groups employ both patterns of genome-wide expression and the expression values of single genes found to be associated with response profiles. Additionally, they incorporate interaction terms for both aforementioned features.
Structural equation modeling with latent variables of genomic information for multiple diseases
Saebom Jeon, Ji-Yeon Shin, Jaeyong Yee, Taesung Park, Mira Park
Mokwon University, KR; Eulji University, KR; Seoul National University, KR
For complex diseases such as autism disorder, diabetes and hypertension, genome-wide association studies have been successful in finding the genetic determinants. However, association analyses between genotypes and phenotypes are not straightforward due to the complex relationships between genes or environmental factors. Moreover, in the presence of multiple correlated phenotypes, the analysis becomes more complicated. In this study, we consider the structural equation modelling approach to resolving this complexity. The structural equation model (SEM) is a multivariate statistical model that is commonly used to analyze the complex structural relationships between observed variables and latent constructs. We propose a new SEM-based modelling approach to the association study between genetic variants, such as single nucleotide polymorphisms (SNPs), and diseases with multiple phenotypes. Unlike currently available methods, which focus only on finding the direct association between SNPs and diseases, our SEM-based method introduces intermediate variables to examine both direct and indirect relationships. Our SEM-based approach consists of four steps: (i) selecting informative SNPs by single-SNP analysis for each phenotype, (ii) constructing latent variables by factor analysis of the selected SNPs, (iii) applying SEMs to find the relationships between SNPs, latent variables, intermediate variables and diseases, and (iv) finding the best models using goodness-of-fit measures. To examine the validity of the SEM-based approach, we analyzed the Korean GWAS data collected by the Korea Association Resource (KARE) project. We considered two types of diseases (hypertension, diabetes) and three intermediate variables related to obesity (subscapular skinfold, body mass index, and waist circumference). Our SEM-based analysis successfully constructed path models with paths starting from SNPs and leading to the diseases through intermediate variables.
Such constructed path models were shown to explain the relationships among the SNPs, latent variables and phenotypes simultaneously.
The problem of recombination suppression in evolution of sex chromosomes
Dorota Mackiewicz, Piotr Posacki, Michał Burdukiewicz, Paweł Błażej, Małgorzata Grabińska
University of Wroclaw, Faculty of Biotechnology, PL
Motivation: In the genetic system of sex determination, mammalian females possess two copies of the X chromosome, whereas males have one X and one Y chromosome. A spectacular phenomenon related to sex chromosome evolution is the shrinkage of the Y chromosome. It has been proposed that this process is related to the cessation of recombination between the Y and its counterpart, the X. The suppression of recombination is thought to have occurred through a series of large-scale inversions, most likely on the Y chromosome. Selection then favoured successive mutations and stepwise extension of the genetic linkage because this increased the probability of joint transmission of genes beneficial for one sex. A role for unfaithfulness of mating pairs in Y degradation has also been suggested.
Results: To study all these aspects, we applied a more general and advanced computer simulation model in which the recombination rate between the sex chromosomes can evolve freely and individuals can form faithful or unfaithful pairs. We found that only when mates are unfaithful does the number of females increase at the expense of males in the evolving population and do mutations accumulate on the Y chromosome. Consequently, the recombination rate between the X and Y decreases very quickly and the Y degenerates. Thus, the X chromosomes are cleared of defective alleles and the reproductive potential of the population, measured by the number of females, is not reduced. The simulation showed that the suppression of recombination is spontaneous and does not require inversions.
Fast biological network reconstruction from high-dimensional time-course perturbation data using sparse multivariate Gaussian processes
Sara Al-Sayed, Heinz Koeppl
TU Darmstadt, DE
Time-course data observed under the perturbation of biological systems contain rich information about the salient structure of interconnectivity among the entities of the network underlying the system. Network reconstruction approaches that exploit the temporal dependency of the high-dimensional data points tend to outperform those hinging on the assumption of temporal independence. In this work, we model the discrete, noisy time-course observational data as temporal snapshots of realizations of a multivariate Gaussian process. This amounts to specifying a Gaussian process prior over the temporal trajectories describing the evolution of the entities in the biological network. Special choices of the covariance kernel of the multivariate Gaussian process give rise to processes that can be described by a system of coupled stochastic linear differential equations where the sparse network structure and coupling weights are unknown. Exploiting the state-space representation of this system, computationally efficient Kalman filtering techniques can be used to score sparse candidate network structures given the observed data, in terms of the a posteriori candidate probabilities. The calculation of these scores involves data likelihood marginalization, for which we propose an efficient computational method. Prior network knowledge derived from databases, as well as modeling uncertainties, can be easily integrated into the approach, translating it into a fully Bayesian framework for structure learning. The approach is illustrated on both synthetic and real data. The latter is mass-spectrometry proteomic time-course perturbation data of prostate cancer cell lines, and the results are verified against biological literature databases.
Mathematical modelling of promoter occupancies in MYC-dependent gene regulation
Uwe Benary, Elmar Wolf and Jana Wolf
Max Delbrück Center for Molecular Medicine in the Helmholtz Association, DE; University of Würzburg, DE
The human MYC proto-oncogene protein (MYC) is a transcription factor that plays a major role in the regulation of cell proliferation. Deregulation of MYC expression is often found in cancer. In recent years, several hypotheses have been proposed to explain cell-type-specific MYC target gene expression patterns despite genome-wide DNA binding of MYC. Our mathematical modelling approach, in combination with experimental data, demonstrates that differences in MYC-DNA-binding affinity are sufficient to explain distinct promoter occupancies and allow stratification of distinct MYC-regulated biological processes at different MYC concentrations. A comprehensive analysis of our model shows that the insights gained from investigating the human osteosarcoma cell line U2OS can be generalized to other human cell types.
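The idea that affinity differences alone can separate promoter occupancies across MYC concentrations can be illustrated with a single-site equilibrium binding curve. This is a textbook simplification chosen for illustration, not the authors' model, and the dissociation constants and concentrations are hypothetical values in arbitrary units.

```python
def occupancy(myc, kd):
    """Equilibrium fraction of a promoter bound by MYC: [MYC] / ([MYC] + Kd)."""
    return myc / (myc + kd)


# Hypothetical promoter classes: a smaller Kd means a higher affinity.
PROMOTERS = {"high_affinity": 0.1, "low_affinity": 10.0}

if __name__ == "__main__":
    for myc in (1.0, 100.0):
        row = {name: round(occupancy(myc, kd), 3) for name, kd in PROMOTERS.items()}
        print(f"[MYC]={myc}: {row}")
```

In this sketch the high-affinity promoter is already close to saturation at the lower concentration, whereas the low-affinity promoter only becomes substantially occupied at the higher one, so different gene sets respond at different MYC levels.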
E-cyanobacterium.org: A Web-based Platform for Systems Biology of Cyanobacteria
Matej Troják, David Šafránek, Jakub Hrabec, Jakub Šalagovič, Františka Romanovská, Matej Hajnal and Jan Červený
Masaryk University, SK, CZ; Global Change Research Institute CAS, CZ
Understanding complex cellular machinery is a crucial problem in current systems biology, especially for photosynthetic organisms such as cyanobacteria. To tackle this difficult task, we have developed an online platform that consists of three interconnected modules.
- Biochemical Space (BCS) formally expresses (bio)chemical reactions facilitated by cyanobacterial molecular entities. Such reactions are represented in a generalised form as rules specified in the Biochemical Space Language, a novel rule-based language. The rules form a hierarchy of (bio)chemical processes covering transport, metabolism, circadian clock, photosynthesis, and carbon concentrating mechanism.
- Model Repository is a collection of related mathematical models accompanied by simulation and static analysis algorithms. Linkage of model components to BCS determines their exact biological meaning. This feature is also present in models exported to the SBML standard format.
- Experiments Repository serves to import, store, and plot time-series experiments. The measured variables are connected to BCS in the same manner as in the case of model components. Individual experiments possess additional references to related models.
In addition, data annotation is available in all modules of the platform. Consequently, BCS, models, and experiments are well annotated and referenceable. Moreover, the usability of the platform is enhanced by visualisations of the process hierarchy and reaction networks in BCS, simulations in the Model Repository, and time-series plots in the Experiments Repository.
In conclusion, our platform provides a unique solution based on integrating three different approaches to stimulate collaboration between experimental and computational systems biologists.
PITHYA: High-Performance Parameter Synthesis for Biological Models
Nikola Beneš, Luboš Brim, Matej Hajnal, Martin Demko, Samuel Pastva and David Šafránek
Masaryk University, SK, CZ
Biological systems exhibit complex behaviour emerging from non-linear interactions among system components. A system can be specified in terms of an ordinary differential equation (ODE) model, typically containing parameters which can significantly affect system behaviour. In general, it is difficult to obtain exact parameter values from experimental data.
The number of parameters and their interdependence make the identification of parameter values a hard task. A common approach is to use parameter estimation from time-series data. Such data might be of low resolution or even unavailable. Instead of estimating parameters from data, an alternative approach is to specify global hypotheses on system behaviour in terms of temporal properties and to use parameter synthesis methods based on model checking, a verification technique proven by decades of use in computer science. We present Pithya, a new high-performance tool that implements state-of-the-art parameter synthesis methods. For a given ODE model, it allows users to visually explore model behaviour with respect to different parameter values. Moreover, Pithya automatically synthesises parameter values satisfying a given property. Such a property can specify various behavioural constraints, e.g., maximal reachable concentration, time ordering of events, characteristics of steady states, or presence of limit cycles.
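To make the idea of "parameter values satisfying a given property" concrete, the following is a minimal sketch that uses naive sampling rather than the model-checking-based synthesis the abstract describes. The one-variable model dx/dt = k - d*x, the function names, the candidate grid and the threshold are all assumptions chosen for illustration; none of this is Pithya's interface or algorithm.

```python
def max_concentration(k, d, x0=0.0, t_end=20.0, dt=0.01):
    """Largest x reached by dx/dt = k - d*x under forward-Euler integration."""
    x, peak = x0, x0
    for _ in range(int(round(t_end / dt))):
        x += dt * (k - d * x)
        peak = max(peak, x)
    return peak


def scan_parameter(candidates, d, threshold):
    """Keep the sampled production rates k whose peak stays within threshold."""
    return [k for k in candidates if max_concentration(k, d) <= threshold]


# Hypothetical candidate values for the production rate k (illustration only).
candidates = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
accepted = scan_parameter(candidates, d=1.0, threshold=2.0)
print(accepted)
```

A scan like this only checks the sampled points. Synthesis methods based on model checking aim instead to characterise whole regions of parameter space, which is why they can make statements that sampling cannot.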
Transcriptomics Driven Lipidomics (TDL) identifies the microbiome-regulated targets of ileal lipid metabolism
Anirikh Chakrabarti, Mathieu Membrez, Delphine Morin-Rivron, Jay Siddharth, Chieh Jason Chou, Hugues Henry, Stephen Bruce, Sylviane Metairon, Frederic Raymond, Bertrand Betrisey, Carole Loyer, Scott Parkinson, Mojgan Masoodi
Nestle Institute of Health Sciences, NESTLE, CH; Service de biomédecine (BIO), UNIL-CHUV, CH
The gut microbiome is an essential factor regulating lipid metabolism in metabolic health. Mechanisms and impacts of host/microbe interactions are complex, integrative and multifactorial, requiring extensive investigation. Dissecting these molecular mechanisms would benefit from a comprehensive predictive strategy to develop testable hypotheses for studying impacts on lipid metabolism. We introduce Transcriptomics Driven Lipidomics (TDL), wherein, by coupling transcriptomics with legacy knowledge, lipid and non-lipid databases and tissue/cellular genome-scale metabolic models, we predict the condition-specific altered lipidome for measurement and analysis. E. coli has been associated with obesity and metabolic syndrome, and we used TDL to investigate its impact in a model gnotobiotic system. Applying TDL to the mRNA profiles, E. coli colonization was predicted to impact arachidonic acid metabolism, well established for its contribution to low-grade inflammation and insulin resistance, and glycerophospholipid metabolism, multifactorially via alterations in bile acids, availability of dietary lipids, inflammation and invasion. A microbiome-related therapeutic approach targeting these mechanisms may therefore provide an avenue for supporting the maintenance of metabolic health.
Modelling Transcriptomics Data of the Developing Enteric Nervous System
Jens Kleinjung, Reena Lasrado, Vassilis Pachnis
The Francis Crick Institute, UK
The Enteric Nervous System (ENS) comprises in the order of 10^6 cells that perform sensory functions, control intestinal muscle movement and regulate enzyme secretion. The ENS develops from neural crest-derived cell progenitors that proliferate and differentiate to glia and neurons along with the expansion of the gastrointestinal tract during development.
The spatio-temporal self-organisation of the ENS into a functional neural network, despite a seemingly disordered distribution of neurons, is only marginally understood.
Using transcriptomics, in particular single-cell RNA-Seq data, of enteric cells at several developmental time points, we have studied their transcriptomic variability and characterised the cellular states (e.g. glial and neuronal) by means of marker genes.
The cross-sectional time series was further investigated by latent variable analysis, where the latent variable was either a defined biological process (like 'cell cycle') or the internal developmental time modeled as a 'pseudotime'. Pseudotime estimation was performed by means of Gaussian Process Latent Variable Modelling on the most divergent genes. The model has yielded ordering of single cells along the developmental pseudotime line and accordingly expression profiles of genes that correlate with lineage development in the ENS. The Bayesian model provides not only the most likely posterior distribution, but also the associated uncertainties of the estimates.
Comparing Ordinary Differential Equation and Rule Based Models of DARPP-32 signalling
Emilia M. Wysocka, Matthew Page, James Snowden and T. Ian Simpson
Institute for Adaptive and Neural Computation, University of Edinburgh, UK; UCB Celltech, UK; Biomathematics and Statistics Scotland (BioSS), UK
Modelling signaling networks is a complex task, but it can be facilitated using suitable formalisms. Rule-based (RB) modelling has been proposed as the most flexible method for stochastic modelling of site-specific protein interaction networks, and is far more expressive than traditionally applied Ordinary Differential Equations (ODEs). RB language represents proteins as site-graphs with internal states. This provides an expressive system to capture the principal mechanisms of signaling processes and can provide insight into mechanistic details of molecular interactions, such as the dynamics of post-translational modifications, domain availability, competitive binding, causality and conflicts in biological interactions (which are overlooked by concentration-based ODEs). When paired with a network-free simulator such as KaSim, RB language represents a highly efficient and expressive environment for model development. To compare both formalisms, we manually translated an existing ODE-based model of dopamine and cAMP regulated phosphoprotein (DARPP-32) signalling into a RB model. The model equations were rewritten into decontextualised reaction rules that reduced the model specification to 60% of its original size. Our novel RB model reproduced not only the basal dynamics of the original ODE-version but also the results of induced mutagenesis of the serine phosphorylation site. We demonstrate the advantages of RB model formulations by extending model analyses to the dynamics of complex formation and causality analysis. Finally, we present the results of global sensitivity analyses for this model and compare them to previously reported local ones.
FAIRDOM: supporting FAIR data and model management
Natalie Stanford, Katherine Wolstencroft, Stuart Owen, Finn Bacall, Olga Krebs, Quyen Nguyen, Rostyslav Kuzyakiv, Dawie Van Niekerk, Bernd Rinn, Jacky Snoep, Wolfgang Müller, Carole Goble and Martin Golebiewski
University of Manchester, UK; University of Leiden, NL; HITS gGmbH, DE; University of Zurich, CH; University of Stellenbosch, ZA
FAIRDOM's mission is to support researchers, students, trainers, funders and publishers by enabling Systems and Synthetic Biology projects to make Data, Operating procedures and Models, Findable, Accessible, Interoperable and Reusable (FAIR).
The data stewardship challenges in modern science laboratories are manifold. From a practical perspective, most projects involve the exchange of data between geographically distributed partners. Data from several labs may need to be integrated into a single model, with little input from the original experimentalists. Data and models from projects need to be packaged, and made available for supplementary material, or for long-term accessibility. Coupled to this, staff within labs undergo turnover. These issues can only be mitigated with suitable data and model management infrastructure, and carefully developed plans which are agreed and adhered to by all researchers involved.
At FAIRDOM we have spent nearly 10 years understanding the challenges involved in data and model management of interdisciplinary projects. We have used our understanding to advance infrastructure to support data and model management. We have helped to develop research into standardisation, annotation best practice, and social behaviour surrounding data and model sharing, all in order to offer better solutions for making data FAIR. Specifically for modellers, FAIRDOM offers infrastructure and expertise to assist with modelling (JWS Online), Data, SOP and Model Management (FAIRDOMHub with BiVeS integrated), and model reproducibility (integrated support of COMBINE archives, SBML, SED-ML, and in-publication reproduction of figures from models). We also support searching and linking to models stored in model-specific repositories such as BioModels.
The CD4+ T cell regulatory network mediates inflammatory responses during acute hyperinsulinemia: a simulation study
Mariana Esther Martinez Sanchez, Marcia Hiriart and Elena Alvarez-Buylla
Universidad Nacional Autónoma de México, MX
Obesity is linked to insulin resistance, high insulin levels, chronic inflammation, and alterations in the behavior of CD4+ T cells. Despite the biomedical importance of this condition, the system-level mechanisms that alter CD4+ T cell differentiation and plasticity are not well understood. We model how hyperinsulinemia alters the dynamics of the CD4+ T cell regulatory network, and how this, in turn, modulates cell differentiation and plasticity. Different polarizing micro-environments are simulated under basal and high levels of insulin to assess impacts on cell-fate attainment and robustness in response to transient perturbations. In the presence of high levels of insulin, Th1 and Th17 become more stable to transient perturbations and their basin sizes are augmented, Tr1 cells become less stable or disappear, while TGFβ-producing cells remain unaltered. Hence, the model provides a dynamic system-level explanation for the documented and apparently paradoxical role of TGFβ in both inflammation and regulation of immune responses, as well as the emergence of the adipose Treg phenotype. Furthermore, our simulations provide novel predictions on the impact of the micro-environment on the coexistence of the different cell types, suggesting that in pro-Th1, pro-Th2 and pro-Th17 environments effector and regulatory cells can coexist, but that high levels of insulin severely diminish regulatory cells, especially in a pro-Th17 environment. This work provides a first step towards a system-level formal and dynamic framework to integrate further experimental data in the study of complex inflammatory diseases.
Prediction of stem cell pluripotency using parallel single-cell transcriptome and methylome sequencing data
Soobok Joe, Hojung Nam
Gwangju Institute of Science and Technology, KR
To understand the molecular mechanism of cellular differentiation, research on pluripotent stem cells has become increasingly important, and many stem cell studies are used to develop regenerative medicines. Studying genome-wide epigenetic landscapes through DNA methylation offers significant insight into the heterogeneity of pluripotent Embryonic Stem Cells (ESCs). Here, we constructed three predictive models of the pluripotency of single mouse ESCs using transcriptome and methylome sequencing data. The constructed models enable us to understand the extent to which each single ESC is more pluripotent or exists in a more differentiated state. In this study, 35 genomic loci were selected by an elastic net regression approach, and their methylation levels were used in our predictive models. The Pearson's correlation coefficients between the cell pseudo-time and the predicted pseudo-time are 0.92, 0.91, and 0.60 according to internal validations. The root mean squared errors and mean absolute deviations range from 2.77 to 6.17 and from 2.07 to 5.13, respectively. In addition, when tested against an independent dataset of mouse ESCs grown in serum or in 2i medium, the AUC values are greater than 0.9 for all models. Our results show that exploring the epigenetic differences of each ESC can reveal different biological states.
ASSA‐PBN: a software tool for large probabilistic Boolean networks
Andrzej Mizera, Jun Pang, Qixia Yuan
University of Luxembourg, LU
Probabilistic Boolean networks (PBNs) are a well-established computational framework widely used for modelling, simulation and analysis of biological systems. The steady-state dynamics of PBNs correspond to the characteristics of biological systems and therefore steady-state analysis is of crucial importance for analysing a PBN. A key challenge in the analysis of PBNs is how to deal with the infamous state-space explosion problem for large PBNs, which often arise in systems biology. In this poster, we will present ASSA-PBN, a software toolbox for modelling, simulation, and analysis of large PBNs. ASSA-PBN supports two different ways of analysing the long-run dynamics of a PBN: computation of steady-state probabilities and attractor identification in its constituent Boolean networks (BNs). For the first, ASSA-PBN provides a number of statistical methods together with efficient parallel techniques to speed up the steady-state probability computation. For the second, ASSA-PBN focuses on each of the constituent BNs of a PBN and applies the divide-and-conquer principle to deal with the state-space explosion problem. Based on the structure of a BN, ASSA-PBN divides a large BN into several sub-networks, detects attractors in the sub-networks and finally recovers the attractors of the original BN. Moreover, ASSA-PBN employs particle swarm optimisation and differential evolution for parameter estimation to optimise PBNs. In addition, it integrates various in-depth analysis techniques for PBNs, such as long-run influence analysis, long-run sensitivity analysis, and computation and visualisation of one-parameter profile likelihoods. Experimental results show that ASSA-PBN can indeed deal with real-life biological networks of thousands of nodes.
A Comparative Analysis of Feature Selection Methods for Biomarker Discovery in Study of Toxicant-treated Atlantic Cod (Gadus morhua) Liver
Xiaokang Zhang, Inge Jonassen
Computational Biology Unit, Department of Informatics, University of Bergen, NO
Biomarker discovery is extraordinarily important in gene expression analysis in the context of toxicant exposure. Among gene selection methods, differential expression analysis is often applied because of its simplicity and interpretability. However, it treats genes individually, disregarding the correlation between them. Multivariate feature selection methods have therefore been proposed for biomarker discovery. We compared three methods that stem from different theories, namely Significance Analysis of Microarrays (SAM), which identifies differentially expressed genes, minimum Redundancy Maximum Relevance (mRMR), based on information theory, and Characteristic Direction (GeoDE), based on a geometrical approach, according to their stability and classification accuracy.
The stability of feature selection methods is measured based on the overlap of selected features from different sampling steps. Using the subsets of features selected by the 3 feature selection methods, we trained 4 classifiers, namely Random Forest, Support Vector Machine, ridge regression and LASSO, and then tested the prediction accuracy to see how well the subsets can improve it. Based on these two aspects, we studied the performance of the 3 feature selection methods. Tested on the gene expression data from two toxicant exposure experiments on Atlantic cod liver, we found that GeoDE is more stable and can give higher prediction accuracy in the low-dose condition.
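One common way to turn "overlap of selected features from different sampling steps" into a number is the mean pairwise Jaccard index. The sketch below assumes that choice; the abstract does not state which overlap measure the authors used, and the gene identifiers are made up.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two feature sets (defined as 1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(selections):
    """Mean pairwise Jaccard overlap across resampling runs."""
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selections")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical gene sets selected in three resampling rounds.
runs = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g2", "g3"},
]
stability = selection_stability(runs)
print(round(stability, 3))
```

A value of 1.0 means every resampling run selected the same features, and values near 0 mean the selections barely overlap.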
Parameter fitting facility for rule-based models helps to analyse clathrin polymerization mechanisms
Oksana Sorokina, Anatoly Sorokin, Katharina Heil and JD Armstrong
The University of Edinburgh, UK; Institute of Cell Biophysics, RU
Rule-based modelling is a relatively new but highly flexible method for developing Systems Biology models. Rule-based languages, such as Kappa or BNGL, provide semantics for the compact and compositional description of complex biochemical and signalling networks. However, lack of user tools is a barrier to more widespread adoption of the method. RKappa provides a framework for parameter exploration and global sensitivity analysis for the Kappa language (1). Here we extend it with model-fitting infrastructure combining the RKappa simulation platform with Nelder-Mead, genetic and particle-swarm algorithms to minimize the difference between simulation results and experimental data. We demonstrate its application in a model of clathrin polymerization by identification and analysis of optimal parameter sets for formation and dissociation of various cage structures.
Clathrin is the major component of clathrin-mediated endocytosis. Due to its particular geometry and (auto-) polymerisation capacity, when activated, clathrin forces the cell membrane to adopt a vesicular shape. The process is believed to be initiated by a range of different triggers, which force further molecules to assemble on the extra- and intracellular sides of the membrane so that ~30 proteins directly participate in the regulation of different steps of endocytosis. Cage formation likely starts from a flat raft buildup on the membrane, which is later transformed into a curved structure (2). Creation of the curved surface requires rearrangement of the raft by additional molecular mechanisms.
We created a rule-based model describing polymerisation of clathrin and various scenarios of cage formation, and demonstrate that the model can reproduce budding of the cage from the initial flat raft. With our optimisation facility we obtained parameter sets for flat raft formation, pentagon closure and closed cages of large size. We have analysed the obtained parameters from a biophysical point of view and proposed mechanisms which could cause switching between the stages of vesicle formation. Further, our model is extendable and therefore able to serve as a starting point for a larger, biologically meaningful mechanistic model.
(1) A. Sorokin, O. Sorokina, J.D. Armstrong, RKappa: Statistical sampling suite for Kappa models, in: O. Maler, A. Halasz, C. Piazza (Eds.), Hybrid Systems Biology, Springer, 2015: pp. 128–142. doi:10.1007/978-3-319-27656-4.
(2) O. Avinoam, M. Schorb, C.J. Beese, J.A.G. Briggs, M. Kaksonen, ENDOCYTOSIS. Endocytic sites mature by continuous bending and remodeling of the clathrin coat, Science. 348 (2015) 1369–1372. doi:10.1126/science.aaa9555.
Sparse regression modeling of drug response with a localized estimation framework
Teppei Shimamura, Hideko Kawakubo, Yusuke Matsui
Graduate School of Medicine, Nagoya University, JP
A major challenge in pharmacogenomic studies is the heterogeneity in the clinical characterization of patients and in their reactions, which makes it difficult to identify clinically meaningful gene-drug interactions and predict drug response for each patient. In this study, we consider a localized regression model for each sample to predict drug response from a set of main effects and second-order interactions of oncogenic alterations. We propose a sparse modeling of interactions with localized estimation framework (SMILE) for this task. We take a regularization approach to induce strong hierarchy, in the sense that an interaction coefficient can have a non-zero estimate only if both of the corresponding main effect coefficients are non-zero. We incorporate two different constraints into the regularization problem, called the network lasso and the exclusive lasso, to enhance sample-wise similarity and group-wise sparsity in the model, which enables the generation of an interpretable localized model for each sample. The problem can be formulated as a convex optimization problem, which we solve with an efficient iterative least-squares method. We then demonstrate the performance of our proposed method in a simulation study and on a pharmacogenomic data set.
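The strong hierarchy condition stated above can be written as a simple check on a fitted coefficient set. The sketch below only verifies the condition; it does not implement SMILE or its regularized solver, and the data layout, function name and gene labels are assumptions for illustration.

```python
def satisfies_strong_hierarchy(main, interactions, tol=0.0):
    """Check that every non-zero interaction has two non-zero main effects.

    main:         dict mapping feature name -> main-effect coefficient
    interactions: dict mapping (feature_a, feature_b) -> interaction coefficient
    tol:          magnitudes at or below tol are treated as zero
    """
    for (a, b), theta in interactions.items():
        if abs(theta) > tol:
            if abs(main.get(a, 0.0)) <= tol or abs(main.get(b, 0.0)) <= tol:
                return False
    return True


# Hypothetical coefficients for one sample's localized model.
main_effects = {"KRAS": 0.8, "TP53": -0.3, "EGFR": 0.0}
ok = satisfies_strong_hierarchy(
    main_effects, {("KRAS", "TP53"): 0.2, ("KRAS", "EGFR"): 0.0}
)
violated = satisfies_strong_hierarchy(
    main_effects, {("KRAS", "EGFR"): 0.1}
)
print(ok, violated)
```

In the second call the interaction involving EGFR is non-zero while the EGFR main effect is zero, so the check fails.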
Agent-based Modeling of Hyaluronan in Vocal Fold Wound Repair
Jalil Nourisa, Samson Yuen, Nuttiiya Seekhao, Luc Mongeau and Nicole Y. K. Li-Jessen
McGill University, CA; University of Maryland-College Park, USA
Human vocal folds (VFs) are pliable soft tissue enriched with hyaluronan (HA). HA acts as a shock absorber in high-frequency phonation to prevent tissue fatigue and provides a shelter for resident cells. In addition, HA modulates a wide range of cellular activity such as proliferation and collagen production for tissue homeostasis and repair. Cellular activities modulated by HA are dependent on the origin of HA (native versus exogenous), the molecular weight of HA, the stiffness of the surrounding extracellular matrix (ECM), as well as the duration of exposure to HA. The primary goal of this study is to develop a computational agent-based model (ABM) to simulate the dynamics of HA and cells in vocal fold wound repair. The model is composed of fibroblasts as agents exhibiting behaviors of migration, proliferation, ECM production, myofibroblast transdifferentiation and autocrine TGF-b1 signaling. The stimuli for cell signaling consist of TGF-b1, HA, ECM stiffness, and cell density. The model was calibrated and validated with published data in the vocal fold literature. We also tested hypothetical cases of blocking the endogenous secretion of HA and injection of exogenous HA to study the effects on fibroblast activities and wound healing outcomes. The results from our model are consistent with the literature in terms of the spatial and temporal patterns of cell infiltration into the wound site as well as the trajectory of collagen accumulation during acute inflammation and repair. Our ABM represents a potential computational tool for testing growth factor therapies such as TGF-b1 treatments as well as optimizing the design of HA-based hydrogels to regenerate injured or scarred vocal folds.
Boolean Modeling of Breast Cancer Cell Lines using Logic Programming
Misbah Razzaq, Carito Guziolowski
Ecole Centrale de Nantes, FR
Protein networks are not static in nature, as proteins constantly interact with different partners. They undergo many biochemical modifications, such as ubiquitination and phosphorylation, to propagate signals. Understanding the mechanisms and interactions among proteins helps to determine how information propagates within cells in different diseases such as cancer. Large-scale protein-protein interaction datasets contain a large number of experimentally verified proteins but lack the knowledge of how the network changes in different cell lines. Boolean networks are a powerful framework to study and model the dynamics of protein networks. Knowledge about these dynamics may help biologists to design more efficient drugs through a better understanding of the underlying system. Caspo time series is a tool based on logic programming to infer a family of Boolean models from phosphoproteomics time series data and a prior knowledge network. In this work, a family of Boolean models of four breast cancer cell lines is constructed using the caspo time series method. Then, model checking is applied to confirm the resulting Boolean networks. Moreover, the Boolean models of each cell line are analyzed under different perturbations to identify commonalities as well as discrepancies. To further validate our results, the Boolean models are compared with the canonical pathway.
Formal modelling of multi-agent interaction dynamics in biological systems
Thomas Wright, Ian Stark
The University of Edinburgh, UK
We introduce a new mathematical framework for modelling biological systems at the molecular, cellular, and ecosystem level by combining process-algebra models that describe the behaviour of individual agents with an affinity network that specifies the dynamics of their interactions. We complement rule-based methods in focusing primarily on agent capabilities, from which reaction networks emerge as they interact. Our models generate conventional ODEs for behavioural simulation, but from a precise higher-level expression of the structures and mechanisms that give rise to that behaviour and are often left to informal description.
This process-algebra approach is well-established in computer science for modelling networks of communicating agents by capturing not only the behaviour of individual agents but also their capability for mutual interaction. A. Regev and others have applied this to modelling a wide variety of biological systems, among which we build upon the work of Kwiatkowski, Banks, and Stark in linking process-algebra descriptions of agent interaction with differential equations that model their dynamics.
We apply our framework to V. A. Kuznetsov's classic model of immune response to tumour growth, and present advances on previous work in capturing complex features including nonlinear interaction dynamics, n-party interactions, and dynamic binding of agents to form new agents. We show agent behaviour in isolation and when brought together, validating the observed dynamics of the model against our conceptual understanding of immune action.
An FPGA-Based framework for simulation and analysis of Boolean gene regulatory networks
Mitra Purandare, Raphael Polig
IBM Research Laboratory Zurich, CH
The ability of Boolean networks to cope with the unavailability of kinetic parameters such as growth rate and decay rate makes them popular for modeling a gene regulatory network (GRN). A Boolean network abstracts away the quantitative details of a GRN and preserves the GRN's behavior only as Boolean values ('0' and '1'). Despite the abstraction of quantitative details, a Boolean model can provide important insights into a GRN's dynamics. Identification of cycles of states, in particular attractors, is still possible with Boolean models. An attractor is a cycle of trap states A such that no trajectory starting in A can leave A. Attractors of Boolean networks are useful in understanding the observed phenotypes of GRNs. In the literature, attractors are shown to be associated with cell cycle phases. Steady states correspond to differentiation or apoptosis in adult tissue cells. Analysis of attractors can also provide new information about the origins of genetic diseases like cancer. We propose a framework for accelerated simulation and analysis of Boolean networks. Our framework uses field programmable gate arrays (FPGAs), leveraging their highly parallel nature for simulation and analysis of Boolean models. Our framework supports synchronous and asynchronous simulation of BNs and efficiently identifies attractors in the simulated state space. Compared to BoolNet, our framework demonstrates up to 1 and 3 orders of magnitude speed-up for simulation and attractor analysis of Boolean models, respectively.
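The attractor definition above can be illustrated in software with a small synchronous Boolean network. This is a sketch by exhaustive enumeration, feasible only for tiny networks; it says nothing about how the FPGA framework or BoolNet are implemented, and the three-gene update rules are invented for the example.

```python
from itertools import product


def update(state):
    """Synchronous update of a hypothetical 3-gene network (a, b, c)."""
    a, b, c = state
    return (b, a, a & b)  # a' = b, b' = a, c' = a AND b


def find_attractors(step, n):
    """Follow every start state until it revisits a state; collect the cycles."""
    attractors = set()
    for start in product((0, 1), repeat=n):
        seen = {}  # state -> position along this trajectory
        state = start
        while state not in seen:
            seen[state] = len(seen)
            state = step(state)
        first = seen[state]  # position where the cycle begins
        cycle = frozenset(s for s, i in seen.items() if i >= first)
        attractors.add(cycle)
    return attractors


attractors = find_attractors(update, 3)
for cycle in sorted(attractors, key=lambda c: (len(c), sorted(c))):
    print(sorted(cycle))
```

For this toy network the search finds two steady states, (0, 0, 0) and (1, 1, 1), and one cycle of length two between (0, 1, 0) and (1, 0, 0). The number of start states grows as 2^n, which is the cost that motivates hardware acceleration and smarter algorithms.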
Accelerating construction of integrative kinetic models of metabolism with GRaPe
Chuan Fu Yap, Jean-Marc Schwartz and Simon Hubbard
University of Manchester, UK
The construction of kinetic models of metabolic pathways has always been hindered by the limited availability of kinetic parameters, in addition to incomplete knowledge of many of the reaction mechanisms. Strategies have been developed to allow the generation of kinetic models with limited information. Despite this, not many large-scale dynamic and integrative models have been generated. The aim of this research is to streamline the process of generating large-scale metabolic models, while using metabolomic, proteomic and fluxomic data to inform parameter values. The use of such integrative data in model construction would greatly enhance the parameter estimation process, reducing redundancy in parameters and thereby increasing the model's predictive capability.
Previously, we developed the GRaPe tool in order to streamline the construction of metabolic models through automated generation of kinetic equations. However, a number of limitations affected the performance of the tool, which are now being addressed in this project. Firstly, convenience kinetics developed by Liebermeister and Klipp has been introduced to replace the previously used reversible Michaelis-Menten rate equations. Convenience kinetics requires fewer parameters, which reduces the burden on parameter estimation for the models. Additionally, it allows for inclusion of enzymatic modifiers into model building. Secondly, parameter estimation was previously performed locally on each reaction, which has now been replaced by global parameter estimation for the system as a whole. Thirdly, the parameter estimation was skewed to favour flux values at steady state, which resulted in limited applicability of the models generated. The fitness measurement in the genetic algorithm used for parameter estimation has been updated to account for metabolite values as well as flux and protein values, improving the model fitting to dynamic series of experimental data.
As a proof of concept, a model of yeast glycolysis is being built using flux values, and protein amounts obtained in a heat stress experiment. We show that our method is capable of generating a dynamic model, which accounts for both types of data. This work will be extended towards providing a general platform for the integration of multiple omics data sets and expedited construction of large-scale kinetic models.
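For readers unfamiliar with convenience kinetics, the sketch below evaluates the basic form of the rate law as published by Liebermeister and Klipp, without activator or inhibitor terms. It is an illustration only, not GRaPe code: the function name, argument layout and numerical values are assumptions, and the formula should be checked against the original publication before being relied on.

```python
from math import prod


def convenience_rate(enzyme, kcat_f, kcat_r, substrates, products):
    """Basic convenience-kinetics rate, without modifier terms.

    substrates / products: lists of (concentration, Km, stoichiometry),
    with integer stoichiometries.
    """
    def scaled(species):
        # Concentrations expressed relative to their Km values.
        return [(conc / km, n) for conc, km, n in species]

    def polynomial(x, n):
        # 1 + x + x^2 + ... + x^n
        return sum(x ** m for m in range(n + 1))

    s, p = scaled(substrates), scaled(products)
    numerator = (kcat_f * prod(x ** n for x, n in s)
                 - kcat_r * prod(x ** n for x, n in p))
    denominator = (prod(polynomial(x, n) for x, n in s)
                   + prod(polynomial(x, n) for x, n in p) - 1)
    return enzyme * numerator / denominator


# Hypothetical uni-uni reaction A <-> B with made-up values.
v = convenience_rate(
    enzyme=1.0, kcat_f=10.0, kcat_r=2.0,
    substrates=[(2.0, 1.0, 1)],  # [A] = 2, Km_A = 1
    products=[(1.0, 1.0, 1)],    # [B] = 1, Km_B = 1
)
print(v)
```

For a single substrate and a single product with stoichiometry one, the expression reduces to the reversible Michaelis-Menten form, which makes the example easy to verify by hand: (10*2 - 2*1) / (3 + 2 - 1) = 4.5.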
Spatial constraint and the interaction between lytic phages and biofilm-dwelling bacteria
Matthew Simmons, Vanni Bucci, Carey Nadell
University of Massachusetts, Dartmouth, USA; Max Planck Institute for Terrestrial Microbiology, DE
One of the dominant modes of life in microbial ecology is the development of complex bacterial communities that clump together on a surface, forming a biofilm. Bacteriophages are viral pathogens capable of attacking individuals of those communities. However, we have limited understanding of how the ecological dynamics of phages and biofilm-dwelling bacteria depend on the biological and physical properties of the biofilm environment. Exploring all potential parameter combinations leading to well-defined ecological outcomes is a daunting problem, so to make headway in this area, we develop the first biofilm simulation framework that captures key features of biofilm growth and phage infection. Using this fully extensible framework, we first investigate the primary determinants of the interactions between phages and bacteria across a large number of parameter combinations. We find that the equilibrium state of interaction between biofilms and phages is governed largely by nutrient availability to biofilms, phage infection likelihood, and the ability of phages to diffuse through biofilm populations. Interactions between the biofilm matrix and phage particles are thus likely to be of fundamental importance, controlling the extent to which bacteria and phages can coexist in natural contexts. To further investigate the ability of phages to penetrate the biofilm, we show that using a protein synthesized by infected bacteria to digest the biofilm matrix could be an effective measure to penetrate the polysaccharide matrix. Our framework's results build on the rich literature of phage-bacteria interactions, and enable us to explore previously unopened avenues in microbial life in spatially structured environments.
Fast, easy, interoperable and reusable - the cobrapy infrastructure for modeling metabolic flux
Henning Redestig, Moritz Beber, Danny Dannaher, Nikolaus Sonnenschein, Peter St. John, Christian Diener, Kristian Jensen and Joao Cardoso
Technical University of Denmark, DK; NREL, USA; National Institute of Genomic Medicine, MX
Constraint-based reconstruction and analysis (COBRA) is widely used to interpret and predict the interplay between genotype and metabolic fluxes. The community-developed cobrapy Python package implements functionality to read, write, edit, and adjust COBRA models and to perform simulations using numerous popular algorithms. Since the first releases in 2012, the cobrapy project has gained considerable attention thanks to its broad feature set with extensive documentation, and to being free/open source software without any non-free dependencies. The core classes of cobrapy form a basic infrastructure for constraint-based modeling that is easy to reuse in other packages, facilitating the development of new functionality as well as increasing potential interoperability between packages. In order to simplify the implementation of new algorithms, we have drawn on our experiences with the early versions of cobrapy, the development of the strain design package cameo, and the mathematical modeling package optlang, to greatly enhance the cobrapy core classes. Interaction with the software that actually solves the mathematical problem in the cobra model is now provided by optlang, which greatly facilitates the implementation of new COBRA algorithms and encourages the contribution of new algorithms from the research community. Several simulation algorithms have already been refactored for increased readability and performance. Here, we present the new functionality and outline the way forward for the role of cobrapy as a freely available infrastructure package for efficient constraint-based modeling in Python.
Identifying novel enzymes for metabolic engineering using homology based sequence search
Merja Oja, Sandra Castillo, Gopal Peddinti
VTT Technical Research Center of Finland, FI
Synthetic biology involves engineering a biological host to achieve a desired function by introducing novel pathways or metabolic enzymes. We have developed a bioinformatics pipeline using homology-based database searches for the identification of novel candidate enzymes. The homology-based sequence searches are performed against public databases (UniProt, TrEMBL, GenBank nucleotide and protein databases) and in-house nucleotide databases. Sequences with high similarity to the query sequence are extracted, and the resulting sequences are analysed based on multiple sequence alignment and phylogenetic trees. This pipeline has been successfully utilized in several applications, including the identification of novel isoprene synthases. We are currently developing ranking approaches, based on available database annotations (e.g. structure, protein family and function), cofactor utilization, subcellular localization, phylogenetic distance between the hosts, etc., to aid the user in selecting the best candidate hits.
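The step "sequences with high similarity to the query sequence are extracted" can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the hits have already been aligned to the query (gaps written as `-`), uses plain percent identity as the similarity measure, and the function names, the threshold and the toy sequences are all hypothetical.

```python
def percent_identity(aligned_query, aligned_hit):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for q, h in zip(aligned_query, aligned_hit):
        if q == "-" or h == "-":
            continue  # skip gap columns
        compared += 1
        if q == h:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def select_candidates(aligned_query, aligned_hits, min_identity=40.0):
    """Return (hit_id, identity) pairs at or above the threshold, best first."""
    scored = [
        (hit_id, percent_identity(aligned_query, seq))
        for hit_id, seq in aligned_hits.items()
    ]
    kept = [(hit_id, ident) for hit_id, ident in scored if ident >= min_identity]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment (not real enzyme sequences).
query = "MKT-LLVA"
hits = {"hitA": "MKTALLVA", "hitB": "MRT-LIVG", "hitC": "QQQ-QQQQ"}
candidates = select_candidates(query, hits, min_identity=50.0)
```

In a fuller ranking scheme of the kind the abstract describes, the identity score would be only one of several criteria combined per hit.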
Deep learning models of bifurcating developmental journeys from single-cell transcriptomic data can make accurate predictions of gene perturbations
Wajid Jawaid, Berthold Gottgens
University of Cambridge, UK
Measurement of mRNA molecules at single-cell resolution from either qPCR or RNAseq experiments potentially allows for assessment of cell-cell heterogeneity, post-hoc assignment of cell populations, pseudotime inference and generation of gene regulatory networks. Various models have been used where gene interactions are often explicitly determined using a variety of metrics taking into account available temporal information.
Herein I introduce a deep neural network to learn the underlying relationships between genes to determine cell trajectories in a bifurcating biological system where gene interactions are not explicitly specified. The embryological mammalian developmental haematopoiesis system has a known bifurcation point where a putative progenitor generates two divergent cell types: embryonic blood and endothelial cells. Moignard et al. (2015) captured cells before and after this decision point, collecting cells at multiple time-points from mouse embryos. These populations were sorted to enrich for cells under study and then underwent qPCR measurement for 42 genes known to be involved in embryonic haematopoiesis.
Pseudotime ordering of the cells using diffusion map dimensionality reduction displayed the bifurcation point. The ordered data were then considered to consist of multiple single-cell transitions from one 42-gene cell state to another, respecting the branching structure. A feed-forward deep neural network was trained on these transitions by backpropagation. Cell states were simulated de novo from the network with Gaussian random noise and projected onto a PCA representation of the data.
The model simulations were validated by demonstrating that in-silico perturbation of a master regulator of blood development led to failure of the model to generate blood cells, accurately phenocopying that which has been experimentally established.
In conclusion, deep neural network models can simulate biological systems and, in doing so, can implicitly learn a gene interaction structure that can accurately predict experimental observations. Given the limited amount of data, such predictions cannot be universally correct, but where they are inaccurate, further experimental data can be added to refine the model. We propose this as a basis for more sophisticated models that can serve as flexible predictive engines, allowing screening of transcriptomic outcomes from multiple combinatorial perturbations that would be difficult and expensive to perform experimentally.
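The data preparation and simulation loop described in this abstract can be sketched as follows. This is an illustrative sketch only: the trained network is replaced by a hypothetical hand-written `toy_step` function over three made-up genes, and modelling an in-silico perturbation by clamping a gene to a fixed value is an assumption, not something the abstract specifies.

```python
import random


def build_transitions(ordered_states):
    """Pair each cell state with its successor along one pseudotime-ordered branch."""
    return list(zip(ordered_states[:-1], ordered_states[1:]))


def simulate(step_fn, start_state, n_steps, noise_sd=0.1, clamp=None, seed=0):
    """Iterate a transition function, adding Gaussian noise to every gene.

    `clamp` maps gene index -> fixed value and is applied after the noise,
    which is one assumed way to model a knock-out of that gene.
    """
    rng = random.Random(seed)
    clamp = clamp or {}

    def apply_clamp(values):
        return [clamp.get(i, v) for i, v in enumerate(values)]

    state = apply_clamp(start_state)
    trajectory = [state]
    for _ in range(n_steps):
        noisy = [v + rng.gauss(0.0, noise_sd) for v in step_fn(state)]
        state = apply_clamp(noisy)
        trajectory.append(state)
    return trajectory


def toy_step(state):
    """Hypothetical stand-in for the trained network: a regulator drives a blood marker."""
    regulator, blood, endothelial = state
    return [regulator, 0.5 * blood + 0.5 * regulator, endothelial]


start = [1.0, 0.0, 0.2]
wild_type = simulate(toy_step, start, n_steps=3, noise_sd=0.0)
knock_out = simulate(toy_step, start, n_steps=3, noise_sd=0.0, clamp={0: 0.0})
```

With the regulator clamped to zero, the toy blood marker never switches on, which mirrors in miniature the kind of perturbation check the abstract reports for the real model.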
Sparse Regression for Network Graphs and Its Application to Gene Networks of The Brain
Hideko Kawakubo, Yusuke Matsui, Teppei Shimamura
Nagoya University Graduate School of Medicine, JP
Recent rare variant analyses of single nucleotide variations (SNVs) and copy number variations (CNVs) have identified dozens of candidate genes that may contribute to neurogenetic disorders such as autism and schizophrenia. However, it is unclear whether and how these disease-causing genes are associated with cellular mechanisms in the brain. This is a challenging problem, since the brain contains hundreds of distinct cell types, each of which has unique morphologies, projections, and functions, and thus disease-causing genes may contribute to different behavioral abnormalities of distinct cell types in the nervous system. In order to identify candidate cell types of the brain related to a complex genetic disorder, we propose a statistical method called cell-type specific gene network analysis. The aim of our approach is to construct a sparse regression model where the responses are binary labels indicating whether genes are associated with a specific neurogenetic disorder or not, and the predictors are graphs representing the cell-type specific gene regulatory networks of various regions of the human brain. Our method measures the dependencies between responses and graph predictors using the Hilbert-Schmidt Independence Criterion (HSIC), and maximizes these dependencies with a mixed L1 regularization of the Lasso and the Fused Lasso penalty. The Lasso penalty enables a sparse estimation of regression coefficients, while the Fused Lasso penalty strongly encourages coefficients of highly correlated cell types to be associated. Through simulations and experiments with real data, we demonstrate the performance of our proposed method.
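The two ingredients named in this abstract, an HSIC dependence measure and a mixed Lasso / Fused Lasso penalty, can be illustrated with a minimal sketch. It is not the authors' implementation. It uses the common biased empirical estimator trace(KHLH)/(n-1)^2 on precomputed Gram matrices, so a graph kernel could be substituted for the Gaussian kernel used here for illustration. The delta kernel for binary labels, the `pairs` argument listing which coefficients are fused, and all names and toy values are assumptions.

```python
import math


def gaussian_gram(points, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel over a list of numeric vectors."""
    def k(a, b):
        sq = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-sq / (2.0 * bandwidth ** 2))
    return [[k(a, b) for b in points] for a in points]


def delta_gram(labels):
    """Gram matrix of a delta kernel: 1 if two labels match, else 0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(gram_x, gram_y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(gram_x)
    kc, lc = center(gram_x), center(gram_y)
    total = sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n))
    return total / (n - 1) ** 2


def mixed_penalty(beta, pairs, lam_lasso, lam_fused):
    """Lasso term plus a fused term over the given coefficient index pairs."""
    lasso = sum(abs(b) for b in beta)
    fused = sum(abs(beta[j] - beta[k]) for j, k in pairs)
    return lam_lasso * lasso + lam_fused * fused


# Toy data: the feature separates the first label assignment but not the second.
features = gaussian_gram([[0.0], [0.1], [1.0], [1.1]])
dependent = hsic(features, delta_gram([0, 0, 1, 1]))
unrelated = hsic(features, delta_gram([0, 1, 0, 1]))
penalty = mixed_penalty([1.0, -2.0, 2.0], pairs=[(0, 1)], lam_lasso=0.5, lam_fused=2.0)
```

On the toy data, `dependent` comes out larger than `unrelated`, which is the behaviour a dependence-maximizing selection criterion relies on.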