RBC Metabolic Network
The goal of the project is to generate realistic data sets for benchmarking the reconstruction of metabolic networks from high-throughput metabolite profiling data, and then to validate minimally modified transcriptional reverse-engineering methods for applications in the metabolic domain. The details are reported in Nemenman et al., 2007a.
In this project, we generated four synthetic data sets representing metabolism of red blood cells (RBCs) using as a starting point the RBC Mathematica workbook from
- Jamshidi, N., et al., Dynamic simulation of the human red blood cell metabolic network. Bioinformatics 17, 286-287, 2001.
The rationale for choosing these particular data sets is provided in Nemenman et al., 2007a.
Overview
The RBC Mathematica workbook model has five parameters that can be controlled externally: the Donnan ratio, R (which determines the difference in pH between the inside and the outside of the cell); the glucose concentration, G; the total intracellular magnesium concentration, both free and bound, Mg; the intracellular inorganic phosphorus concentration, Pi; and the extracellular (plasma) sodium concentration, Na. For the first three data sets, these control parameters were sampled at random 1000 times from specified probability distributions, representing different experimental setups, and the steady-state values of the metabolic network were found using the methods in the RBC workbook. In a significant fraction of cases (up to 30% or more, depending on the data set), the randomly selected parameters did not lead to a steady-state solution. These samples were removed from the data set.
For each data set, the data consist of two files: (1) values of control parameters, and (2) values of responses. Within each file, each column holds the values of one chemical species, and each row corresponds to either one steady state or one point in time. The control parameters are listed in the following order: RT, GLC, MGT, PI, NAE. The responses are: G6P, F6P, FDP, DHAP, GAP, DPG13, DPG23, PG3, PG2, PEP, PYR, LAC, NADH, GL6P, GO6P, NADPH, GSH, RU5P, R5P, X5P, S7P, E4P, ADO, AMP, ADP, ATP, PRPP, IMP, INO, HX, R1P, ADE, NAI, KI, MGATP, MGADP, MGAMP, MGDPG23, MG. Please see the Jamshidi et al. paper above for an explanation of the notation.
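The column order above can be attached to the raw numbers with a few lines of Python. This is a minimal sketch that assumes the parameter and response files are plain whitespace-delimited text with one row per line; the on-disk format is not stated on this page, and `read_table` and the demo values are hypothetical.

```python
import io

# Column order as listed above.
CONTROL_NAMES = ["RT", "GLC", "MGT", "PI", "NAE"]
RESPONSE_NAMES = (
    "G6P F6P FDP DHAP GAP DPG13 DPG23 PG3 PG2 PEP PYR LAC NADH GL6P GO6P "
    "NADPH GSH RU5P R5P X5P S7P E4P ADO AMP ADP ATP PRPP IMP INO HX R1P "
    "ADE NAI KI MGATP MGADP MGAMP MGDPG23 MG"
).split()


def read_table(handle, names):
    """Parse whitespace-delimited rows into a list of {name: value} dicts.

    Assumes one steady state (or time point) per line; this layout is an
    assumption, not something documented for the downloadable files.
    """
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} columns, got {len(fields)}")
        rows.append(dict(zip(names, map(float, fields))))
    return rows


# Tiny in-memory stand-in for a parameters file (values are made up).
demo = io.StringIO("0.75 5.0 3.3 0.9 140.0\n0.80 4.6 3.1 1.0 138.5\n")
params = read_table(demo, CONTROL_NAMES)
print(params[0]["GLC"])
```

The same function can be pointed at a responses file by passing `RESPONSE_NAMES`, provided the file really has one column per listed species.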
You can download the list of reactions and metabolite-metabolite interactions. The same information is available as a 0/1 adjacency matrix (in Octave file format, which also includes additional information, such as metabolite and reaction names). Note that the adjacency matrix is not symmetric: the (ij) entry represents the control of the i'th metabolite by the j'th metabolite. For validation purposes, we symmetrize the matrix as (in Matlab notation, with adj standing for the matrix): adj=(adj+adj')>0.
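For readers not working in Matlab or Octave, the symmetrization rule can be written in plain Python as follows. This is a sketch on a made-up 3x3 matrix; `symmetrize` is a hypothetical helper name, and the matrix is stored as a list of lists rather than in the Octave file format.

```python
def symmetrize(adj):
    """Return the 0/1 matrix equivalent to (adj + adj') > 0."""
    n = len(adj)
    return [
        [1 if (adj[i][j] or adj[j][i]) else 0 for j in range(n)]
        for i in range(n)
    ]


# Made-up directed example: metabolite 0 is controlled by metabolite 1,
# and metabolite 2 is controlled by metabolite 0.
directed = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
undirected = symmetrize(directed)
print(undirected)
```

After this step an edge is present whenever either metabolite controls the other, so the direction of control is no longer distinguished.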
Data sets
- Data set 1 (chemostat)
- Files: Parameters, Responses. 1000 parameter draws with corresponding responses.
- Notes: This set simulates RBC steady-state measurements from chemostat culture experiments. All the parameters are uncorrelated, uniformly distributed variables, with the ranges: RT = 0.2…1.6, GLC = 2.0…30.0 mM, MGT = 0.1…20 mM, PI = 0.6…1.8 mM, NAE = 100…200 mM.
- Data set 2 (natural)
- Files: Parameters, Responses. 1000 parameter draws with corresponding responses.
- Notes: This set represents the variability of RBC metabolite concentrations in blood samples from healthy humans. The control parameters are taken as uncorrelated normal variables with means ± standard deviations: RT = 0.75 ± 0.1, GLC = 5 ± 0.6 mM, MGT = 3.3 ± 0.2 mM, PI = 0.9 ± 0.15 mM, NAE = 140 ± 2.5 mM.
- Data set 3 (correlated)
- Files: Parameters, Responses. 1000 parameter draws with corresponding responses.
- Notes: This data set models the in vivo metabolite concentrations more faithfully by incorporating physiological correlations among the controls. We summarized the data into correlation coefficients of 0 (no trend, or no data available), 0.3 (weak correlation), and 0.5 (strong correlation), while the variances are as in Data Set 2.
 | RT | GLC | MGT | PI | NAE |
---|---|---|---|---|---|
RT | | -0.3 | -0.5 | +0.5 | -0.3 |
GLC | | | +0.3 | -0.5 | +0.3 |
MGT | | | | 0 | -0.3 |
PI | | | | | -0.3 |
NAE | | | | | |
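One common way to draw control parameters with a prescribed correlation matrix is to multiply independent standard normal numbers by a Cholesky factor of that matrix. The sketch below does this with the correlations in the table and the means and standard deviations of Data set 2. It illustrates the technique and is not necessarily the procedure used to produce the downloadable files; all function names are hypothetical, and the steady-state solve in the RBC workbook is not reproduced.

```python
import math
import random

NAMES = ["RT", "GLC", "MGT", "PI", "NAE"]
MEANS = [0.75, 5.0, 3.3, 0.9, 140.0]   # as in Data set 2
SDS = [0.1, 0.6, 0.2, 0.15, 2.5]       # as in Data set 2
UPPER = {  # upper triangle of the correlation table
    ("RT", "GLC"): -0.3, ("RT", "MGT"): -0.5, ("RT", "PI"): 0.5,
    ("RT", "NAE"): -0.3, ("GLC", "MGT"): 0.3, ("GLC", "PI"): -0.5,
    ("GLC", "NAE"): 0.3, ("MGT", "PI"): 0.0, ("MGT", "NAE"): -0.3,
    ("PI", "NAE"): -0.3,
}


def correlation_matrix():
    """Build the full symmetric matrix with ones on the diagonal."""
    n = len(NAMES)
    corr = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), rho in UPPER.items():
        i, j = NAMES.index(a), NAMES.index(b)
        corr[i][j] = corr[j][i] = rho
    return corr


def cholesky(m):
    """Lower-triangular factor low such that low * low^T equals m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def draw_correlated(n_draws, seed=0):
    """Draw rows of (RT, GLC, MGT, PI, NAE) with the target correlations."""
    rng = random.Random(seed)
    low = cholesky(correlation_matrix())
    dim = len(NAMES)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        draws.append([MEANS[i] + SDS[i] * y[i] for i in range(dim)])
    return draws


samples = draw_correlated(1000)
```

Setting every off-diagonal entry to zero reduces this to the uncorrelated normal draws of Data set 2.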
- Data set 4 (evolving parameters)
- Files: Parameters, Responses (Caution: Large file). 7201 points of temporal evolution of parameters and responses.
- Notes: The RBC model takes up to 100 hours or more to reach a steady state. However, in a natural environment, the control parameters fluctuate on time scales less than an hour. In this data set, parameters are modeled as correlated Ornstein-Uhlenbeck processes with the means, the standard deviations, and the species-species correlations as in Data set 3, and the correlation time of 20 min for each process. The resulting time series data represent 20 hours of evolution of the RBC model, sampled every 10 seconds (for a total of 7201 samples).
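To make the sampling scheme of Data set 4 concrete, the sketch below simulates a single Ornstein-Uhlenbeck control parameter (GLC, with the mean and standard deviation of Data set 2) using the exact one-step update of a scalar OU process. The integration scheme actually used for the data set is not stated here, so this is an assumption, as is the name `simulate_ou`. The correlations between parameters and the integration of the RBC model itself are omitted.

```python
import math
import random


def simulate_ou(mean, sd, tau, dt, n_samples, seed=0):
    """Sample a stationary OU process with correlation time tau every dt.

    Uses the exact update
        x(t + dt) = mean + (x(t) - mean) * exp(-dt / tau) + noise,
    where the noise is normal with variance sd^2 * (1 - exp(-2 dt / tau)),
    so the stationary standard deviation is sd for any step size.
    """
    rng = random.Random(seed)
    decay = math.exp(-dt / tau)
    kick = sd * math.sqrt(1.0 - decay * decay)
    x = rng.gauss(mean, sd)  # start from the stationary distribution
    path = [x]
    for _ in range(n_samples - 1):
        x = mean + (x - mean) * decay + kick * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# 20 hours sampled every 10 s gives 20 * 3600 / 10 + 1 = 7201 points.
# Times are in seconds; the correlation time is 20 min.
glc = simulate_ou(mean=5.0, sd=0.6, tau=20 * 60, dt=10, n_samples=7201)
print(len(glc))
```

Correlated parameters could be obtained by replacing the independent normal kicks with correlated ones, for example through a Cholesky factor of the correlation matrix.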
Noisy data
Real experimental measurements always come with noise. To account for this, noise was added to the responses above, with a variance determined by two constants, A and B, and by the value of each particular response abundance. Data sets are available below. Note that the data with A = 0, B = 0 have no noise and repeat the data sets above. Most of the noisy data sets used in the published paper are downloadable below. Similar data sets can be generated easily with the scripts provided at the bottom of the page.
The noisy data files are provided in the Matlab file format and include the following variables:
- noisy -- noisy metabolite concentrations (1000 steady states for the first 3 data sets, and 6400 measurements for data set 4)
- noise_var -- noise variance for each of the metabolites
- tot_var -- total observed variance for each of the metabolites
- network_pruned -- true metabolite adjacency matrix (for network reconstruction validation)
- adj -- matrix of mutual informations among metabolites, evaluated by the ARACNE algorithm
- h_opt -- the optimal kernel width parameter for ARACNE
- tol -- the DPI tolerance parameter for ARACNE
- precision and recall -- precision and recall curves for all tolerance values in tol
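As a reminder of how precision and recall are defined for a reconstructed network, here is a minimal sketch that compares a predicted adjacency matrix with a true one. It assumes both matrices are symmetric and that any nonzero entry of the prediction marks an inferred edge; how the stored `adj` and `tol` values map onto predicted edges is not specified here, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(true_adj, pred_adj):
    """Precision and recall over undirected edges (upper triangle only)."""
    n = len(true_adj)
    tp = fp = fn = 0
    for i in range(n):
        for j in range(i + 1, n):
            is_true = bool(true_adj[i][j])
            is_pred = bool(pred_adj[i][j])
            if is_true and is_pred:
                tp += 1
            elif is_pred:
                fp += 1
            elif is_true:
                fn += 1
    # Convention chosen for this sketch: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up example: true edges 0-1 and 1-2; predicted edges 0-1 and 0-2.
true_net = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
pred_net = [[0, 0.8, 0.2], [0.8, 0, 0], [0.2, 0, 0]]
print(precision_recall(true_net, pred_net))
```

Repeating the calculation for the network obtained at each tolerance value would trace out precision and recall curves.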
Note that, as noise increases, the signal variance for some metabolites becomes smaller than the noise. These metabolites are dropped from the network, and the network adjacency matrix is then rearranged to account for the transformation. More details are available from Nemenman et al., 2007a.
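The pruning step can be sketched as follows. The sketch assumes that the signal variance of a metabolite is its total observed variance minus its noise variance, which is our reading of the variable names rather than a documented definition, and it uses made-up numbers; `prune_network` is a hypothetical helper.

```python
def prune_network(network, tot_var, noise_var):
    """Drop metabolites whose signal variance is below the noise variance.

    Assumption: signal variance = tot_var - noise_var.
    Returns the kept indices and the adjacency matrix restricted to them.
    """
    keep = [
        i
        for i, (total, noise) in enumerate(zip(tot_var, noise_var))
        if total - noise >= noise
    ]
    pruned = [[network[i][j] for j in keep] for i in keep]
    return keep, pruned


# Made-up example with three metabolites; the second is dominated by noise.
network = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
tot_var = [1.1, 0.11, 0.6]
noise_var = [0.1, 0.1, 0.1]
keep, pruned = prune_network(network, tot_var, noise_var)
print(keep, pruned)
```

Keeping the list of surviving indices makes it possible to map rows of the pruned matrix back to metabolite names.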
A\B | 0 | 1e-4 | 1e-3 | 1e-2 | 1e-1 | 3e-1 |
---|---|---|---|---|---|---|
0 | Sets 1, 2, 3, 4 | Sets 1, 2, 3 | Sets 1, 2, 3, 4 | Sets 1, 2, 3 | Sets 1, 2, 3, 4 | Sets 1, 2, 3 |
1e-4 | Sets 1, 2, 3 | Sets 1, 2, 3 | Sets 1, 2, 3 | Sets 1, 2, 3 | Sets 1, 2, 3 | Sets 1, 2, 3 |
1e-3 | Sets 1, 2, 3, 4 | Sets 1, 2, 3 | Sets 1, 2, 3, 4 | Sets 1, 2, 3 | Sets 1, 2, 3, 4 | Sets 1, 2, 3 |
1e-2 | Sets 1, 2, 3 | Sets 1, 2, 3 | Sets 1, 2, 3 | Sets 1, 2, 3 | Sets 1, 2, 3 | Sets 1, 2, 3 |
1e-1 | Sets 1, 2, 3, 4 | Sets 1, 2, 3 | Sets 1, 2, 3, 4 | Sets 1, 2, 3 | Sets 1, 2, 3, 4 | Sets 1, 2, 3 |