Model internal specification

To meet our internal requirements, we store models as TAB-delimited files according to the following specifications. Apart from the header lines, the Excel spreadsheets we provide follow exactly the same syntax.

SBML files are also available and should allow data exchange with other Web sites and tools.

File/Sheet name -- Column content

model

  1. The name of a property of the model [STRING]. Currently supported names are:
    • ID - the identifier for the model (mandatory)
    • Description - a short description for the model
    • MNXref Version - the MNXref version used at mapping time
    • Source ID - the original name of the model, before its mapping to the MNXref namespace
    • Processed Date - the date on which the model was imported into MNXref
    • Taxid - the NCBI taxonomic ID of the organism related to this model
    • LB - Global Lower Bound
    • UB - Global Upper Bound
    • Organism - Organism full name
    • Lineage - Organism lineage
    • Proteome ID - Proteome identifier used for peptide mapping
  2. The value for this property [STRING]
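
The two-column layout above maps naturally onto a dictionary. The following is a minimal sketch of a reader, not the loader used by MetaNetX. The function name `read_model_properties` is hypothetical, and skipping blank lines and lines starting with `#` is an assumption about how header lines look.

```python
import csv
import io


def read_model_properties(handle):
    """Return {property name: value} from a two-column TAB-delimited stream."""
    props = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Assumption: blank lines and lines starting with '#' carry no data.
        if not row or row[0].startswith("#"):
            continue
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        name, value = row
        props[name] = value
    # ID is the only property the specification marks as mandatory.
    if "ID" not in props:
        raise ValueError("mandatory property 'ID' is missing")
    return props


# Illustrative content only; the values are made up.
sample = "ID\tmy_model\nDescription\tToy model\nTaxid\t562\nLB\t-1000\nUB\t1000\n"
props = read_model_properties(io.StringIO(sample))
print(props["ID"], props["Taxid"])
```

All values are kept as strings, as the [STRING] type of column 2 suggests; converting LB, UB or Taxid to numbers is left to the caller.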

reactions

  1. The identifier of the compartmentalized and directed reaction [ID]
  2. The equation that fully describes the compartmentalized and directed reaction [REAC]
  3. The original identifier of the reaction in the model [EXT_ID]
  4. The identifier for the uncompartmentalized and undirected reaction, if known [MNX_ID]
  5. The EC number(s) associated with that reaction [STRING, optional]
  6. The pathway(s) associated with that reaction [STRING, optional]
  7. The cross-reference(s) associated with that reaction [EXT_ID, optional]

chemicals

When feasible, we strongly advise users to use the same identifiers as those defined in the MetaNetX repository. The special identifier BIOMASS must be used to specify the biomass.

  1. The identifier of the compound as it appears in the reaction equation [ID]
  2. The name of the compound, or a short description [STRING]
  3. The original identifier of the compound [EXT_ID]
  4. The chemical formula of the compound [STRING, optional]
  5. The mass of the compound [REAL, optional]
  6. The charge of the compound [INTEGER, optional]
  7. The cross-references for the compound provided in the original model [EXT_ID, optional]

compartments

When feasible, we strongly advise users to use the same identifiers as those already defined in the MetaNetX repository. The special identifier MNXDX must be used when the compartment is not known. The identifier BOUNDARY is reserved to specify the model boundary.

  1. The identifier of the compartment as it appears in the reaction equation [ID]
  2. The name of the compartment, or a very short description [STRING]
  3. The original identifier of the compartment (its identifier before the mapping to the MetaNetX namespace) [EXT_ID]

enzymes

Additional information about the reactions is given here. This may include the description of the protein complex that catalyzes the reaction, if known. The lower and upper bounds for the flux through the reaction can also be specified. The direction can be B, LR or RL, for bidirectional, left to right or right to left respectively.

  1. The identifier for the compartmentalized and directed reaction [ID]
  2. The list of proteins that form enzymes that can catalyze that reaction [ENZS]
  3. The flux lower bound for this reaction [REAL]
  4. The flux upper bound for this reaction [REAL]
  5. The direction for this reaction [STRING]
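
As an illustration of the five columns above, here is a minimal sketch that parses one line of the enzymes file into a typed record. The names `EnzymeRow` and `parse_enzyme_row` are hypothetical, and the check that the lower bound does not exceed the upper bound is an added sanity check, not a rule stated in this specification.

```python
from dataclasses import dataclass

VALID_DIRECTIONS = {"B", "LR", "RL"}


@dataclass
class EnzymeRow:
    reaction_id: str
    enzymes: str        # raw ENZS field, left unparsed here
    lower_bound: float
    upper_bound: float
    direction: str


def parse_enzyme_row(line):
    """Parse one TAB-delimited line of the enzymes file."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}")
    reaction_id, enzymes, lower, upper, direction = fields
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    lower_bound, upper_bound = float(lower), float(upper)
    # Assumption: a lower bound above the upper bound is treated as an error.
    if lower_bound > upper_bound:
        raise ValueError("lower bound exceeds upper bound")
    return EnzymeRow(reaction_id, enzymes, lower_bound, upper_bound, direction)


# Illustrative line; the identifiers are made up.
row = parse_enzyme_row("R0A1B2C3D\t2*P1+P2;P3\t0\t1000\tLR\n")
print(row)
```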

peptides

Additional information about the peptides used to form enzymes is given here, if known.

  1. The identifier for the peptide/gene [ID]
  2. The peptide/gene description, if known [STRING]
  3. The list of cross-references for the peptide/gene, if known [STRING]
  4. The list of peptide/gene names, if known [STRING]

properties

The results of the different model-specific analyses.

  1. The scope of the line, i.e. one of analysis, prop, spec or reac [STRING]
  2. The identifier for the target object [ID or STRING]
  3. A key [STRING]
  4. A value [STRING]



Data types

ID

A reference identifier that satisfies the following requirements:
  • It must be made of letters, digits and underscores only
  • Its length should not exceed 64 symbols
  • It cannot start with a digit

The resemblance with the SBML definition for Sid is not fortuitous.
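
The three requirements for an ID can be checked with a single regular expression. This is a minimal sketch; the function name `is_valid_id` is hypothetical, and "letters" is assumed to mean ASCII letters, in line with the SBML Sid definition.

```python
import re

# First character: ASCII letter or underscore (never a digit).
# Then up to 63 more letters, digits or underscores, for 64 symbols at most.
ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def is_valid_id(candidate):
    """Return True if candidate satisfies the ID requirements."""
    return ID_PATTERN.fullmatch(candidate) is not None


for candidate in ("BIOMASS", "MNXDX", "1abc", "chebi:1234", "a" * 65):
    print(candidate[:12], is_valid_id(candidate))
```

`fullmatch` is used rather than `match` so that trailing characters, including a trailing newline, cause the check to fail.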

In addition, the following recommendations hold:
  • The identifiers used in the MNXref namespace for chemical compounds, cellular compartments and (undirected, uncompartmentalized) chemical equations follow these specifications [MNX_ID].
  • For compartmentalized and directed reactions, identifiers are automatically constructed by concatenating (i) the prefix R and (ii) the CRC32 checksum of the compartmentalized but undirected equation of the reaction.
  • BIOMASS is the reserved identifier used to represent the biomass, which is regarded as a kind of chemical compound.
  • BOUNDARY is the reserved identifier used to represent model boundaries, which are regarded as a kind of compartment.
  • MNXDX is the reserved identifier used when the cellular compartment is unknown.
  • During the import of a user's model and during updates of the MetaNetX repository, the identifiers of chemical compounds and cellular compartments that cannot be mapped to the MNXref namespace are arbitrarily transformed to meet the above specifications, if needed.
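
The construction of reaction identifiers from the prefix R and a CRC32 checksum can be sketched as follows. This only illustrates the idea: the specification does not state how the equation is serialized before hashing or how the checksum is rendered, so the UTF-8 encoding, the 8-digit uppercase hexadecimal rendering and the function name `make_reaction_id` are all assumptions. Identifiers produced this way are not expected to match those generated by MetaNetX.

```python
import zlib


def make_reaction_id(undirected_equation):
    """Build an identifier from the prefix R and a CRC32 checksum.

    Assumptions (not from the specification): the equation string is
    hashed as UTF-8 bytes and the checksum is written as 8 uppercase
    hexadecimal digits.
    """
    checksum = zlib.crc32(undirected_equation.encode("utf-8"))
    return "R" + format(checksum, "08X")


# Illustrative equation string; the exact canonical form is unknown.
rid = make_reaction_id("1 cpd_a@cyt + 1 cpd_b@cyt 1 cpd_c@cyt")
print(rid)
```

Whatever the actual rendering, the result starts with a letter and contains only letters and digits, so it satisfies the ID requirements.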

EXT_ID

An external resource identifier. It is usually made of a database identifier, followed by ':', followed by an entry identifier or an accession number. For example: chebi:1234
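
Splitting an EXT_ID into its two parts takes one call. This is a minimal sketch with a hypothetical function name, `split_ext_id`. Since the text says the prefixed form is only the usual one, identifiers without a ':' are returned with no database part, and splitting on the first ':' is an assumption.

```python
def split_ext_id(ext_id):
    """Split 'database:entry' into (database, entry).

    Only the first ':' is used as separator (an assumption), so an
    entry identifier may itself contain ':'. When there is no ':' at
    all, the database part is None.
    """
    database, separator, entry = ext_id.partition(":")
    if not separator:
        return None, ext_id
    return database, entry


print(split_ext_id("chebi:1234"))
```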

REAC

The equation of a compartmentalized reaction, written according to the following rules:
  • The name of every compartmentalized chemical species is composed of the identifier of a chemical compound joined with a '@' to the identifier of the compartment.
  • Reactants are separated by the '+' symbol, and so are products.
  • Both integer and real stoichiometric coefficients are supported. For the special case of (de)polymerization reactions, non-numeric stoichiometric coefficients are also allowed, but they must represent a mathematical expression and be placed between parentheses (for example "(n)" or "(2*n+1)"). Note that these non-numeric coefficients are automatically replaced by numeric ones in analyses that require numeric stoichiometric coefficients only (such as FBA).
  • The direction of the reaction is indicated by an arrow that separates the reactants from the products. We use the following symbols: --> for reactions from left to right; <-- for reactions from right to left; <==> for reversible reactions.
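
The rules for REAC can be turned into a small parser. The following is a minimal sketch with several assumptions that the specification does not spell out: terms are separated by a '+' surrounded by whitespace, a coefficient is separated from its species by whitespace, parenthesized coefficients contain no whitespace, and an omitted coefficient means 1. The names `parse_reac` and `parse_side` are hypothetical, and the arrows are mapped to the B, LR and RL codes used in the enzymes file for convenience.

```python
import re

# Checked in this order; each arrow maps to a direction code.
ARROWS = (("<==>", "B"), ("-->", "LR"), ("<--", "RL"))


def parse_side(text):
    """Parse one side of an equation into (coefficient, compound, compartment)."""
    text = text.strip()
    if not text:
        return []
    terms = []
    # Assumption: '+' between terms is surrounded by whitespace, so a '+'
    # inside a coefficient such as "(2*n+1)" is left alone.
    for term in re.split(r"\s+\+\s+", text):
        tokens = term.split()
        if len(tokens) == 1:
            coeff, species = "1", tokens[0]  # assumption: omitted means 1
        elif len(tokens) == 2:
            coeff, species = tokens
        else:
            raise ValueError(f"cannot parse term: {term!r}")
        compound, separator, compartment = species.partition("@")
        if not (separator and compound and compartment):
            raise ValueError(f"expected compound@compartment: {species!r}")
        # Non-numeric coefficients stay as strings, e.g. "(n)".
        coefficient = coeff if coeff.startswith("(") else float(coeff)
        terms.append((coefficient, compound, compartment))
    return terms


def parse_reac(equation):
    """Return (direction, left-hand terms, right-hand terms)."""
    for arrow, direction in ARROWS:
        if arrow in equation:
            left, right = equation.split(arrow, 1)
            return direction, parse_side(left), parse_side(right)
    raise ValueError("no reaction arrow found")


# Illustrative equation; the compound and compartment names are made up.
print(parse_reac("1 cpd_a@cyt + (n) cpd_b@cyt <==> 2.5 BIOMASS@BOUNDARY"))
```

Numeric coefficients are converted to `float` so that integer and real values are handled uniformly.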

ENZS

A list of enzymes able to perform a given reaction. Enzymes are separated by ';' and within an enzyme, proteins that form that enzyme are separated by '+'. The number of subunits of each protein is indicated by an integer followed by '*'. Nota bene: The syntax of this field is likely to be updated in the near future!
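
The current ENZS syntax can be read with two nested splits. This is a minimal sketch, bearing in mind that the syntax may change. The function name `parse_enzs` is hypothetical, and it is assumed that a protein without an explicit count has one subunit and that surrounding whitespace is not significant.

```python
def parse_enzs(field):
    """Parse an ENZS field into a list of enzymes.

    Each enzyme is a list of (subunit count, protein identifier) pairs.
    """
    enzymes = []
    for enzyme in field.split(";"):
        enzyme = enzyme.strip()
        if not enzyme:
            continue
        subunits = []
        for part in enzyme.split("+"):
            head, separator, protein = part.strip().partition("*")
            if separator:
                subunits.append((int(head), protein.strip()))
            else:
                # Assumption: no explicit count means a single subunit.
                subunits.append((1, head))
        enzymes.append(subunits)
    return enzymes


# Illustrative field; the protein identifiers are made up.
print(parse_enzs("2*P1+P2;P3"))
```

Splitting on '+', '*' and ';' is safe here only because protein identifiers follow the ID rules and therefore cannot contain these characters.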

STRING

A string of characters that does not contain TAB, CR or LF

REAL

A real number

INTEGER

An integer number (possibly negative)