Introduction


What is NeuroMMSig?

The Multimodal Mechanistic Signatures Database for Neurodegenerative Diseases (NeuroMMSig) is designed to let users retrieve the candidate mechanisms, represented as cause-and-effect graphs, that best fit a given pattern of experimental data (e.g., a gene or SNP set, or a list of imaging features). NeuroMMSig has also been enriched with drug information, so it can suggest drugs that could target the proposed mechanisms. By integrating data from different scales, NeuroMMSig helps find the mechanisms that best explain the experimental data. This can support data-driven patient stratification and personalized medicine based on mechanism identification.

Introduction

NeuroMMSig offers a web interface where users can submit data to infer mechanistic signatures in the context of neurodegenerative diseases. It accepts multiscale data, from the molecular to the clinical level, and returns the mechanisms that best fit the data. The name NeuroMMSig is inspired by the Molecular Signatures Database (MSigDB), and the models underlying the server are coded in the Biological Expression Language (BEL).

How NeuroMMSig is different from Pathway Databases

What is BEL?

A short introduction to BEL can be found here. BEL is a language specifically designed to represent scientific knowledge in a computable form by capturing causal and correlative relationships in context. In the neurodegenerative disease field, BEL can store additional information such as the type of relationship that exists between the biological entities involved, the evidence supporting this relationship in the literature, and many other specific annotations such as experimental conditions. In addition, BEL facilitates the integration of multiple data types through its flexible, human-readable syntax. We therefore found BEL ideal for building the models that form the core of NeuroMMSig.

Methodology

The methodology can be grouped into two main sections: the annotation of the mechanistic subgraphs and the enrichment ranking algorithm. The first section describes in detail how the mechanistic subgraphs were manually crafted, and the second describes how the enrichment algorithm works. Both sections are illustrated with examples.

Manual Crafting of the Mechanistic Subgraphs

The aim of the exercise is to generate an inventory of subgraphs, each a computable network coded in BEL, that represent the knowledge about well-established mechanisms or hypotheses involved in the condition. The inventory helps us to better interpret, delineate, and explore the knowledge around disease-specific pathways or mechanisms, since we are not looking at the whole body of knowledge but focusing on a specific part of the disease. Because these subgraphs are computable, they can be merged, modified, or enriched, and algorithms can be run on them to explore and analyze these hypotheses. More detailed information about this part can be found here.

Generating an inventory of disease-related pathways/mechanisms/processes for annotation

First, we needed to explore the landscape of disease-related pathways and mechanisms in order to annotate the knowledge assembly with the knowledge from existing terminologies. Having this set of terms also helps us to determine the boundaries of each pathway or mechanism, because we can search the literature to check whether the entities in each triplet are involved in that pathway or mechanism. The procedure described next focuses on the context of Alzheimer's disease (AD), but it can be applied to other conditions (e.g., Parkinson's disease, epilepsy).

  1. Using SCAIView, we extracted the Alzheimer's disease related pathways with the query [MeSH Disease:"Alzheimer Disease"] and displayed the results in the Pathway Terminology System (PTS) terminology. The query extracted pathway terms described in the context of Alzheimer's disease from over 100,000 articles in MEDLINE. A list of approximately 900 terms was then exported.
  2. The next step involved manual curation and enrichment of this primary list. Manual curation was required to address the following issues:
    • False positives (e.g., "Melanogenesis")
    • Wrong entries (e.g., "No mapping")
    • Duplicates (e.g., "amyloidogenesis" and "amyloid-beta peptide pathway")
    • Synonymous pathways/processes/mechanisms, which were grouped together under one consensus term (e.g., "amyloidogenesis" and "amyloid-beta peptide pathway" were labelled as "Amyloidogenic pathway").
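
The curation rules listed above were applied by hand, but their effect on an exported term list can be sketched in a few lines of Python. This is only an illustration: the lookup tables and the function name `curate_terms` are hypothetical, and the only entries in them are the examples mentioned in the list plus one made-up duplicate.

```python
# Hypothetical curation tables; in the actual workflow these decisions
# were made manually by curators, not read from a fixed lookup.
FALSE_POSITIVES = {"melanogenesis"}
WRONG_ENTRIES = {"no mapping"}
SYNONYMS = {
    "amyloidogenesis": "Amyloidogenic pathway",
    "amyloid-beta peptide pathway": "Amyloidogenic pathway",
}


def curate_terms(raw_terms):
    """Return a sorted list of consensus terms from an exported term list."""
    curated = set()  # a set drops exact duplicates automatically
    for term in raw_terms:
        cleaned = term.strip()
        key = cleaned.lower()
        if not key or key in FALSE_POSITIVES or key in WRONG_ENTRIES:
            continue  # skip false positives and wrong entries
        # Replace synonyms by their consensus term, keep other terms as-is.
        curated.add(SYNONYMS.get(key, cleaned))
    return sorted(curated)


raw = [
    "Amyloidogenesis",
    "amyloid-beta peptide pathway",
    "Melanogenesis",
    "No mapping",
    "Akt signaling",
    "Akt signaling",
]
print(curate_terms(raw))
```

In this toy run the six exported terms collapse into two consensus terms, which mirrors how the primary list shrinks during curation.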

After collecting the set of terms used to describe the pathways and mechanisms in AD, we used this set, together with other pathway repositories and disease-specific literature, as an inventory for the annotation of the knowledge assembly. During the curation process, however, new disease-specific pathways/mechanisms were found while reading the literature and inspecting the AD Knowledge Assembly, so they were curated and added to the inventory. Curating the inventory was therefore an iterative process carried out in parallel with the annotation of the knowledge assembly.

Manual Curation and Annotation of the Knowledge Assembly into Mechanistic Subgraphs

Mapping biological entities to mechanisms

To map an entity to its corresponding disease-specific mechanisms (note that an entity might be part of multiple mechanisms), we would first read the corresponding evidence to look for mechanisms that might be described in the evidence itself. Next, guided by the evidence and its context, we would query the literature, either to find supporting evidence for a mechanism we already suspected to be related to the entity or simply to find out the entity's role in the condition. This would involve, for example, using text mining resources or search engines (e.g., SCAIView, PubMed, Google) and querying the entity name (e.g., gene/protein name) together with the name of the condition. The results of these queries pinpoint publications that describe associations between the entity and the condition, if any exist. Reading these documents provides insights into the role of the entity in pathophysiological mechanisms.

In this case, the PubMed identifiers of the documents can be used as references that support the mapping. If the queries do not provide any insight into mechanisms in which the entity might be involved, dedicated databases (e.g., Reactome, UniProt for proteins, ChEBI for chemicals) can be queried to complement the search. They might provide valuable information about the role of the entity in pathways related to the disease, as well as supporting references.

Finally, if this extensive search concludes without identifying mechanistic links with the entity, we would assume that the entity cannot be linked with any mechanism.

It is important to remember that the biological scales covered by NeuroMMSig range from the chemical space to clinical endpoints. It might therefore be arduous or even impossible to link some entity types to mechanisms (e.g., clinical endpoints like brain region volumes). Moreover, the procedure for mapping entities to mechanisms varies depending on the entity type. For example, in the case of genes we would first investigate databases like UniProt or pathway databases such as Reactome or KEGG, whereas investigating the link between a specific lipid and a mechanism might require looking at dedicated databases like ChEBI.

This mapping process is conducted in parallel with the annotation of the BEL document. That is, while annotating each statement, we would check whether its entities are linked to any mechanism, and we would proceed with the annotation if links were found. Furthermore, we would add to a mapping file the references that support the link between the entity and the mechanism. Some examples of the annotation process are given below.

Annotation examples

The following examples show how BEL statements were annotated.

Example 1: This example shows how a simple triplet was annotated with its corresponding NeuroMMSig subgraphs. The triplet contains two associated entities. The first is a gene (EPHA1) that codes for a protein related to Akt signaling, and the second is the node representing a condition (Alzheimer's disease). We would therefore first check whether the subject or the object is associated with any AD mechanism, using text mining resources or search engines (e.g., SCAIView, PubMed, Google). After an extensive search, we would conclude that the gene EPHA1 is involved in Akt signaling, a pathway related to AD. Next, we would add the references that support this link to the mapping file. Finally, we would annotate this BEL statement only with the Akt subgraph, since the Alzheimer's disease node is a general entity not associated with any mechanism in particular.

SET Citation = {"PubMed", "XXXX", "XXXXX"}
SET Subgraph = "Akt subgraph"
g(HGNC:EPHA1) association path(MESHD:"Alzheimer Disease")

Example 2: This example shows how a triplet that relates a chemical to a biological process was annotated. In this case, we would first check in the literature whether the chemical (corticosteroid) or the process (inflammation) plays a role in the disease. After the search, we concluded that corticosteroid is not involved in any known AD mechanism, whereas inflammation is a well-known process in AD. Therefore, this BEL statement (triplet) is annotated with the "Inflammatory response subgraph", the network that comprises all the knowledge around inflammation processes in the context of AD.

SET Citation = {"PubMed", "XXXX", "XXXXX"} 
SET Evidence = "high-dose steroid treatment decreases vascular inflammation and ischemic tissue damage after myocardial infarction and stroke through direct vascular effects involving the nontranscriptional activation of eNOS"
SET Species = "9606" #Taxonomy ID- Homo sapiens
SET Tissue = "Vascular System"
SET Disease = "Stroke"
SET Subgraph = "Inflammatory response subgraph"
a(CHEBI:corticosteroid) decreases bp(MESHD:"Inflammation")

Example 3: In some cases, it is necessary to investigate not only the link between the entities in the triplet and a disease-specific mechanism but also the relationship that is part of the triplet. For instance, when amyloid beta protein is not correctly processed and leads to amyloid plaque formation, we talk about the "amyloidogenic pathway/process". When the amyloid beta protein is processed correctly, however, we talk about the "non-amyloidogenic pathway". Therefore, we annotated the AD Knowledge Assembly with two annotations, one for each pathway, chosen according to the relationship involved in the triplet.

SET Citation = {"PubMed", "XXXX", "XXXXX"}
SET Evidence = "Protein X increases Amyloid Beta 42 fragment"
SET Subgraph = "Amyloidogenic subgraph"
p(HGNC:X) increases p(HGNC:APP, frag(672_713))
SET Citation = {"PubMed", "XXXX", "XXXXX"}
SET Evidence = "Protein X inhibits APP"
SET Subgraph = "Non-amyloidogenic subgraph"
p(HGNC:X) decreases p(HGNC:APP)

Example 4: The last example shows that a triplet (subject-relationship-object) might be involved in multiple pathways and can therefore be annotated with multiple subgraphs. In the following example, the triplet is part of two different subgraphs: EPHA1 links it to the "Akt subgraph", and inflammation links it to the "Inflammatory response subgraph".

SET Citation = {"PubMed", "XXXX", "XXXXX"}
SET Evidence = "AKT1 is positively correlated with inflammation processes"
SET Subgraph = {"Akt subgraph", "Inflammatory response subgraph"}
p(HGNC:EPHA1) positiveCorrelation bp(MESHD:"Inflammation")
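
Because each statement carries its Subgraph annotation, the subgraphs can be recovered by grouping statements under the annotation in effect when they appear. The Python sketch below illustrates that idea on two of the statements from the examples above. It is a simplified line-based reader written for this illustration, not a full BEL parser, and the function name `group_by_subgraph` is hypothetical.

```python
import re
from collections import defaultdict

# Matches annotation lines of the form: SET <Key> = <value>
SET_LINE = re.compile(r'^SET\s+(\w+)\s*=\s*(.+)$')


def group_by_subgraph(bel_text):
    """Map each subgraph name to the BEL statements annotated with it."""
    subgraphs = defaultdict(list)
    current = []  # subgraph names currently in effect
    for line in bel_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SET_LINE.match(line)
        if match:
            key, value = match.groups()
            if key == "Subgraph":
                # Collect every double-quoted name; this covers both the
                # single-value form and the {"a", "b"} list form.
                current = re.findall(r'"([^"]*)"', value)
            continue  # other annotations are ignored in this sketch
        # Any remaining non-empty line is treated as a BEL statement.
        for name in current:
            subgraphs[name].append(line)
    return dict(subgraphs)


example = '''
SET Subgraph = "Akt subgraph"
g(HGNC:EPHA1) association path(MESHD:"Alzheimer Disease")
SET Subgraph = {"Akt subgraph", "Inflammatory response subgraph"}
p(HGNC:EPHA1) positiveCorrelation bp(MESHD:"Inflammation")
'''
for name, statements in group_by_subgraph(example).items():
    print(name, len(statements))
```

A statement annotated with several subgraphs, as in Example 4, ends up in each of them, so the resulting subgraphs may overlap.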

Enrichment Ranking Algorithm

The enrichment ranking algorithm allows users to prioritize subgraphs by their enrichment score. When a user submits data, the algorithm calculates a score for each subgraph the data maps to, as a way to prioritize further exploration. More details about the algorithm follow.

The enrichment score is computed from three different scores and their corresponding weights (Equation 1). The weights let users bias the algorithm towards some of the measurements. Each weight can be set to any value from zero to one; by default all weights are set to one, so the three scores contribute equally to the enrichment score.

$$s=w_{1}s_{1}+w_{2}s_{2}+w_{3}s_{3}$$
Equation 1. The enrichment score (s) is the weighted sum of three scores, each of which measures a different aspect of the network.
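
As a minimal sketch of Equation 1, the weighted sum and the resulting ranking can be written as follows. The function names and the numeric values of the three scores are hypothetical placeholders chosen for illustration; how each individual score is computed is not covered here.

```python
def enrichment_score(scores, weights=(1.0, 1.0, 1.0)):
    """Compute s = w1*s1 + w2*s2 + w3*s3 (Equation 1)."""
    if len(scores) != 3 or len(weights) != 3:
        raise ValueError("exactly three scores and three weights are expected")
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("each weight must lie between zero and one")
    return sum(w * s for w, s in zip(weights, scores))


def rank_subgraphs(subgraph_scores, weights=(1.0, 1.0, 1.0)):
    """Return subgraph names ordered from highest to lowest enrichment score."""
    return sorted(
        subgraph_scores,
        key=lambda name: enrichment_score(subgraph_scores[name], weights),
        reverse=True,
    )


# Hypothetical (s1, s2, s3) values for two data-mapped subgraphs.
subgraph_scores = {
    "Akt subgraph": (0.2, 0.5, 0.1),
    "Inflammatory response subgraph": (0.6, 0.1, 0.3),
}
print(rank_subgraphs(subgraph_scores))                   # default weights
print(rank_subgraphs(subgraph_scores, (0.0, 1.0, 0.0)))  # only s2 counts
```

With the default weights the second subgraph ranks first, while setting the first and third weights to zero reverses the order, which shows how the weights bias the ranking towards a chosen measurement.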

A detailed description of each of the scores follows.

Mechanisms in subgraphs:


What do we call a mechanism?

“A chain of causes and effects forms a pathophysiological context, where minor dysregulation of molecular events may aggregate at a network level and lead to a pathological deviation from the normal state” (Hofmann-Apitius et al., 2015).

Once data is mapped to the subgraphs, we can identify the different ways in which the data-mapped nodes disrupt a particular node of interest, such as a biological process. For more detail about how NeuroMMSig might identify possible dysregulated paths in the networks, please visit the "How to use NeuroMMSig" section.
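
One way to picture this is as a path search from each data-mapped node to the node of interest in a directed cause-and-effect graph. The sketch below is only an illustration under that assumption and does not describe how NeuroMMSig itself searches the networks. The graph, the node names, and the `max_edges` cutoff are all hypothetical.

```python
def simple_paths(graph, source, target, max_edges=4):
    """Yield loop-free directed paths (node lists) from source to target."""
    stack = [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            yield path
            continue
        if len(path) - 1 >= max_edges:
            continue  # do not extend paths beyond the cutoff
        for neighbor in graph.get(node, ()):
            if neighbor not in path:  # avoid revisiting nodes
                stack.append(path + [neighbor])


def paths_to_target(graph, mapped_nodes, target, max_edges=4):
    """Collect paths from every data-mapped node to the node of interest."""
    found = []
    for node in mapped_nodes:
        found.extend(simple_paths(graph, node, target, max_edges))
    return found


# Hypothetical subgraph as an adjacency mapping: node -> downstream nodes.
graph = {
    "gene_1": ["protein_2"],
    "protein_2": ["process"],
    "gene_3": ["protein_2", "process"],
}
for path in paths_to_target(graph, ["gene_1", "gene_3"], "process"):
    print(" -> ".join(path))
```

Each returned path is one candidate chain of causes and effects linking the submitted data to the process of interest.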

References:

Gu, Z. et al. (2012) Centrality-based pathway enrichment: a systematic approach for finding significant pathways dominated by key genes. BMC Systems Biology 6.1: 56.

Hofmann-Apitius, M. et al. (2015) Bioinformatics Mining and Modeling Methods for the Identification of Disease Mechanisms in Neurodegenerative Disorders. Int J Mol Sci 16.12: 29179-29206. doi: 10.3390/ijms161226148. url: http://dx.doi.org/10.3390/ijms161226148

Joy, M. P. et al. (2005) High-betweenness proteins in the yeast protein interaction network. BioMed Research International 2: 96-103.

Kanehisa, M. and Goto, S. (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28.1: 27-30.

Khatri, P. et al. (2012) Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol 8.2: e1002375.

Kodamullil, A. et al. (2015) Computable cause-and-effect models of healthy and Alzheimer's disease states and their mechanistic differential analysis. Alzheimer's & Dementia 11.11: 1329-1339.

List of all subgraph names for each disease available in NeuroMMSig

Click on a disease to show its available subgraphs, then click on a subgraph to open it. Since no data has been submitted at this point, the subgraphs can only be inspected; candidate mechanisms cannot be calculated.