Mambo is a tool for construction, representation, and analysis of large and multimodal biomedical network data.

| Name | Edges | Entities | Description |
|---|---|---|---|
| CC-Neuron | 49,471,006 | cell, cell | Similarity network between cells in embryonic mouse brain |
| ChCh-Miner | 96,137 | drug, drug | Interactions between FDA-approved drugs |
| ChChSe-Decagon | 4,649,441 | drug, drug, side-effect | Side effects of drug combinations |
| ChG-InterDecagon | 131,034 | drug, gene | Chemical-gene interaction network |
| ChG-Miner | 15,424 | drug, gene | Drug-target interaction network |
| ChG-TargetDecagon | 18,690 | drug, gene | Drug-target interaction network |
| ChSe-Decagon | 174,977 | drug, side-effect | Drug side-effect association network |
| DCh-Miner | 1,334,088 | disease, drug | Disease-drug association network |
| DD-Miner | 6,877 | disease, disease | Hierarchical ontology of diseases |
| DF-Miner | 802,760 | disease, function | Disease-function association network |
| DG-AssocMiner | 21,357 | disease, gene | Disease-gene association network |
| DG-Miner | 42,475,361 | disease, gene | Disease-gene association network |
| FF-Miner | 119,464 | function, function | Relations between biological processes, molecular functions, and cellular components |
| GF-Miner | 16,628 | gene, function | Gene-function association network |
| GG-EnhancedTissue | 3,642,834,333 | gene, gene | Enhanced tissue-specific gene-gene interaction networks |
| GP-Miner | 102,450 | gene, protein | Protein-coding gene associations |
| GrGr-EnhancedHiC1K | 7,224,824 | genomic-region, genomic-region | Enhanced Hi-C interaction network |
| GrGr-EnhancedHiC5K | 682,566 | genomic-region, genomic-region | Enhanced Hi-C interaction network |
| PP-Decagon | 715,612 | protein, protein | Physical and functional protein-protein association network for human |
| PP-Miner | 1,847,117,370 | protein, protein | Protein-protein association networks for many different species |
| PP-Pathways | 342,353 | protein, protein | Physical protein-protein interaction network for human |
| PPT-Ohmnet | 70,338 | protein, protein, tissue | Tissue-specific protein-protein interaction network |
| SS-Butterfly | 832 | species, species | Similarity network between butterflies |
| TFG-Ohmnet | 20,619 | tissue, function, gene | Tissue-specific protein-function association networks |

| Name | Size | Entity | Description |
|---|---|---|---|
| D-DoMiner | 9,247 | disease | Disease synopses |
| D-DoPathways | 301 | disease | Mapping of diseases to disease categories |
| D-MeshMiner | 11,332 | disease | Disease synopses |
| D-MtfPathways | 519 | disease | Network motifs of disease pathways |
| D-OmimMiner | 1,191 | disease | Disease synopses |
| D-StructPathways | 520 | disease | Network structural features of disease pathways |
| G-HumanEssential | 18,529 | gene | Information on experimentally tested essential and non-essential genes |
| G-MtfPathways | 22,552 | gene | Network motifs of genes |
| G-SynMiner | 35,654 | gene | Gene synopses |
| Se-DoDecagon | 562 | side-effect | Mapping of side effects to side-effect categories |

| Dataset statistic | Description |
|---|---|
| Nodes | Number of nodes in the network |
| Edges | Number of edges in the network |
| Nodes in largest WCC | Number of nodes in the largest weakly connected component |
| Edges in largest WCC | Number of edges in the largest weakly connected component |
| Nodes in largest SCC | Number of nodes in the largest strongly connected component |
| Edges in largest SCC | Number of edges in the largest strongly connected component |
| Average clustering coefficient | Mean of the local clustering coefficients over all nodes |
| Number of triangles | Number of triples of mutually connected nodes (considering the network as undirected) |
| Fraction of closed triangles | Number of connected triples of nodes / number of (undirected) length 2 paths |
| Diameter (longest shortest path) | Maximum undirected shortest path length (sampled over 1,000 random nodes) |
| 90-percentile effective diameter | 90th percentile of the undirected shortest path length distribution (sampled over 1,000 random nodes) |
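
As a rough illustration of how several of the statistics above can be computed, here is a minimal sketch in plain Python for a small undirected edge list. It is not the code used to produce the published numbers. The names `summarize` and `example_edges` are hypothetical. The sketch runs a breadth-first search from every node instead of sampling 1,000 random nodes, uses a simple nearest-rank percentile without interpolation, and skips the SCC rows because weakly and strongly connected components coincide in an undirected graph.

```python
from collections import deque
from itertools import combinations


def summarize(edges):
    """Compute a subset of the dataset statistics for an undirected edge list."""
    # Build an undirected adjacency map, ignoring self-loops and duplicate edges.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def bfs(src):
        # Shortest path lengths (in hops) from src to every reachable node.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    # Largest connected component (WCC and SCC coincide for undirected graphs).
    seen, largest = set(), set()
    for node in adj:
        if node not in seen:
            comp = set(bfs(node))
            seen |= comp
            if len(comp) > len(largest):
                largest = comp

    # Local clustering coefficient per node; each triangle is seen at 3 nodes.
    closed_at_nodes = 0
    local_cc = []
    for nbrs in adj.values():
        k = len(nbrs)
        pairs = k * (k - 1) // 2
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        closed_at_nodes += links
        local_cc.append(links / pairs if pairs else 0.0)

    # Shortest path lengths over all reachable ordered pairs (exhaustive, no sampling).
    lengths = sorted(d for s in adj for d in bfs(s).values() if d > 0)
    rank = (9 * len(lengths) + 9) // 10  # nearest-rank 90th percentile, ceil(0.9 * n)

    return {
        "nodes": len(adj),
        "edges": sum(len(n) for n in adj.values()) // 2,
        "nodes_in_largest_wcc": len(largest),
        "edges_in_largest_wcc": sum(len(adj[u] & largest) for u in largest) // 2,
        "average_clustering": sum(local_cc) / len(local_cc) if local_cc else 0.0,
        "triangles": closed_at_nodes // 3,
        "diameter": lengths[-1] if lengths else 0,
        "effective_diameter_90": lengths[rank - 1] if lengths else 0,
    }


# Hypothetical toy network: one triangle with a tail, plus a separate edge.
example_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("x", "y")]
stats = summarize(example_edges)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Running a search from every node is only practical for small networks, which is presumably why the diameter figures in the table are estimated from a sample of 1,000 random nodes.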
We encourage you to cite our datasets if you have used them in your work. You can use the following BibTeX citation:

@misc{biosnapnets,
author = {Marinka Zitnik and Rok Sosi\v{c} and Sagar Maheshwari and Jure Leskovec},
title = {{BioSNAP Datasets}: {Stanford} Biomedical Network Dataset Collection},
howpublished = {\url{http://snap.stanford.edu/biodata}},
month = aug,
year = 2018
}

The following people also contributed to BioSNAP: Monica Agrawal, Agrim Gupta, Nina Mrzelj, Priyanka Nigam, Sheila Ramaswamy, and Viswajith Venugopal.