Stanford Biomedical Network Dataset Collection

Networks and relationships : Datasets with information about relationships between entities
Entities and feature tables : Datasets with information about entities

Mambo is a tool for construction, representation, and analysis of large and multimodal biomedical network data.

Networks and relationships

Name	Edges	Entities	Description
CC-Neuron	49,471,006	cell, cell	Similarity network between cells in embryonic mouse brain
ChCh-Miner	96,137	drug, drug	Interactions between FDA-approved drugs
ChChSe-Decagon	4,649,441	drug, drug, side-effect	Side effects of drug combinations
ChG-InterDecagon	131,034	drug, gene	Chemical-gene interaction network
ChG-Miner	15,424	drug, gene	Drug-target interaction network
ChG-TargetDecagon	18,690	drug, gene	Drug-target interaction network
ChSe-Decagon	174,977	drug, side-effect	Drug side-effect association network
DCh-Miner	1,334,088	disease, drug	Disease-drug association network
DD-Miner	6,877	disease, disease	Hierarchical ontology of diseases
DF-Miner	802,760	disease, function	Disease-function association network
DG-AssocMiner	21,357	disease, gene	Disease-gene association network
DG-Miner	42,475,361	disease, gene	Disease-gene association network
FF-Miner	119,464	function, function	Relations between biological processes, molecular functions, and cellular components
GF-Miner	16,628	gene, function	Gene-function association network
GG-EnhancedTissue	3,642,834,333	gene, gene	Enhanced tissue-specific gene-gene interaction networks
GP-Miner	102,450	gene, protein	Protein-coding gene associations
GrGr-EnhancedHiC1K	7,224,824	genomic-region, genomic-region	Enhanced Hi-C interaction network
GrGr-EnhancedHiC5K	682,566	genomic-region, genomic-region	Enhanced Hi-C interaction network
PP-Decagon	715,612	protein, protein	Physical and functional protein-protein association network for human
PP-Miner	1,847,117,370	protein, protein	Protein-protein association networks for many different species
PP-Pathways	342,353	protein, protein	Physical protein-protein interaction network for human
PPT-Ohmnet	70,338	protein, protein, tissue	Tissue-specific protein-protein interaction network
SS-Butterfly	832	species, species	Similarity network between butterflies
TFG-Ohmnet	20,619	tissue, function, gene	Tissue-specific protein-function association networks

Entities and feature tables

Name	Size	Entity	Description
D-DoMiner	9,247	disease	Disease synopses
D-DoPathways	301	disease	Mapping of diseases to disease categories
D-MeshMiner	11,332	disease	Disease synopses
D-MtfPathways	519	disease	Network motifs of disease pathways
D-OmimMiner	1,191	disease	Disease synopses
D-StructPathways	520	disease	Network structural features of disease pathways
G-HumanEssential	18,529	gene	Information on experimentally tested essential and non-essential genes
G-MtfPathways	22,552	gene	Network motifs of genes
G-SynMiner	35,654	gene	Gene synopses
Se-DoDecagon	562	side-effect	Mapping of side effects to side-effect categories

Entity types

Cell [C] : basic structural, biological, and functional unit of all organisms measured by single-cell technologies
Disease [D] : medical condition that is associated with specific symptoms and signs
Drug/Chemical [Ch] : chemical substance of known structure that produces a biological effect
Function [F] : gene role classified into molecular functions, cellular components, and biological processes
Gene [G] : sequence of DNA or RNA that codes for a molecule that has a function
Genomic region [Gr] : segment of a nucleic acid molecule, e.g., a regulatory sequence
Protein [P] : molecule that performs a vast array of functions within organisms
Side-effect [Se] : secondary, typically undesirable effect of a drug or medical treatment
Species/Organism [S] : basic unit of classification and a taxonomic rank, as well as a unit of biodiversity
Tissue [T] : cellular organizational level between cells and a complete organ
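
The bracketed codes above appear to double as a naming scheme: in the tables, the part of each dataset name before the hyphen reads as a concatenation of these codes (for example, ChChSe-Decagon lists drug, drug, side-effect). The page does not state this as a rule, so the following is a minimal sketch based on that inference. The names ENTITY_CODES and entity_types are hypothetical helpers, not part of any BioSNAP tooling.

```python
import re

# Codes copied from the entity type list; the values use the labels that
# appear in the "Entities" and "Entity" table columns.
ENTITY_CODES = {
    "C": "cell",
    "D": "disease",
    "Ch": "drug",
    "F": "function",
    "G": "gene",
    "Gr": "genomic-region",
    "P": "protein",
    "Se": "side-effect",
    "S": "species",
    "T": "tissue",
}


def entity_types(dataset_name):
    """Decode the prefix of a dataset name into a list of entity types.

    Assumes (inferred from the tables, not documented) that the text before
    the first hyphen is a run of codes, each one capital letter followed by
    optional lowercase letters.
    """
    prefix = dataset_name.split("-", 1)[0]
    codes = re.findall(r"[A-Z][a-z]*", prefix)
    if not codes or "".join(codes) != prefix:
        raise ValueError(f"cannot split prefix {prefix!r} into entity codes")
    unknown = [code for code in codes if code not in ENTITY_CODES]
    if unknown:
        raise ValueError(f"unknown entity codes in {dataset_name!r}: {unknown}")
    return [ENTITY_CODES[code] for code in codes]


print(entity_types("ChChSe-Decagon"))
print(entity_types("TFG-Ohmnet"))
```

Raising an error on an unrecognized prefix, rather than guessing, keeps the helper from silently mislabeling a dataset whose name does not follow the inferred pattern.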

Network statistics

Dataset statistics
Nodes	Number of nodes in the network
Edges	Number of edges in the network
Nodes in largest WCC	Number of nodes in the largest weakly connected component
Edges in largest WCC	Number of edges in the largest weakly connected component
Nodes in largest SCC	Number of nodes in the largest strongly connected component
Edges in largest SCC	Number of edges in the largest strongly connected component
Average clustering coefficient	Mean of the local clustering coefficients over all nodes
Number of triangles	Number of triples of connected nodes (considering the network as undirected)
Fraction of closed triangles	Number of connected triples of nodes / number of (undirected) length 2 paths
Diameter (longest shortest path)	Maximum undirected shortest path length (sampled over 1,000 random nodes)
90-percentile effective diameter	90th percentile of undirected shortest path length distribution (sampled over 1,000 random nodes)
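
To make the definitions above concrete, here is a rough pure-Python sketch that computes most of these statistics for a small edge list. It is an illustration under stated assumptions, not the code used to produce the published numbers. Edges are treated as undirected, so the SCC rows are left out. The closed fraction is computed as standard transitivity (closed length-2 paths divided by all length-2 paths), which may be normalized differently from the collection's own figures. The effective diameter is taken as a plain 90th percentile without interpolation. All function names are hypothetical.

```python
import math
import random
from collections import deque
from itertools import combinations


def undirected_adjacency(edges):
    """Build an undirected adjacency map; self-loops and duplicate edges are dropped."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Return hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def largest_wcc(adj):
    """Return the node set of the largest weakly connected component."""
    seen, best = set(), set()
    for start in adj:
        if start not in seen:
            comp = set(bfs_distances(adj, start))
            seen |= comp
            if len(comp) > len(best):
                best = comp
    return best


def triangle_stats(adj):
    """Return (triangles, average clustering coefficient, transitivity)."""
    closed = paths = 0  # length-2 paths, counted once per center node
    local = []
    for nbrs in adj.values():
        possible = len(nbrs) * (len(nbrs) - 1) // 2
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        closed += links
        paths += possible
        local.append(links / possible if possible else 0.0)
    avg_cc = sum(local) / len(local) if local else 0.0
    # Each triangle is seen once from each of its three corners.
    return closed // 3, avg_cc, (closed / paths if paths else 0.0)


def sampled_diameters(adj, samples=1000, q=0.9, seed=0):
    """Return (diameter, q-quantile path length) from BFS at sampled source nodes."""
    nodes = list(adj)
    rng = random.Random(seed)
    sources = nodes if len(nodes) <= samples else rng.sample(nodes, samples)
    lengths = sorted(
        d for s in sources for d in bfs_distances(adj, s).values() if d > 0
    )
    if not lengths:
        return 0, 0
    return lengths[-1], lengths[math.ceil(q * len(lengths)) - 1]


def summarize(edges, samples=1000):
    adj = undirected_adjacency(edges)
    wcc = largest_wcc(adj)
    triangles, avg_cc, transitivity = triangle_stats(adj)
    diameter, effective = sampled_diameters(adj, samples)
    return {
        "nodes": len(adj),
        "edges": sum(len(n) for n in adj.values()) // 2,
        "nodes_in_largest_wcc": len(wcc),
        "edges_in_largest_wcc": sum(len(adj[u]) for u in wcc) // 2,
        "average_clustering_coefficient": avg_cc,
        "triangles": triangles,
        "fraction_closed": transitivity,
        "diameter": diameter,
        "effective_diameter_90": effective,
    }


# Toy edge list made up for illustration; it is not taken from any BioSNAP dataset.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)]
stats = summarize(toy_edges)
for name, value in stats.items():
    print(name, value)
```

The per-node triangle loop is quadratic in node degree, so this sketch suits small graphs only. Networks on the scale of those listed above would need a dedicated graph library.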

Citing BioSNAP

We encourage you to cite our datasets if you have used them in your work. You can use the following BibTeX citation:

@misc{biosnapnets,
  author       = {Marinka Zitnik and Rok Sosi\v{c} and Sagar Maheshwari and Jure Leskovec},
  title        = {{BioSNAP Datasets}: {Stanford} Biomedical Network Dataset Collection},
  howpublished = {\url{http://snap.stanford.edu/biodata}},
  month        = aug,
  year         = 2018
}

The following people also contributed to BioSNAP: Monica Agrawal, Agrim Gupta, Nina Mrzelj, Priyanka Nigam, Sheila Ramaswamy, and Viswajith Venugopal.