Nodes represent web pages and directed edges represent hyperlinks between them. Google released the data in 2002 as part of the Google Programming Contest.

| Dataset statistics | |
|---|---|
| Nodes | 875713 |
| Edges | 5105039 |
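| Nodes in largest WCC (weakly connected component) | 855802 (0.977) |
| Edges in largest WCC (weakly connected component) | 5066842 (0.993) |
| Nodes in largest SCC (strongly connected component) | 434818 (0.497) |
| Edges in largest SCC (strongly connected component) | 3419124 (0.670) |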
| Average clustering coefficient | 0.5143 |
| Number of triangles | 13391903 |
| Fraction of closed triangles | 0.01911 |
| Diameter (longest shortest path) | 21 |
| 90-percentile effective diameter | 8.1 |
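
The parenthesized values in the table appear to be fractions of the whole graph: each component's count divided by the total number of nodes or edges. The page does not state this, so it is an assumption inferred from the numbers. A minimal Python sketch that checks this reading against the figures above (the names `components` and `fraction` are illustrative):

```python
# Totals and component sizes copied from the statistics table above.
TOTAL_NODES = 875713
TOTAL_EDGES = 5105039

# Each entry pairs a component count with the total it is compared against.
components = {
    "WCC nodes": (855802, TOTAL_NODES),
    "WCC edges": (5066842, TOTAL_EDGES),
    "SCC nodes": (434818, TOTAL_NODES),
    "SCC edges": (3419124, TOTAL_EDGES),
}


def fraction(count, total):
    """Share of the whole graph, rounded to three decimals as in the table."""
    return round(count / total, 3)


fractions = {name: fraction(c, t) for name, (c, t) in components.items()}
for name, value in fractions.items():
    print(f"{name}: {value:.3f}")
```

Under this reading, roughly half of the pages sit in the largest strongly connected component, while almost all of them are reachable from one another once edge direction is ignored.
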

| File | Description |
|---|---|
| web-Google.txt.gz | Webgraph from the Google programming contest, 2002 |
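
A rough sketch of how the node count, edge count, and largest-WCC size might be recomputed from the download, using only the Python standard library. It assumes the file is a gzipped text edge list with one whitespace-separated `source target` pair per line and `#` comment lines; that layout is not described on this page, so check it before relying on the code. The helper names `read_edges` and `summarize` are hypothetical.

```python
import gzip
from collections import Counter


def read_edges(path):
    """Yield (source, target) integer pairs from a gzipped edge list.

    Assumption: one whitespace-separated pair per line, '#' starts a comment.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            source, target = line.split()[:2]
            yield int(source), int(target)


def summarize(edges):
    """Return (node count, edge count, size of the largest WCC)."""
    parent = {}

    def find(node):
        # Union-find root lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    edge_count = 0
    for source, target in edges:
        edge_count += 1  # counts every listed pair, duplicates included
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_s] = root_t  # merge, ignoring edge direction

    # Group nodes by their union-find root to get component sizes.
    sizes = Counter(find(node) for node in list(parent))
    largest = max(sizes.values()) if sizes else 0
    return len(parent), edge_count, largest
```

Calling `summarize(read_edges("web-Google.txt.gz"))` on the downloaded file should, if the format assumption holds, give values comparable to the Nodes, Edges, and largest-WCC rows in the statistics table. Union-find is used here only because it handles weak connectivity in a single pass; the strongly connected component figures would need a different algorithm, such as Tarjan's.
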