Introducing Peer-Stats Dataset
Public release of the peer-stats dataset for historical and ongoing public BGP collectors' peer information.
Public BGP data collector projects like RouteViews and RIPE RIS provide valuable research and operational information for understanding BGP and detecting Internet routing anomalies.
There are many BGP routers involved in BGP collection projects.
A project includes many "collectors," each of which gathers messages from several active BGP peers in different networks. Some larger collectors receive BGP data from more than one hundred BGP routers. For example, RIPE RIS rrc00 has 112 active BGP peers at the time of writing. (To see the complete list of BGP peers from all collectors, try the experimental tools we developed.)
Sometimes, too many BGP peers may become problematic.
Not all peers present the same amount of data. Some peers are so-called "full-feed" peers: they provide their full routing tables to the collector, and a routing table dump file from the collector contains the full table of each such peer. Other peers provide only a limited number of routing entries, which do not represent their whole routing status. Projects that need to rebuild full routing tables, such as BGP hijack or anomaly detectors, prefer to use full-feed peers as their data source.
At times, we are only interested in data from certain peers. For example, when studying the routing data of a particular network that connects to BGP data collectors, we can pull its data directly from those collectors. However, it can be troublesome to find out which collectors have data from a given peer. RIPE RIS provides a nice API for querying such information, but we couldn't find one for RouteViews.
Historical data for such information is also missing. For researchers who want to study how the data collectors have evolved over time, even the RIPE RIS peers API cannot help.
Introducing BGPKIT Peer-Stats Dataset
The Peer-Stats dataset is a publicly available, free-to-use dataset that aims to provide ten years of daily peer information for all RouteViews and RIPE RIS collectors.
The data includes the following fields for each peer of a BGP collector:
- asn: the Autonomous System Number of the collector peer
- ip: the IP address of the collector peer
- num_v4_pfxs: the number of IPv4 prefixes propagated from the collector peer
- num_v6_pfxs: the number of IPv6 prefixes propagated from the collector peer
- num_connected_asns: the number of connected (immediate next hop) ASes from the collector peer
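To make the field list concrete, here is a minimal Python sketch that models one peer entry as a typed record. The names `PeerStats` and `parse_peer` are our own illustrative choices, not part of any BGPKIT library; only the five field names come from the dataset. The sample values are copied from the rrc00 example in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerStats:
    """One collector peer entry (hypothetical helper type, not a BGPKIT API)."""

    asn: int                 # Autonomous System Number of the peer
    ip: str                  # IP address of the peer
    num_v4_pfxs: int         # IPv4 prefixes propagated from the peer
    num_v6_pfxs: int         # IPv6 prefixes propagated from the peer
    num_connected_asns: int  # immediate next-hop ASes seen from the peer


def parse_peer(entry: dict) -> PeerStats:
    """Build a PeerStats record from one JSON object of a data file."""
    return PeerStats(
        asn=int(entry["asn"]),
        ip=str(entry["ip"]),
        num_v4_pfxs=int(entry["num_v4_pfxs"]),
        num_v6_pfxs=int(entry["num_v6_pfxs"]),
        num_connected_asns=int(entry["num_connected_asns"]),
    )


# Values taken from the rrc00 sample shown in this post.
sample = {
    "asn": 328474,
    "ip": "102.67.56.1",
    "num_connected_asns": 330,
    "num_v4_pfxs": 919443,
    "num_v6_pfxs": 0,
}
peer = parse_peer(sample)
print(peer.asn, peer.num_v4_pfxs)
```

A frozen dataclass is used here only so that parsed records cannot be modified by accident; a plain dict works just as well for quick exploration.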
The dataset is organized by the following structure.
- collector
  - year
    - month
      - data files
Each data file is in JSON format (see the example below) and compressed with bzip2. Users can easily use tools like bzcat and jq to view the data files. For example, you can run the following command to quickly view the peer-stats data for the collector rrc00 on 2022-05-01.
curl "https://data.bgpkit.com/peer-stats/rrc00/2022/05/rrc00-2022-05-01-1651363200.bz2" --silent | bzcat | jq
{
  "collector": "rrc00",
  "peers": {
    "102.67.56.1": {
      "asn": 328474,
      "ip": "102.67.56.1",
      "num_connected_asns": 330,
      "num_v4_pfxs": 919443,
      "num_v6_pfxs": 0
    },
    "103.102.5.1": {
      "asn": 131477,
      "ip": "103.102.5.1",
      "num_connected_asns": 184,
      "num_v4_pfxs": 895482,
      "num_v6_pfxs": 0
    },
    ...
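One use of these files is picking out full-feed peers for a collector. The Python sketch below decompresses a data file held in memory and keeps the peers whose IPv4 prefix count reaches a cutoff. The cutoff of 800,000 prefixes is purely our assumption for illustration; this post does not define "full-feed" numerically, and a sensible value changes as the global table grows. The function names are hypothetical, and the third peer in the demo (192.0.2.1, AS64496) is made up from documentation-reserved values to represent a partial feed.

```python
import bz2
import json

# Assumed cutoff: the dataset does not define "full-feed" numerically.
MIN_V4_PFXS = 800_000


def load_peer_stats(raw: bytes) -> dict:
    """Decompress a bzip2-compressed peer-stats file and parse its JSON."""
    return json.loads(bz2.decompress(raw))


def full_feed_peers(doc: dict, min_v4_pfxs: int = MIN_V4_PFXS) -> list:
    """Return the sorted IPs of peers with at least min_v4_pfxs IPv4 prefixes."""
    return sorted(
        ip
        for ip, peer in doc["peers"].items()
        if peer["num_v4_pfxs"] >= min_v4_pfxs
    )


# In-memory stand-in for a downloaded file, so the demo needs no network.
demo = {
    "collector": "rrc00",
    "peers": {
        "102.67.56.1": {"asn": 328474, "ip": "102.67.56.1",
                        "num_connected_asns": 330,
                        "num_v4_pfxs": 919443, "num_v6_pfxs": 0},
        "103.102.5.1": {"asn": 131477, "ip": "103.102.5.1",
                        "num_connected_asns": 184,
                        "num_v4_pfxs": 895482, "num_v6_pfxs": 0},
        # Made-up partial-feed peer (documentation IP and ASN).
        "192.0.2.1": {"asn": 64496, "ip": "192.0.2.1",
                      "num_connected_asns": 3,
                      "num_v4_pfxs": 1200, "num_v6_pfxs": 0},
    },
}
raw = bz2.compress(json.dumps(demo).encode("utf-8"))

doc = load_peer_stats(raw)
print(full_feed_peers(doc))
```

To run this against a real file, replace `raw` with the bytes downloaded from the data URL. The same filter could be applied to `num_v6_pfxs` with a separate, likewise assumed, IPv6 cutoff.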
Because all the data files are generated against the midnight UTC RIB dump of the day, you can also easily construct a URL to a data file for any particular date using the following template.
https://data.bgpkit.com/peer-stats/{COLLECTOR}/{YEAR}/{MONTH}/{COLLECTOR}-{YEAR}-{MONTH}-{DAY}-{MIDNIGHT_TIMESTAMP}.bz2
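The template can be filled in programmatically. The following Python sketch builds the URL for a given collector and date, computing the midnight UTC Unix timestamp from the date. The function name `peer_stats_url` is our own, and the zero-padding of month and day is inferred from the rrc00 example URL in this post; we assume other collectors and dates follow the same pattern.

```python
from datetime import date, datetime, timezone

BASE_URL = "https://data.bgpkit.com/peer-stats"


def peer_stats_url(collector: str, day: date) -> str:
    """Build the data file URL for one collector and one UTC date."""
    # Unix timestamp of 00:00:00 UTC on the given day.
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    ts = int(midnight.timestamp())
    return (
        f"{BASE_URL}/{collector}/{day:%Y}/{day:%m}/"
        f"{collector}-{day:%Y-%m-%d}-{ts}.bz2"
    )


url = peer_stats_url("rrc00", date(2022, 5, 1))
print(url)
```

For 2022-05-01 and rrc00, this reproduces the URL used in the curl example, including the timestamp 1651363200.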
Open-source
We also open-sourced the command-line tool that collects this data; its source code is available on GitHub. Feel free to check it out and run it on your own infrastructure if needed.
Credits and Sponsorship
The original idea for this work came from our extensive discussion with Romain Fontugne (follow him on Twitter at @romain_fontugne) from IIJ. This work is made possible by IIJ's generous sponsorship.
Please consider sponsoring us on GitHub if you find our work valuable and would like to see more open-source code and datasets on BGP.