Commit de1f514

Merge pull request #127 from m-appel/acknowledgment-file
Add acknowledgments
2 parents 236f736 + e4cb984 commit de1f514

1 file changed

+185
-0
lines changed

ACKNOWLEDGMENTS.md

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
1+
# Acknowledgments
2+
3+
The Internet Yellow Pages could not exist without all the awesome prior research and
4+
data sources. We list all of them here, if possible with their corresponding licenses,
5+
to which you will need to conform if you use the public instance or create a dump that
6+
includes these data sources.
7+
8+
Please refer to the READMEs in the respective crawler directories for more information.
9+
10+
## Alice-LG
11+
12+
We retrieve route server looking glass snapshots from the following IXPs.
13+
14+
| Name | URL |
15+
|----------|----------------------------|
16+
| AMS-IX | https://lg.ams-ix.net/ |
17+
| BCIX | https://lg.bcix.de/ |
18+
| DE-CIX | https://lg.de-cix.net/ |
19+
| IX.br | https://lg.ix.br/ |
20+
| LINX | https://alice-rs.linx.net/ |
21+
| Megaport | https://lg.megaport.com/ |
22+
| Netnod | https://lg.netnod.se/ |
23+
24+
## APNIC
25+
26+
We use [APNIC](https://labs.apnic.net/)'s [AS population
27+
estimate](https://labs.apnic.net/index.php/2014/10/02/how-big-is-that-network/).
28+
29+
## BGPKIT
30+
31+
We use the as2rel, peer-stats, and pfx2as [datasets](https://data.bgpkit.com/) from
32+
[BGPKIT](https://bgpkit.com/).
33+
34+
Use of this data is authorized under their [Acceptable Use
35+
Agreement](https://bgpkit.com/aua).
36+
37+
## BGP.Tools
38+
39+
We use [AS names, AS tags](https://bgp.tools/kb/api), and [anycast prefix
40+
tags](https://github.com/bgptools/anycast-prefixes) provided by
41+
[BGP.Tools](https://bgp.tools/).
42+
43+
## CAIDA
44+
45+
We use two datasets from [CAIDA](https://www.caida.org/), whose use is authorized
46+
under their [Acceptable Use Agreement](https://www.caida.org/about/legal/aua/).
47+
48+
> CAIDA AS Rank https://doi.org/10.21986/CAIDA.DATA.AS-RANK.
49+
50+
and
51+
52+
> The CAIDA UCSD IXPs Dataset,
53+
> https://www.caida.org/catalog/datasets/ixps
54+
55+
## Cisco
56+
57+
We use the [Cisco Umbrella Popularity
58+
List](https://s3-us-west-1.amazonaws.com/umbrella-static/index.html).
59+
60+
## Citizen Lab
61+
62+
We use URL testing lists from [The Citizen Lab](https://citizenlab.ca/).
63+
64+
> Citizen Lab and Others. 2014. URL Testing Lists Intended for Discovering Website
65+
> Censorship. https://github.com/citizenlab/test-lists.
66+
67+
This data is licensed under [CC BY-NC-SA
68+
4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). No changes were made to the data.
69+
70+
## Cloudflare
71+
72+
We use the `radar/dns/top/ases`, `radar/dns/top/locations`, `radar/ranking/top`, and
73+
`radar/datasets` endpoints of the [Cloudflare Radar](https://radar.cloudflare.com/) API.
74+
75+
This data is licensed under [CC BY-NC
76+
4.0](https://creativecommons.org/licenses/by-nc/4.0/). No changes were made to the data.
77+
78+
## Emile Aben
79+
80+
We use [AS names](https://github.com/emileaben/asnames) provided by Emile Aben and
81+
others with permission (Hi Emile!).
82+
83+
## Internet Health Report
84+
85+
We use three datasets from the [Internet Health Report](https://ihr.iijlab.net/) (that's
86+
us!): Country Dependency, AS Hegemony, and Route Origin Validation.
87+
88+
This data is licensed under [CC BY-NC-SA
89+
4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). No changes were made to the
90+
data.
91+
92+
## Internet Intelligence Lab
93+
94+
We use the AS to organization mapping from the [Internet Intelligence Lab at Georgia
95+
Tech](https://inetintel.notion.site/Internet-Intelligence-Research-Lab-d186184563d345bab51901129d812ed6).
96+
97+
> Z. Chen, Z. Bischof, C. Testart, A. Dainotti, "AS to Organization Mapping",
98+
> Internet Intelligence Lab at Georgia Tech,
99+
> https://github.com/InetIntel/Dataset-AS-to-Organization-Mapping
100+
101+
Use of this data is authorized under their [Acceptable Use
102+
Agreement](https://raw.githubusercontent.com/InetIntel/Dataset-AS-to-Organization-Mapping/master/LICENSE).
103+
104+
## Number Resource Organization
105+
106+
We use the [extended allocation and assignment
107+
reports](https://www.nro.net/about/rirs/statistics/) provided by the [Number Resource
108+
Organization](https://www.nro.net/).
109+
110+
## OpenINTEL
111+
112+
We use several datasets from [OpenINTEL](https://www.openintel.nl/), a joint project of
113+
the University of Twente, SURF, SIDN Labs and NLnet Labs.
114+
115+
The `tranco1m` and `umbrella1m` [datasets](https://data.openintel.nl/data/) are licensed
116+
under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). No changes
117+
were made to the data. In addition, there are [Terms of
118+
Use](https://data.openintel.nl/data/README.txt) for this data.
119+
120+
The [DNS Dependency Graph tool](https://dnsgraph.dacs.utwente.nl/) is a joint project of
121+
the University of Twente and IIJ Research Laboratory.
122+
123+
Other datasets are used with permission from OpenINTEL.
124+
125+
## Packet Clearing House
126+
127+
We use the [daily routing snapshots](https://www.pch.net/resources/Routing_Data/) from
128+
[Packet Clearing House](https://www.pch.net/).
129+
130+
This data is licensed under [CC BY-NC-SA
131+
3.0](https://creativecommons.org/licenses/by-nc-sa/3.0/). No changes were made to the
132+
data.
133+
134+
## PeeringDB
135+
136+
We use the `fac`, `ix`, `ixlan`, `netfac`, and `org` endpoints of the
137+
[PeeringDB](https://www.peeringdb.com/) API.
138+
139+
Use of this data is authorized under their [Acceptable Use
140+
Policy](https://www.peeringdb.com/aup).
141+
142+
## RIPE NCC
143+
144+
We use AS names, Atlas measurement information, and RPKI data from the [RIPE
145+
NCC](https://www.ripe.net/) and [RIPE Atlas](https://atlas.ripe.net/).
146+
147+
## Stanford
148+
149+
We use the [Stanford ASdb dataset](https://asdb.stanford.edu/) provided by the [Stanford
150+
Empirical Security Research Group](https://esrg.stanford.edu/).
151+
152+
> [ASdb: A System for Classifying Owners of Autonomous
153+
> Systems](https://zakird.com/papers/asdb.pdf).
154+
> Maya Ziv, Liz Izhikevich, Kimberly Ruth, Katherine Izhikevich, and Zakir Durumeric.
155+
> ACM Internet Measurement Conference (IMC), November 2021.
156+
157+
## Tranco
158+
159+
We use the [Tranco list](https://tranco-list.eu/) provided by the [DistriNet Research
160+
Unit KU Leuven](https://distrinet.cs.kuleuven.be/), [TU Delft](https://www.tudelft.nl/),
161+
and [LIG](https://www.liglab.fr/).
162+
163+
The Tranco list combines lists from five providers:
164+
165+
1. [Cisco
166+
Umbrella](https://umbrella-static.s3-us-west-1.amazonaws.com/index.html)
167+
1. [Majestic](https://majestic.com/reports/majestic-million) (available under a [CC BY
168+
3.0](https://creativecommons.org/licenses/by/3.0/) license)
169+
1. [Farsight](https://www.domaintools.com/resources/blog/mirror-mirror-on-the-wall-whos-the-fairest-website-of-them-all)
170+
1. [Chrome User Experience Report (CrUX)](https://developer.chrome.com/docs/crux/)
171+
([available](https://research.google/resources/datasets/chrome-user-experience-report/)
172+
under a [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license)
173+
1. [Cloudflare Radar](https://radar.cloudflare.com/domains)
174+
([available](https://radar.cloudflare.com/about) under a [CC BY-NC
175+
4.0](https://creativecommons.org/licenses/by-nc/4.0/) license).
176+
177+
## Virginia Tech
178+
179+
We use the [RoVista](https://rovista.netsecurelab.org/) dataset provided by the
180+
NetSecLab group at Virginia Tech.
181+
182+
> RoVista: Measuring and Understanding the Route Origin Validation (ROV) in RPKI.
183+
> Weitong Li, Zhexiao Lin, Md. Ishtiaq Ashiq, Emile Aben, Romain Fontugne,
184+
> Amreesh Phokeer, and Taejoong Chung.
185+
> ACM Internet Measurement Conference (IMC), October 2023.
