|
| 1 | +# Acknowledgments |
| 2 | + |
| 3 | +The Internet Yellow Pages could not exist without all the awesome prior research and |
| 4 | +data sources. We list all of them here, where possible with their corresponding licenses, |
| 5 | +with which you must comply if you use the public instance or create a dump that |
| 6 | +includes these data sources. |
| 7 | + |
| 8 | +Please refer to the READMEs in the respective crawler directories for more information. |
| 9 | + |
| 10 | +## Alice-LG |
| 11 | + |
| 12 | +We retrieve route server looking glass snapshots from the following IXPs. |
| 13 | + |
| 14 | +| Name | URL | |
| 15 | +|----------|----------------------------| |
| 16 | +| AMS-IX | https://lg.ams-ix.net/ | |
| 17 | +| BCIX | https://lg.bcix.de/ | |
| 18 | +| DE-CIX | https://lg.de-cix.net/ | |
| 19 | +| IX.br | https://lg.ix.br/ | |
| 20 | +| LINX | https://alice-rs.linx.net/ | |
| 21 | +| Megaport | https://lg.megaport.com/ | |
| 22 | +| Netnod | https://lg.netnod.se/ | |
| 23 | + |
| 24 | +## APNIC |
| 25 | + |
| 26 | +We use [APNIC](https://labs.apnic.net/)'s [AS population |
| 27 | +estimate](https://labs.apnic.net/index.php/2014/10/02/how-big-is-that-network/). |
| 28 | + |
| 29 | +## BGPKIT |
| 30 | + |
| 31 | +We use the as2rel, peer-stats, and pfx2as [datasets](https://data.bgpkit.com/) from |
| 32 | +[BGPKIT](https://bgpkit.com/). |
| 33 | + |
| 34 | +Use of this data is authorized under their [Acceptable Use |
| 35 | +Agreement](https://bgpkit.com/aua). |
| 36 | + |
| 37 | +## BGP.Tools |
| 38 | + |
| 39 | +We use [AS names, AS tags](https://bgp.tools/kb/api), and [anycast prefix |
| 40 | +tags](https://github.com/bgptools/anycast-prefixes) provided by |
| 41 | +[BGP.Tools](https://bgp.tools/). |
| 42 | + |
| 43 | +## CAIDA |
| 44 | + |
| 45 | +We use two datasets from [CAIDA](https://www.caida.org/), whose use is authorized |
| 46 | +under their [Acceptable Use Agreement](https://www.caida.org/about/legal/aua/). |
| 47 | + |
| 48 | +> CAIDA AS Rank https://doi.org/10.21986/CAIDA.DATA.AS-RANK. |
| 49 | +
|
| 50 | +and |
| 51 | + |
| 52 | +> The CAIDA UCSD IXPs Dataset, |
| 53 | +> https://www.caida.org/catalog/datasets/ixps |
| 54 | +
|
| 55 | +## Cisco |
| 56 | + |
| 57 | +We use the [Cisco Umbrella Popularity |
| 58 | +List](https://s3-us-west-1.amazonaws.com/umbrella-static/index.html). |
| 59 | + |
| 60 | +## Citizen Lab |
| 61 | + |
| 62 | +We use URL testing lists from [The Citizen Lab](https://citizenlab.ca/). |
| 63 | + |
| 64 | +> Citizen Lab and Others. 2014. URL Testing Lists Intended for Discovering Website |
| 65 | +> Censorship. https://github.com/citizenlab/test-lists. |
| 66 | +
|
| 67 | +This data is licensed under [CC BY-NC-SA |
| 68 | +4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). No changes were made to the data. |
| 69 | + |
| 70 | +## Cloudflare |
| 71 | + |
| 72 | +We use the `radar/dns/top/ases`, `radar/dns/top/locations`, `radar/ranking/top`, and |
| 73 | +`radar/datasets` endpoints of the [Cloudflare Radar](https://radar.cloudflare.com/) API. |
| 74 | + |
| 75 | +This data is licensed under [CC BY-NC |
| 76 | +4.0](https://creativecommons.org/licenses/by-nc/4.0/). No changes were made to the data. |
| 77 | + |
| 78 | +## Emile Aben |
| 79 | + |
| 80 | +We use [AS names](https://github.com/emileaben/asnames) provided by Emile Aben and |
| 81 | +others with permission (Hi Emile!). |
| 82 | + |
| 83 | +## Internet Health Report |
| 84 | + |
| 85 | +We use three datasets from the [Internet Health Report](https://ihr.iijlab.net/) (that's |
| 86 | +us!): Country Dependency, AS Hegemony, and Route Origin Validation. |
| 87 | + |
| 88 | +This data is licensed under [CC BY-NC-SA |
| 89 | +4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). No changes were made to the |
| 90 | +data. |
| 91 | + |
| 92 | +## Internet Intelligence Lab |
| 93 | + |
| 94 | +We use the AS to organization mapping from the [Internet Intelligence Lab at Georgia |
| 95 | +Tech](https://inetintel.notion.site/Internet-Intelligence-Research-Lab-d186184563d345bab51901129d812ed6). |
| 96 | + |
| 97 | +> Z. Chen, Z. Bischof, C. Testart, A. Dainotti, "AS to Organization Mapping", |
| 98 | +> Internet Intelligence Lab at Georgia Tech, |
| 99 | +> https://github.com/InetIntel/Dataset-AS-to-Organization-Mapping |
| 100 | +
|
| 101 | +Use of this data is authorized under their [Acceptable Use |
| 102 | +Agreement](https://raw.githubusercontent.com/InetIntel/Dataset-AS-to-Organization-Mapping/master/LICENSE). |
| 103 | + |
| 104 | +## Number Resource Organization |
| 105 | + |
| 106 | +We use the [extended allocation and assignment |
| 107 | +reports](https://www.nro.net/about/rirs/statistics/) provided by the [Number Resource |
| 108 | +Organization](https://www.nro.net/). |
| 109 | + |
| 110 | +## OpenINTEL |
| 111 | + |
| 112 | +We use several datasets from [OpenINTEL](https://www.openintel.nl/), a joint project of |
| 113 | +the University of Twente, SURF, SIDN Labs and NLnet Labs. |
| 114 | + |
| 115 | +The `tranco1m` and `umbrella1m` [datasets](https://data.openintel.nl/data/) are licensed |
| 116 | +under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). No changes |
| 117 | +were made to the data. In addition, there are [Terms of |
| 118 | +Use](https://data.openintel.nl/data/README.txt) for this data. |
| 119 | + |
| 120 | +The [DNS Dependency Graph tool](https://dnsgraph.dacs.utwente.nl/) is a joint project of |
| 121 | +the University of Twente and IIJ Research Laboratory. |
| 122 | + |
| 123 | +Other datasets are used with permission from OpenINTEL. |
| 124 | + |
| 125 | +## Packet Clearing House |
| 126 | + |
| 127 | +We use the [daily routing snapshots](https://www.pch.net/resources/Routing_Data/) from |
| 128 | +[Packet Clearing House](https://www.pch.net/). |
| 129 | + |
| 130 | +This data is licensed under [CC BY-NC-SA |
| 131 | +3.0](https://creativecommons.org/licenses/by-nc-sa/3.0/). No changes were made to the |
| 132 | +data. |
| 133 | + |
| 134 | +## PeeringDB |
| 135 | + |
| 136 | +We use the `fac`, `ix`, `ixlan`, `netfac`, and `org` endpoints of the |
| 137 | +[PeeringDB](https://www.peeringdb.com/) API. |
| 138 | + |
| 139 | +Use of this data is authorized under their [Acceptable Use |
| 140 | +Policy](https://www.peeringdb.com/aup). |
| 141 | + |
| 142 | +## RIPE NCC |
| 143 | + |
| 144 | +We use AS names, Atlas measurement information, and RPKI data from the [RIPE |
| 145 | +NCC](https://www.ripe.net/) and [RIPE Atlas](https://atlas.ripe.net/). |
| 146 | + |
| 147 | +## Stanford |
| 148 | + |
| 149 | +We use the [Stanford ASdb dataset](https://asdb.stanford.edu/) provided by the [Stanford |
| 150 | +Empirical Security Research Group](https://esrg.stanford.edu/). |
| 151 | + |
| 152 | +> [ASdb: A System for Classifying Owners of Autonomous |
| 153 | +> Systems](https://zakird.com/papers/asdb.pdf). |
| 154 | +> Maya Ziv, Liz Izhikevich, Kimberly Ruth, Katherine Izhikevich, and Zakir Durumeric. |
| 155 | +> ACM Internet Measurement Conference (IMC), November 2021. |
| 156 | +
|
| 157 | +## Tranco |
| 158 | + |
| 159 | +We use the [Tranco list](https://tranco-list.eu/) provided by the [DistriNet Research |
| 160 | +Unit KU Leuven](https://distrinet.cs.kuleuven.be/), [TU Delft](https://www.tudelft.nl/), |
| 161 | +and [LIG](https://www.liglab.fr/). |
| 162 | + |
| 163 | +The Tranco list combines lists from five providers: |
| 164 | + |
| 165 | +1. [Cisco |
| 166 | +Umbrella](https://umbrella-static.s3-us-west-1.amazonaws.com/index.html) |
| 167 | +1. [Majestic](https://majestic.com/reports/majestic-million) (available under a [CC BY |
| 168 | + 3.0](https://creativecommons.org/licenses/by/3.0/) license) |
| 169 | +1. [Farsight](https://www.domaintools.com/resources/blog/mirror-mirror-on-the-wall-whos-the-fairest-website-of-them-all) |
| 170 | +1. [Chrome User Experience Report (CrUX)](https://developer.chrome.com/docs/crux/) |
| 171 | + ([available](https://research.google/resources/datasets/chrome-user-experience-report/) |
| 172 | + under a [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license) |
| 173 | +1. [Cloudflare Radar](https://radar.cloudflare.com/domains) |
| 174 | + ([available](https://radar.cloudflare.com/about) under a [CC BY-NC |
| 175 | + 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license). |
| 176 | + |
| 177 | +## Virginia Tech |
| 178 | + |
| 179 | +We use the [RoVista](https://rovista.netsecurelab.org/) dataset provided by the |
| 180 | +NetSecLab group at Virginia Tech. |
| 181 | + |
| 182 | +> RoVista: Measuring and Understanding the Route Origin Validation (ROV) in RPKI. |
| 183 | +> Weitong Li, Zhexiao Lin, Md. Ishtiaq Ashiq, Emile Aben, Romain Fontugne, |
| 184 | +> Amreesh Phokeer, and Taejoong Chung. |
| 185 | +> ACM Internet Measurement Conference (IMC), October 2023. |