24 Oct 22:47

m-appel

00c85d5

v4.0.2 Latest

This release fixes two crawlers and adds rerun functionality to the postprocess scripts.

What's Changed

RIPE Atlas introduced new fields to the measurement metadata, which broke the crawler.
The Cloudflare crawlers sometimes run into rate limiting, which was not handled correctly because the responses do not include a Retry-After header, even though Cloudflare claims that they do.
Add rerun functionality to postprocess scripts.
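
One way to cope with rate-limited responses that may or may not carry a Retry-After header is to fall back to exponential backoff when the header is missing or unparsable. The following is a minimal sketch under that assumption, not the fix shipped in this release; the function name `retry_delay` and the `base` and `cap` parameters are hypothetical.

```python
def retry_delay(headers, attempt, base=1.0, cap=60.0):
    """Return the number of seconds to wait before retry `attempt` (0-based).

    `headers` is a mapping of response headers. A plain dict is
    case-sensitive, so this sketch assumes the key is spelled "Retry-After".
    """
    value = headers.get("Retry-After")
    if value is not None:
        try:
            # Honor the server's hint when it is a number of seconds.
            return max(0.0, float(value))
        except ValueError:
            # The header may also be an HTTP date; this sketch ignores
            # that form and falls through to the backoff below.
            pass
    # No usable header: exponential backoff, capped at `cap` seconds.
    return min(cap, base * 2 ** attempt)
```

A caller would sleep for `retry_delay(response_headers, attempt)` seconds before repeating the request, incrementing `attempt` each time.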

Full Changelog: v4.0.1...v4.0.2

19 Sep 05:02

m-appel

v4.0.1

2ab61b2

What's Changed

InetIntel changed their format, so this release fixes that crawler.

Full Changelog: v4.0.0...v4.0.1

10 Sep 11:59

m-appel

v4.0.0

9f13ac9

(Another) Prefix Node Rework

Remember the changes to the prefix nodes introduced in v3.0.0? Although adding multiple labels to one node is clean, querying was getting too complicated. We also found some cases where normal querying just did not work. This is why we are remodeling the prefix nodes (again)!

Instead of adding multiple labels to a single node, each prefix type now has its own node (but all nodes still have the Prefix label). While this increases the number of nodes and relationships in the graph, it makes querying considerably simpler:

Each IP node has one PART_OF relationship to the most-specific covering prefix of each type.
Each Prefix node has one PART_OF relationship to the most-specific covering prefix of each type.

For the inter-prefix PART_OF relationships there is one catch: A PART_OF relationship between two different types can be between two nodes with the same prefix property (e.g., a BGPPrefix that is also a RPKIPrefix). For relationships between the same type, the PART_OF relationship indicates the most-specific covering prefix.

Warning: The prefix property is now only unique within each subtype. The general Prefix type is still there for convenience, but querying it will just return some prefix, even when a prefix filter is specified. The Prefix type should thus not be used in most queries.
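
To make the rule "one PART_OF relationship to the most-specific covering prefix of each type" concrete, here is a small Python sketch that computes those targets for an IP. It only illustrates the model and is not the code used to build the graph; the prefixes in `PREFIXES` are hypothetical toy data, and the function names are made up for this example.

```python
import ipaddress

# Hypothetical toy data: prefix strings grouped by subtype label.
PREFIXES = {
    "BGPPrefix": ["102.218.128.0/22", "102.218.130.0/24"],
    "RPKIPrefix": ["102.218.130.0/24"],
    "RIRPrefix": ["102.218.128.0/22"],
}


def most_specific(ip, prefixes):
    """Return the longest prefix in `prefixes` that covers `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    covering = [ipaddress.ip_network(p) for p in prefixes]
    covering = [net for net in covering if addr in net]
    if not covering:
        return None
    return str(max(covering, key=lambda net: net.prefixlen))


def part_of_targets(ip, by_type):
    """Map each subtype label to the most-specific covering prefix of `ip`."""
    targets = {}
    for label, prefixes in by_type.items():
        best = most_specific(ip, prefixes)
        if best is not None:
            targets[label] = best
    return targets


print(part_of_targets("102.218.130.10", PREFIXES))
```

In this toy data the BGPPrefix and RPKIPrefix targets share the same prefix string, which mirrors the catch described above: two nodes of different types can carry the same prefix property.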

Example

MATCH p = (:IP {ip: '102.218.130.10'})-[:PART_OF]->+(:Prefix)
RETURN p

Although hard to see due to the cutoff labels, here we see an IP node (blue) that is connected to its most-specific RIRPrefix (green), BGPPrefix (pink), RPKIPrefix (yellow), and RDNSPrefix (orange). The symmetric relationships between these prefixes indicate that they are actually all the same prefix. The BGP and RDNS prefixes then have larger covering prefixes.

MATCH p = ((:RPKIPrefix)-[:PART_OF]->(:RPKIPrefix)){8}
RETURN p
LIMIT 1

The Donut of covering RPKI prefixes. Starts at a /24 and goes all the way to a /16.

What's Changed

Prefix remodeling in #191

Full Changelog: v3.1.0...v4.0.0

06 Jun 08:10

m-appel

v3.1.0

25970f0

New Dataset

OpenINTEL CRuX data in #181
DNS Graph crawler for CRuX data in #185. Due to the size of this dataset, it is currently not included in the weekly dumps.

What's Changed

Update Cloudflare crawler for better performance in #166
Various documentation updates (OpenINTEL #183, gallery #186)
Miscellaneous crawler fixes

Full Changelog: v3.0.1...v3.1.0

02 Apr 06:56

m-appel

v3.0.1

61905d7

This is a minor release that fixes several small bugs.

Full Changelog: v3.0.0...v3.0.1

28 Feb 09:13

m-appel

v3.0.0

9fcf944

Prefix Node Rework

We are releasing this as a new major version since the changes introduced in #168 require special care when fetching PART_OF relationships of IP and Prefix nodes in the future.

In general, all Prefix nodes now have one or more subtypes. Possible types (at the moment, see here for updated info) are:

BGPPrefix
GeoPrefix
PeeringLAN
RDNSPrefix
RIRPrefix
RPKIPrefix

This complicates the generation of PART_OF relationships for IP and Prefix nodes. As a tradeoff between ease of use and number of relationships, we proceed as follows:

Build a prefix tree for all prefixes of the same type and connect them with PART_OF relationships
Map an IP to the most-specific prefix of each type

However, since a prefix can have multiple types (e.g., a BGPPrefix can also be an RPKIPrefix), this would create a lot of redundant relationships. Furthermore, it can lead to cases where the correct PART_OF relationship cannot be inferred with a simple query.

For example, an IP x is part of a BGPPrefix a/24. Now there might exist another BGPPrefix b/23, that is covering a/24 and is also an RDNSPrefix. This would cause PART_OF relationships to both a and b (since both are the longest match for one prefix type). Therefore it would not be possible to only return the most-specific BGPPrefix with a single query.

// This query returns both BGPPrefix nodes.
MATCH p = (:IP {ip: x})-[:PART_OF]->(:BGPPrefix)
RETURN p

As a solution to this, and to reduce the number of relationships, we add a prefix_types property to the PART_OF relationship. It is a list that contains the labels of the prefix types for which this relationship indicates the longest match. If there are multiple PART_OF relationships originating from an IP node, the prefix_types properties of these will be non-overlapping. Thus, to get the most-specific BGPPrefix node from the example above:

MATCH p = (:IP {ip: x})-[po:PART_OF]->(:BGPPrefix)
WHERE 'BGPPrefix' in po.prefix_types
RETURN p

A query for a specific prefix type with PART_OF thus always requires a filter on the prefix_types property!
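
The non-overlapping prefix_types lists can be illustrated with a short Python sketch: for each type, find the longest match for the IP, then group the type labels by the prefix they point to. This is only an illustration of the idea, not the project's implementation; the function name and the example prefixes (taken from documentation address space) are assumptions.

```python
import ipaddress
from collections import defaultdict


def prefix_types_per_target(ip, by_type):
    """Return {prefix: [type labels for which it is the longest match]}."""
    addr = ipaddress.ip_address(ip)
    targets = defaultdict(list)
    for label in sorted(by_type):
        covering = [ipaddress.ip_network(p) for p in by_type[label]]
        covering = [net for net in covering if addr in net]
        if covering:
            best = max(covering, key=lambda net: net.prefixlen)
            # Each label is appended exactly once, so the resulting
            # lists cannot overlap.
            targets[str(best)].append(label)
    return dict(targets)


# The example from the text: a /24 that is a BGPPrefix, and a covering /23
# that is both a BGPPrefix and an RDNSPrefix.
example = {
    "BGPPrefix": ["192.0.2.0/24", "192.0.2.0/23"],
    "RDNSPrefix": ["192.0.2.0/23"],
}
print(prefix_types_per_target("192.0.2.1", example))
```

The IP ends up with two PART_OF targets, but only the /24 lists BGPPrefix, which is why the filter on prefix_types selects the most-specific BGPPrefix.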

First Steps Towards Geographical Data

In #177 we introduced modelling of geometric points that are already in our existing datasets. We introduced a new node type Point connected with LOCATED_IN relationships to existing resources (for now AS from CAIDA, and Facility / Organization from PeeringDB). This enables the use of spatial functions available in Cypher. The modelling is still very basic and will be enhanced in the future.
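
For readers who want to sanity-check distances returned by spatial functions outside the database, a great-circle distance between two latitude/longitude pairs can be computed with the haversine formula. The sketch below is a generic Python version, not Neo4j's implementation; the Earth radius constant and the function name are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```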

New Dataset

IPinfo IP-to-country mapping by @maxmouchet in #178
SimulaMet rDNS data: The crawler for this dataset was already implemented, but blocked by the prefix node rework.

What's Changed

Add more specific prefix node types #168
Introduce Point node label and add geolocation modelling to existing crawlers #177
Added unsorted status code to PCH crawler
Neo4j updated to version 5.26.3
Updated pre-commit hooks

New Contributors

@maxmouchet made their first contribution in #178

Full Changelog: v2.2.0...v3.0.0

Contributors

maxmouchet

19 Feb 06:04

m-appel

v2.2.0

b8ce90a

New Dataset

CAIDA AS to organization mappings by @jehuddleston in #172

What's Changed

Link to crawler READMEs from dataset page
Update OpenINTEL data endpoint and API
Handle invalid example tests in OONI webconnectivity

New Contributors

@jehuddleston made their first contribution in #172

Full Changelog: v2.1.0...v2.2.0

Contributors

jehuddleston

08 Feb 18:20

m-appel

v2.1.0

a3f0959

New Datasets

What's Changed

Add autodeploy scripts
Enforce canonical IPv6 formatting
Documentation updates
- Tables for data sources, node types, and relationship types
- Gallery updates
- More instructions
Rework logging
Use elementId() instead of deprecated id() in Neo4j
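
As an illustration of what canonical IPv6 formatting from the list above can look like, Python's standard ipaddress module produces a lowercase, zero-compressed form. This is a sketch of the idea only; whether the project uses exactly this form is an assumption, and the function names are hypothetical.

```python
import ipaddress


def canonical_ip(text):
    """Return the compressed, lowercase form of an IPv4 or IPv6 address."""
    return ipaddress.ip_address(text.strip()).compressed


def canonical_prefix(text):
    """Return the compressed form of a prefix, zeroing any host bits."""
    return ipaddress.ip_network(text.strip(), strict=False).compressed


print(canonical_ip("2001:0DB8:0000:0000:0000:0000:0000:0001"))
print(canonical_prefix("2001:DB8:0::/32"))
```

Normalizing values this way before they are used as node properties avoids creating two nodes for the same address written in different styles.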

Full Changelog: v2.0.0...v2.1.0

20 Feb 08:38

romain-fontugne

v2.0.0

9568521

Summary

The main change for this version is the remodeling of DNS data (including new node types, e.g. HostName), inclusion of a lot of new datasets, new reference attributes for relationships, and a lot of code cleaning and bug fixes.

List of changes

new datasets:
- DNS resolution chain (OpenINTEL)
- DNS resolution for umbrella, NS, and MX (OpenINTEL)
- URL classification (Citizenlab)
- sibling ASes (InetIntel)
- Atlas probe (RIPE)
- Atlas measurement (RIPE)
- IXP (PCH)
- url2hostname (post-process)
- umbrella (CISCO)
- IPv6 AS Hegemony (IHR)
- AS Relationship IPv4 & IPv6 (BGPkit)
- Alice looking glass
- RoVista (Virginia Tech)
support for nodes with multiple labels
new reference attributes (reference_time_modification, reference_time_fetch, reference_url_data, and reference_url_info) replacing reference_time and reference_url
most (all?) crawlers push nodes and links in batches
docker service for public instance
pre-commit checks
automatically add Neo4j constraints and indexes
updated to Neo4j 5.16
code cleaning and numerous bug fixes
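
Pushing nodes and links in batches, as mentioned in the list above, usually comes down to splitting a long sequence of items into fixed-size chunks and sending one chunk per request. A minimal generic sketch follows; the helper name `batched` and the batch size are assumptions and not taken from the crawler code.

```python
from itertools import islice


def batched(items, size):
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


for batch in batched(range(5), 2):
    print(batch)
```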

02 Feb 02:09

romain-fontugne

v1.1.0

38ed602

Summary

Change the node labels to conform to the Neo4j naming convention.

Features

Renaming of node labels (e.g. DOMAIN_NAME is now DomainName)
Simplified docker usage with docker_compose file
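
The label renaming can be pictured as a conversion from upper snake case to Pascal case. The sketch below is illustrative only and is not the migration used for this release; the function name and the `keep` tuple of labels left unchanged are hypothetical.

```python
def to_pascal_case(label, keep=("AS", "IP")):
    """Convert an UPPER_SNAKE_CASE label to PascalCase.

    Labels listed in `keep` (a hypothetical set of acronyms) are
    returned unchanged.
    """
    if label in keep:
        return label
    return "".join(part.capitalize() for part in label.split("_") if part)


print(to_pascal_case("DOMAIN_NAME"))
```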

Releases: InternetHealthReport/internet-yellow-pages
