Releases: InternetHealthReport/internet-yellow-pages

v4.0.2

24 Oct 22:47
00c85d5

This release fixes two crawlers and adds rerun functionality to the postprocess scripts.

What's Changed

  • RIPE Atlas introduced new fields to the measurement metadata, which broke the crawler.
  • Cloudflare crawlers sometimes run into rate limiting. This was not handled correctly because the responses do not include a Retry-After header, even though Cloudflare claims they do.
  • Add rerun functionality to postprocess scripts.
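
The rate-limiting fix comes down to not trusting the Retry-After header to be present. The following is a minimal sketch of that idea, not the project's actual code: the function name retry_delay and the backoff parameters base and cap are assumptions made for illustration, and only the delay-in-seconds form of the header is parsed.

```python
def retry_delay(headers, attempt, base=1.0, cap=60.0):
    """Return the number of seconds to wait before retrying a rate-limited request.

    headers: mapping of response headers (a plain dict here, so the lookup
             is case-sensitive; this is a simplification).
    attempt: zero-based retry counter.
    base, cap: hypothetical backoff parameters, not taken from the project.
    """
    value = headers.get('Retry-After')
    if value is not None:
        try:
            # Honor the server's hint when it is a plain number of seconds.
            return max(float(value), 0.0)
        except ValueError:
            # Not a number (e.g. the HTTP-date form); fall back to backoff.
            pass
    # Header missing or unusable: capped exponential backoff.
    return min(base * 2 ** attempt, cap)
```

A caller would sleep for retry_delay(response_headers, attempt) seconds after a rate-limited response and then retry, giving up after some maximum number of attempts.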

Full Changelog: v4.0.1...v4.0.2

v4.0.1

19 Sep 05:02

What's Changed

  • InetIntel changed their format, so this release fixes that crawler.

Full Changelog: v4.0.0...v4.0.1

v4.0.0

10 Sep 11:59
9f13ac9

(Another) Prefix Node Rework

Remember the changes to the prefix nodes introduced in v3.0.0? Although adding multiple labels to one node is clean, querying was getting too complicated. We also found some cases where normal querying just did not work. This is why we are remodeling the prefix nodes (again)!

Instead of adding multiple labels to a single node, each prefix type now has its own node (but all still have the Prefix label). While this increases the number of nodes and relationships in the graph, it makes querying considerably simpler:

  1. Each IP node has one PART_OF relationship to the most-specific covering prefix of each type.
  2. Each Prefix node has one PART_OF relationship to the most-specific covering prefix of each type.

For the inter-prefix PART_OF relationships there is one catch: A PART_OF relationship between two different types can be between two nodes with the same prefix property (e.g., a BGPPrefix that is also a RPKIPrefix). For relationships between the same type, the PART_OF relationship indicates the most-specific covering prefix.
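
The two rules and the catch can be sketched outside the database with Python's ipaddress module. This is an illustration of the described model, not the code that builds the graph: the function name covering_prefixes and the example prefixes (taken from documentation address ranges) are assumptions.

```python
import ipaddress

def covering_prefixes(prefix, prefix_type, prefixes_by_type):
    """Map each prefix type to the most-specific prefix covering `prefix`.

    For the node's own type, the prefix itself is excluded, so the result is
    a strictly less-specific prefix. For other types, an identical prefix is
    allowed (e.g. a BGPPrefix that is also an RPKIPrefix).
    """
    net = ipaddress.ip_network(prefix)
    result = {}
    for ptype, prefixes in prefixes_by_type.items():
        candidates = []
        for other in map(ipaddress.ip_network, prefixes):
            # Skip other address families and prefixes that do not cover net.
            if other.version != net.version or not net.subnet_of(other):
                continue
            # Same type: a node never points to itself.
            if ptype == prefix_type and other == net:
                continue
            candidates.append(other)
        if candidates:
            # The longest prefix length is the most-specific covering prefix.
            result[ptype] = str(max(candidates, key=lambda n: n.prefixlen))
    return result

# Hypothetical data set.
prefixes_by_type = {
    'BGPPrefix': ['198.51.100.0/24', '198.51.0.0/16'],
    'RPKIPrefix': ['198.51.100.0/24'],
    'RDNSPrefix': ['198.51.100.0/23'],
    'GeoPrefix': ['192.0.2.0/24'],
}

# Rule 1: an IP, treated as a /32 of the pseudo-type 'IP'.
ip_links = covering_prefixes('198.51.100.10/32', 'IP', prefixes_by_type)

# Rule 2: a BGPPrefix links to a larger BGPPrefix, but to an RPKIPrefix
# with the same prefix property.
bgp_links = covering_prefixes('198.51.100.0/24', 'BGPPrefix', prefixes_by_type)
```

Here ip_links contains the /24 for BGPPrefix and RPKIPrefix and the /23 for RDNSPrefix, while bgp_links points to the /16 for BGPPrefix but to the identical /24 for RPKIPrefix.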

Warning: The prefix property is now only unique within each subtype. The general Prefix type is still there for convenience, but querying it will just return some prefix, even when a prefix filter is specified. The Prefix type should thus not be used in most queries.

Example

MATCH p = (:IP {ip: '102.218.130.10'})-[:PART_OF]->+(:Prefix)
RETURN p

[Image: graph]

Although hard to see due to the cutoff label, here we see an IP node (blue) that is connected to its most-specific RIRPrefix (green), BGPPrefix (pink), RPKIPrefix (yellow), and RDNSPrefix (orange). The symmetric relationships between these prefixes indicate that they are actually all the same. The BGP and RDNS prefixes then have larger covering prefixes.

MATCH p = ((:RPKIPrefix)-[:PART_OF]->(:RPKIPrefix)){8}
RETURN p
LIMIT 1

[Image: graph (1)]

The Donut of covering RPKI prefixes: the chain starts at a /24 and goes all the way to a /16.

What's Changed

  • Prefix remodeling in #191

Full Changelog: v3.1.0...v4.0.0

v3.1.0

06 Jun 08:10

New Dataset

  • OpenINTEL CRuX data in #181
  • DNS Graph crawler for CRuX data in #185. Due to the size of this dataset, it is currently not included in the weekly dumps.

What's Changed

  • Update Cloudflare crawler for better performance in #166
  • Various documentation updates (OpenINTEL #183, gallery #186)
  • Miscellaneous crawler fixes

Full Changelog: v3.0.1...v3.1.0

v3.0.1

02 Apr 06:56

This is a minor release that fixes several small bugs.

Full Changelog: v3.0.0...v3.0.1

v3.0.0

28 Feb 09:13

Prefix Node Rework

We are releasing this as a new major version since the changes introduced in #168 require special care when fetching PART_OF relationships of IP and Prefix nodes in the future.

In general, all Prefix nodes now have one or more subtypes. Possible types (at the moment, see here for updated info) are:

  • BGPPrefix
  • GeoPrefix
  • PeeringLAN
  • RDNSPrefix
  • RIRPrefix
  • RPKIPrefix

This complicates the generation of PART_OF relationships for IP and Prefix nodes. As a tradeoff between ease of use and number of relationships, we proceed as follows:

  • Build a prefix tree for all prefixes of the same type and connect them with PART_OF relationships
  • Map an IP to the most-specific prefix of each type

However, since a prefix can have multiple types (e.g., a BGPPrefix can also be an RPKIPrefix), this would create a lot of redundant relationships. Furthermore, it can lead to cases where the correct PART_OF relationship cannot be inferred with a simple query.

For example, an IP x is part of a BGPPrefix a/24. Now there might exist another BGPPrefix b/23 that covers a/24 and is also an RDNSPrefix. This would cause PART_OF relationships to both a and b (since both are the longest match for one prefix type). Therefore, it would not be possible to return only the most-specific BGPPrefix with a single query.

// This query returns both BGPPrefix nodes.
MATCH p = (:IP {ip: x})-[:PART_OF]->(:BGPPrefix)
RETURN p

As a solution to this, and to reduce the number of relationships, we add a prefix_types property to the PART_OF relationship. It is a list that contains the labels of the prefix types for which this relationship indicates the longest match. If there are multiple PART_OF relationships originating from an IP node, the prefix_types properties of these will be non-overlapping. Thus, to get the most-specific BGPPrefix node from the example above:

MATCH p = (:IP {ip: x})-[po:PART_OF]->(:BGPPrefix)
WHERE 'BGPPrefix' in po.prefix_types
RETURN p

A query for a specific prefix type with PART_OF thus always requires a filter on the prefix_types property!
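
To make the a/24 and b/23 example concrete, the scheme can be simulated in a few lines of Python. This is a reconstruction from the description above, not the project's implementation; the function name part_of_relationships and the example prefixes are assumptions.

```python
import ipaddress
from collections import defaultdict

def part_of_relationships(ip, prefixes_by_type):
    """Return {prefix: prefix_types} for the PART_OF relationships of an IP.

    One relationship is created per distinct longest-match prefix, and its
    prefix_types list names the types for which it is the longest match.
    """
    addr = ipaddress.ip_address(ip)
    rels = defaultdict(list)
    for ptype, prefixes in sorted(prefixes_by_type.items()):
        covering = [n for n in map(ipaddress.ip_network, prefixes)
                    if n.version == addr.version and addr in n]
        if covering:
            best = max(covering, key=lambda n: n.prefixlen)
            rels[str(best)].append(ptype)
    return dict(rels)

# a = 198.51.100.0/24 (BGPPrefix only),
# b = 198.51.100.0/23 (BGPPrefix and RDNSPrefix); both are hypothetical.
prefixes_by_type = {
    'BGPPrefix': ['198.51.100.0/24', '198.51.100.0/23'],
    'RDNSPrefix': ['198.51.100.0/23'],
}
rels = part_of_relationships('198.51.100.10', prefixes_by_type)

# Matching on the node label alone finds both a and b ...
naive = [p for p in rels if p in prefixes_by_type['BGPPrefix']]
# ... while filtering on prefix_types keeps only the most-specific BGPPrefix.
filtered = [p for p, types in rels.items() if 'BGPPrefix' in types]
```

In this sketch rels maps a to ['BGPPrefix'] and b to ['RDNSPrefix'], so the prefix_types lists do not overlap, and filtered contains only a.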

First Steps Towards Geographical Data

In #177 we introduced modelling of geometric points that are already in our existing datasets. We introduced a new node type Point connected with LOCATED_IN relationships to existing resources (for now AS from CAIDA, and Facility / Organization from PeeringDB). This enables the use of spatial functions available in Cypher. The modelling is still very basic and will be enhanced in the future.

New Dataset

What's Changed

  • Add more specific prefix node types #168
  • Introduce Point node label and add geolocation modelling to existing crawlers #177
  • Added unsorted status code to PCH crawler
  • Neo4j updated to version 5.26.3
  • Updated pre-commit hooks

New Contributors

Full Changelog: v2.2.0...v3.0.0

v2.2.0

19 Feb 06:04
b8ce90a

New Dataset

What's Changed

  • Link to crawler READMEs from dataset page
  • Update OpenINTEL data endpoint and API
  • Handle invalid example tests in OONI webconnectivity

New Contributors

Full Changelog: v2.1.0...v2.2.0

v2.1.0

08 Feb 18:20

New Datasets

What's Changed

  • Add autodeploy scripts
  • Enforce canonical IPv6 formatting
  • Documentation updates
    • Tables for data sources, node types, and relationship types
    • Gallery updates
    • More instructions
  • Rework logging
  • Use elementId() instead of deprecated id() in neo4j
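
As an illustration of the canonical IPv6 formatting item, the sketch below normalizes addresses and prefixes to the compressed lowercase form produced by Python's ipaddress module. That this is the exact form the crawlers enforce is an assumption, as is the helper name canonical_ipv6.

```python
import ipaddress

def canonical_ipv6(value):
    """Return the compressed, lowercase text form of an IPv6 address or prefix."""
    if '/' in value:
        # strict=False tolerates host bits set in the prefix and masks them out.
        return str(ipaddress.IPv6Network(value, strict=False))
    return str(ipaddress.IPv6Address(value))

addr = canonical_ipv6('2001:0DB8:0000:0000:0000:0000:0000:0001')
net = canonical_ipv6('2001:0DB8:0000::/32')
```

Normalizing before insertion means that differently written forms of the same address end up as one value, here '2001:db8::1' and '2001:db8::/32'.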

Full Changelog: v2.0.0...v2.1.0

v2.0.0

20 Feb 08:38
9568521

Summary

The main change for this version is the remodeling of DNS data (including new node types, e.g. HostName), inclusion of a lot of new datasets, new reference attributes for relationships, and a lot of code cleaning and bug fixes.

List of changes

  • new datasets:
    • DNS resolution chain (OpenINTEL)
    • DNS resolution for umbrella, NS, and MX (OpenINTEL)
    • URL classification (Citizenlab)
    • sibling ASes (InetIntel)
    • Atlas probe (RIPE)
    • Atlas measurement (RIPE)
    • IXP (PCH)
    • url2hostname (post-process)
    • umbrella (Cisco)
    • IPv6 AS Hegemony (IHR)
    • AS Relationship IPv4 & IPv6 (BGPkit)
    • Alice looking glass
    • RoVista (Virginia Tech)
  • support for nodes with multiple labels
  • new reference attributes (reference_time_modification, reference_time_fetch, reference_url_data, and reference_url_info) replacing reference_time and reference_url
  • most (all?) crawlers push nodes and links in batches
  • docker service for public instance
  • pre-commit checks
  • automatically add neo4j constraints and indexes
  • updated to neo4j 5.16
  • code cleaning and numerous bug fixes

v1.1.0

02 Feb 02:09
38ed602

Summary

Change the node labels to conform to the Neo4j naming convention.

Features

  • Renaming of node labels (e.g. DOMAIN_NAME is now DomainName)
  • Simplified docker usage with docker_compose file