
Handle Joins Properly in Query Rewrite Algorithm #91

@suremarc

Description


The existing query rewrite implementation is close to supporting joins, but additional work is required:

  1. `SpjNormalForm` already supports queries with joins, but `ViewMatchingRewriter` needs to be generalized to handle queries over multiple tables.
  2. Because joins can produce duplicate rows, we must check that the query being matched references the same set of tables as the materialized view being considered. This is explained in further detail in the paper.
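As a rough illustration of the check in (2), here is a minimal Rust sketch that represents the query and the view only by the names of the tables they reference. The function names are hypothetical, and the real `SpjNormalForm` presumably carries much richer information. Counting occurrences, so that a self-join of `orders` does not match a view over a single `orders`, is our own assumption.

```rust
use std::collections::BTreeMap;

/// Count occurrences of each table so that self-joins (the same table
/// referenced twice) are compared as a multiset, not a plain set.
fn table_counts<'a>(tables: &[&'a str]) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for &t in tables {
        *counts.entry(t).or_insert(0) += 1;
    }
    counts
}

/// Hypothetical pre-check for view matching: only consider a materialized
/// view if it references exactly the same tables as the query. Extra tables
/// in the view could multiply rows (duplicates); missing tables mean the view
/// cannot supply the query's rows at all.
fn same_table_set(query_tables: &[&str], view_tables: &[&str]) -> bool {
    table_counts(query_tables) == table_counts(view_tables)
}
```

For example, `same_table_set(&["orders", "customers"], &["customers", "orders"])` returns `true`, while a view over `orders` and `customers` is rejected for a query over `orders` alone.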

It is possible to relax requirement (2) somewhat by implementing the duplication factor test from the paper (see sections 3.1.5 and 3.2), which would let us substitute materialized views whose tables are a superset of the original query's tables. However, this requires using DataFusion's constraints to check for appropriate uniqueness and foreign key constraints, so it is not a strict generalization. Furthermore, DataFusion does not currently support foreign key constraints, so some work will be needed there.
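One way the relaxed check might look, assuming uniqueness and foreign key information were available (which, as noted above, DataFusion does not yet provide for foreign keys): treat each join in the view as an edge from a referencing (`child`) table to a referenced (`parent`) table, and repeatedly drop an extra table that is touched by exactly one remaining join, where that join is a cardinality-preserving lookup into it. This is a simplified sketch loosely modeled on the idea behind the duplication factor test, not a transcription of the paper's procedure; `JoinEdge` and `extra_tables_removable` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Hypothetical summary of one equijoin predicate in the view definition.
/// `cardinality_preserving` should only be true when `child`'s join columns
/// are non-null foreign-key columns referencing a unique key of `parent`,
/// so each `child` row matches exactly one `parent` row.
struct JoinEdge<'a> {
    child: &'a str,
    parent: &'a str,
    cardinality_preserving: bool,
}

/// Simplified check: can the view's extra tables be eliminated without
/// changing how many times each row over the query's tables appears?
/// Assumes the view has no filters on the extra tables and that
/// `view_joins` lists every join predicate in the view.
fn extra_tables_removable<'a>(
    query_tables: &BTreeSet<&'a str>,
    view_tables: &BTreeSet<&'a str>,
    view_joins: &[JoinEdge<'a>],
) -> bool {
    if !query_tables.is_subset(view_tables) {
        return false; // the view is missing a table the query needs
    }
    let mut remaining: BTreeSet<&'a str> = view_tables.clone();
    loop {
        // An extra table can go if exactly one remaining join touches it and
        // that join is a cardinality-preserving lookup *into* the table.
        let removable = remaining
            .iter()
            .copied()
            .filter(|t| !query_tables.contains(t))
            .find(|&t| {
                let touching: Vec<&JoinEdge<'a>> = view_joins
                    .iter()
                    .filter(|e| remaining.contains(e.child) && remaining.contains(e.parent))
                    .filter(|e| e.child == t || e.parent == t)
                    .collect();
                touching.len() == 1
                    && touching[0].parent == t
                    && touching[0].cardinality_preserving
            });
        match removable {
            Some(t) => {
                remaining.remove(t);
            }
            None => break,
        }
    }
    // Accept only if every extra table was eliminated.
    remaining == *query_tables
}
```

For a query over `lineitem` and a view over `lineitem`, `orders` and `customers` joined along non-null foreign keys, `customers` is eliminated first and then `orders`, so the view passes. An extra table with no join predicate at all (a cross join) is never eliminated, which matches the intuition that it would multiply rows.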

Future work may also add the filter tree proposed in the paper, but IMO this should be done separately, since it is an optimization for large numbers of materialized views.
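To give a rough idea of the kind of pruning such an index provides, here is a hypothetical single-level index that groups views by their sorted list of source tables, so that only views over the same tables reach the full matching tests. The paper's filter tree has several levels with other keys, which this sketch does not attempt to reproduce; `ViewIndex` and its methods are illustrative names only.

```rust
use std::collections::HashMap;

/// Hypothetical candidate index: maps a view's sorted list of source tables
/// to the names of the materialized views defined over exactly those tables.
#[derive(Default)]
struct ViewIndex {
    by_tables: HashMap<Vec<String>, Vec<String>>,
}

impl ViewIndex {
    /// Sort the table names so the key does not depend on join order.
    /// Duplicates are kept, so self-joins produce distinct keys.
    fn key(tables: &[&str]) -> Vec<String> {
        let mut key: Vec<String> = tables.iter().map(|t| t.to_string()).collect();
        key.sort();
        key
    }

    fn insert(&mut self, view_name: &str, tables: &[&str]) {
        self.by_tables
            .entry(Self::key(tables))
            .or_default()
            .push(view_name.to_string());
    }

    /// Return only the views over the same tables as the query, so the
    /// expensive matching tests run on a small candidate set.
    fn candidates(&self, query_tables: &[&str]) -> &[String] {
        self.by_tables
            .get(&Self::key(query_tables))
            .map(|v| v.as_slice())
            .unwrap_or(&[])
    }
}
```

The lookup cost is independent of the total number of registered views, which is the property that matters once there are many of them.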

It's also worth noting that the query rewriting algorithm only really works for inner joins, because it relies on an inner join being equivalent to a cross join followed by select/project/filter. For outer joins, another paper builds on prior work (including the paper above) and extends the approach to outer joins by introducing join-disjunctive normal form. However, it is much more technical; e.g., it introduces the notion of a minimum union and describes how to reduce minimum unions to ordinary unions.
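The inner-join equivalence can be seen on plain in-memory rows: filtering a cross join on the join key yields the same rows as a hash join, including the duplicate that appears when one left row matches two right rows. A left outer join, by contrast, keeps unmatched left rows padded with `None` (standing in for `NULL`), and no filter or projection over the cross join can produce such rows. The helpers below are hypothetical toy code, unrelated to DataFusion's operators.

```rust
use std::collections::HashMap;

/// A toy row: (join key, payload).
type Row = (u32, &'static str);

/// Every pairing of a left row with a right row.
fn cross_join(left: &[Row], right: &[Row]) -> Vec<(Row, Row)> {
    left.iter()
        .flat_map(|l| right.iter().map(move |r| (*l, *r)))
        .collect()
}

/// Inner join expressed as a cross join followed by a filter on the keys.
fn inner_join_via_filter(left: &[Row], right: &[Row]) -> Vec<(Row, Row)> {
    cross_join(left, right)
        .into_iter()
        .filter(|(l, r)| l.0 == r.0)
        .collect()
}

/// Inner join computed directly with a hash table built on the right input.
fn inner_join_hash(left: &[Row], right: &[Row]) -> Vec<(Row, Row)> {
    let mut by_key: HashMap<u32, Vec<Row>> = HashMap::new();
    for r in right {
        by_key.entry(r.0).or_default().push(*r);
    }
    left.iter()
        .flat_map(|l| by_key.get(&l.0).into_iter().flatten().map(move |r| (*l, *r)))
        .collect()
}

/// Left outer join: unmatched left rows survive with `None` on the right.
fn left_join(left: &[Row], right: &[Row]) -> Vec<(Row, Option<Row>)> {
    let mut out = Vec::new();
    for l in left {
        let matches: Vec<Row> = right.iter().filter(|r| r.0 == l.0).copied().collect();
        if matches.is_empty() {
            out.push((*l, None)); // padded row: never present in a cross join
        } else {
            out.extend(matches.into_iter().map(|r| (*l, Some(r))));
        }
    }
    out
}
```

With left rows `(1, "a"), (2, "b"), (3, "c")` and right rows `(1, "x"), (1, "y"), (2, "z")`, both inner joins return the same three pairs (row `(1, "a")` appears twice), while the left join returns four rows, the extra one being `((3, "c"), None)`.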
