feat: Add "deps" command to generate a graph of rule dependencies. #498

Merged
plusvic merged 24 commits into VirusTotal:main from wxsBSD:deps
Mar 3, 2026
wxsBSD commented Nov 14, 2025

This branch adds a "deps" command that generates dependency information for a set of rules. It walks the AST looking for identifiers of rules, modules and unknown identifiers (hopefully external variables) and collects information about them. For any given rule it will output either the dependencies of that rule or the reverse dependencies of that rule. The output is in the form of a graphviz file that can be piped into dot to generate a visual graph.

For example, given these rules:

rule a { condition: pe.is_dll() }
rule b { condition: a }
rule c { condition: b }
rule d { condition: false }

You can print out the dependencies of `a` with `yr deps -r a rules/test.yara`:

digraph {
  a [fillcolor=paleturquoise, style="filled"];
  pe [fillcolor=palegreen, style="filled"];
  a -> pe;
}

This can be useful if you're looking to find the minimum set of rules and modules needed to share rule a in this case. It obviously becomes harder to determine this without a dependency walker when you have more complex graphs. For example, knowing the set of rules and imports to share rule c is more complex just due to the length of the chain.
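
To illustrate the "minimum set" idea, here is a minimal Python sketch of a forward dependency walk over the four example rules. It is not the code in this PR: the `DEPS` map is written by hand (the real command derives it from the AST), and the names `closure` and `to_dot` are made up for this example.

```python
from collections import deque

# Hypothetical edge map for the example rules: each rule maps to the rules
# and modules it references directly. Written by hand for illustration.
DEPS = {
    "a": ["pe"],
    "b": ["a"],
    "c": ["b"],
    "d": [],
}
MODULES = {"pe"}


def closure(start, edges):
    """Return every node reachable from `start`, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in edges.get(node, []):
            if dep not in seen:
                seen.append(dep)
                queue.append(dep)
    return seen


def to_dot(nodes, edges):
    """Render the selected nodes and their outgoing edges as graphviz."""
    lines = ["digraph {"]
    for node in nodes:
        color = "palegreen" if node in MODULES else "paleturquoise"
        lines.append(f'  {node} [fillcolor={color}, style="filled"];')
    for node in nodes:
        for dep in edges.get(node, []):
            lines.append(f"  {node} -> {dep};")
    lines.append("}")
    return "\n".join(lines)


# Everything needed to share rule c: c itself, b, a and the pe module.
print(to_dot(closure("c", DEPS), DEPS))
```

The ordering of nodes and edges here is only what a breadth-first walk happens to produce; the actual command may order its output differently.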

You can also get reverse dependencies, which is a nice thing to know when you want to make a change to a rule. For example, if I were to change rule a it would be nice to know that I haven't broken any of the rules that depend upon it (directly or indirectly). Assuming those rules have "expected matches" values in the metadata you can use the dependency walking code to determine what rules to test and what they should match.

yr deps -R -r a rules/test.yara:

digraph {
  a [fillcolor=paleturquoise, style="filled"];
  b [fillcolor=paleturquoise, style="filled"];
  b -> a;
  c [fillcolor=paleturquoise, style="filled"];
  c -> b;
}
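
The reverse lookup can be sketched the same way: invert the edge map, then walk it from the rule being changed. This is a hedged Python illustration rather than the PR's implementation; `DEPS`, `invert` and `dependents` are hypothetical names, and the edge map is again written by hand.

```python
# Hypothetical edge map for the example rules (rule -> direct dependencies).
DEPS = {
    "a": ["pe"],
    "b": ["a"],
    "c": ["b"],
    "d": [],
}


def invert(edges):
    """Flip every edge so each node maps to the nodes that reference it."""
    reverse = {}
    for node, deps in edges.items():
        reverse.setdefault(node, [])
        for dep in deps:
            reverse.setdefault(dep, []).append(node)
    return reverse


def dependents(start, edges):
    """Return every rule that depends on `start`, directly or indirectly."""
    reverse = invert(edges)
    found, stack = [], [start]
    while stack:
        node = stack.pop()
        for parent in reverse.get(node, []):
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


# Rules worth re-testing after a change to rule a.
print(dependents("a", DEPS))
```

The returned list is the set of rules to re-run against their expected matches after editing the starting rule.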

Given a set of rules, parse them and walk the AST to find identifiers, then
generate a dot file that can be fed into graphviz for visualization.

By default it generates a graph of all the rules but you can select any number
of rules with the -r argument.

For example, given these rules:

```
rule a { condition: pe.is_dll() }
rule b { condition: a }
rule c { condition: b }
rule d { condition: false }
```

And selecting using `-r b` you get output that looks like this:

```
digraph {
  b [fillcolor=paleturquoise, style="filled"];
  a [fillcolor=paleturquoise, style="filled"];
  pe [fillcolor=palegreen, style="filled"];
  a -> pe;
  b -> a;
}
```

This mode is best thought of as "what is the minimum set of rules and imports I
need to execute the selected rule."

Using the -R argument displays the reverse dependencies of a rule. For the same
rules above the output when using -R is:

```
digraph {
  b [fillcolor=paleturquoise, style="filled"];
  c [fillcolor=paleturquoise, style="filled"];
  c -> b;
}
```

This mode is best thought of as "if I change this rule, what other rules do I
also need to test."
Move the dependency walking code to its own command and make it hidden by
default until it gets more testing.
wxsBSD commented Nov 14, 2025

I don't have any tests for this yet, but I'm willing to write them if you think this is a good idea to include in yara-x. I'm just putting this out there now to get some early feedback.

I have tested this with a very complex set of rules from work and it does parse them and output graphs. However, the graphs quickly turn very hard to understand if you have exceptionally large dependency chains in your output. For smaller graphs (dozens of dependencies) it looks much better.

Comment thread cli/src/commands/deps.rs Outdated
wxsBSD commented Nov 18, 2025

I've updated the code to use the new features you've added and it works great, thanks!

I've decided to stop efforts to track unknown identifiers because it can get weird. For example, these two rules produce drastically different ASTs:

rule a { condition: pe.signatures.len() > 0 }
rule b { condition: (pe).signatures.len() > 0 }

The ASTs:

 rule a
 └─ condition
    └─ gt
       ├─ len()
       │  └─ <object>
       │     └─ field access
       │        ├─ pe
       │        └─ signatures
       └─ 0

 rule b
 └─ condition
    └─ gt
       ├─ field access
       │  ├─ pe
       │  └─ len()
       │     └─ <object>
       │        └─ signatures
       └─ 0

In the case of a, it is pretty easy to ignore signatures if you only track the first identifier operand of an Expr::FieldAccess. However, the AST of b is different enough that you have to take a different approach. I couldn't come up with a good way to track unknown identifiers that is robust to these "less normal" ASTs, and it felt too fragile to try.

It is for this reason that I went ahead and removed the "unknown identifier" part of this PR. Now we only track dependencies on existing rules or things that look like modules (all other identifiers are ignored).

Variables can be tracked by using a vector that behaves as a stack of defined variable identifiers, and another vector containing the indexes within this stack where each variable scope starts.
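
A minimal Python sketch of that two-vector scheme follows. The class and method names are hypothetical, and the example assumes that nested `for` loops are what open scopes; it only illustrates the data structure, not the code committed here.

```python
class ScopeStack:
    """Tracks which identifiers are variables at the current point in a walk."""

    def __init__(self):
        self.names = []         # stack of defined variable identifiers
        self.scope_starts = []  # index into `names` where each scope begins

    def enter_scope(self):
        # Remember how many variables existed before this scope opened.
        self.scope_starts.append(len(self.names))

    def define(self, name):
        self.names.append(name)

    def exit_scope(self):
        # Drop every variable defined since the matching enter_scope().
        start = self.scope_starts.pop()
        del self.names[start:]

    def is_defined(self, name):
        return name in self.names


# Two nested scopes, e.g. an outer loop over i and an inner loop over j.
scopes = ScopeStack()
scopes.enter_scope()
scopes.define("i")
scopes.enter_scope()
scopes.define("j")
print(scopes.is_defined("i"), scopes.is_defined("j"))
scopes.exit_scope()
print(scopes.is_defined("i"), scopes.is_defined("j"))
```

An identifier that is defined in the stack at the point it is used is a variable, so a walker can skip it instead of treating it as a rule or module reference.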
plusvic commented Nov 19, 2025

> I've updated the code to use the new features you've added and it works great, thanks!
>
> I've decided to stop efforts to track unknown identifiers because it can get weird. For example, these two rules produce drastically different ASTs:
>
> rule a { condition: pe.signatures.len() > 0 }
> rule b { condition: (pe).signatures.len() > 0 }
>
> The ASTs:
>
>  rule a
>  └─ condition
>     └─ gt
>        ├─ len()
>        │  └─ <object>
>        │     └─ field access
>        │        ├─ pe
>        │        └─ signatures
>        └─ 0
>
>  rule b
>  └─ condition
>     └─ gt
>        ├─ field access
>        │  ├─ pe
>        │  └─ len()
>        │     └─ <object>
>        │        └─ signatures
>        └─ 0
>
> In the case of a, it is pretty easy to ignore signatures if you only track the first identifier operand of an Expr::FieldAccess. However, the AST of b is different enough that you have to take a different approach. I couldn't come up with a good way to track unknown identifiers that is robust to these "less normal" ASTs, and it felt too fragile to try.
>
> It is for this reason that I went ahead and removed the "unknown identifier" part of this PR. Now we only track dependencies on existing rules or things that look like modules (all other identifiers are ignored).

Field names can't be handled as identifiers because they could produce dependencies that don't actually exist. For instance:

        import "pe"

        rule a {
        condition:
          true
        }

        rule b {
        condition:
            pe.a
        }

With the current implementation, rule b is reported as dependent on rule a, but that's not true.

I'm just thinking out loud, but I believe any identifier that is under a field access expression should be ignored, except for the first operand.
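
To make that idea concrete, here is a small Python sketch over a toy AST. The tuple-based node shapes are an assumption made for this example and are not yara-x's real AST types; `identifiers` is a hypothetical helper. The two larger trees mirror the two AST shapes shown earlier in the thread.

```python
def identifiers(node):
    """Collect identifiers that could name a rule or module."""
    out = []
    _collect(node, out)
    return out


def _collect(node, out):
    kind = node[0]
    if kind == "ident":
        out.append(node[1])
    elif kind == "field_access":
        # Only the first operand can name a rule or module. Everything
        # after it is a field name, so the remaining operands are skipped.
        _collect(node[1], out)
    else:
        # Any other node: visit every child that is itself a node.
        for child in node[1:]:
            if isinstance(child, tuple):
                _collect(child, out)


# pe.signatures.len() > 0, shaped like the AST of rule a above.
AST_A = ("gt",
         ("call", "len",
          ("field_access", ("ident", "pe"), ("ident", "signatures"))),
         ("lit", 0))

# (pe).signatures.len() > 0, shaped like the AST of rule b above.
AST_B = ("gt",
         ("field_access", ("ident", "pe"),
          ("call", "len", ("ident", "signatures"))),
         ("lit", 0))

# pe.a, where the field name collides with a rule named a.
AST_PE_A = ("field_access", ("ident", "pe"), ("ident", "a"))

for tree in (AST_A, AST_B, AST_PE_A):
    print(identifiers(tree))
```

Under these assumptions all three trees yield only `pe`, so the field named `a` no longer creates a false dependency on rule `a`.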

plusvic commented Nov 19, 2025

I think that 50ba863 fixes the issue with field names. While implementing this solution I found a bug fixed in 8eaa4db.

Comment thread cli/src/commands/deps.rs Outdated
wxsBSD commented Nov 20, 2025

> I'm just thinking out loud, but I believe any identifier that is under a field access expression should be ignored, except for the first operand.

I think you're right here. I spent a bit of time trying to come up with a rule that would cause a problem here but I haven't been able to. I did find a different problem but I'll open a different issue for that.

plusvic commented Mar 2, 2026

I'm rescuing this old PR, and the only thing holding me back from merging it is the new dependency on dot-writer. I wonder whether producing a graphviz file is a requirement you have, or if you are ok with some other way of illustrating the dependencies.

For instance, the cargo tree command shows dependencies between crates as an ASCII tree, which we could generate using the ascii_tree crate, which is already a dependency.

wxsBSD commented Mar 2, 2026

> I'm rescuing this old PR, and the only thing holding me back from merging it is the new dependency on dot-writer. I wonder whether producing a graphviz file is a requirement you have, or if you are ok with some other way of illustrating the dependencies.
>
> For instance, the cargo tree command shows dependencies between crates as an ASCII tree, which we could generate using the ascii_tree crate, which is already a dependency.

The graphviz output is not a requirement, I just thought it would be a nice way to visualize it. I'll update it to use ascii_tree.

wxsBSD added 2 commits March 2, 2026 14:32
This removes the dependency on dot-writer for graphviz support, and now outputs
dependencies using the ascii_tree crate instead.

To be more explicit about dependency chains we now output the full graph for all
rules (or just the requested rules if using -r). This makes it easier to see the
full chain.

However, I am removing the reverse dependencies option (-R) for now until I have
a better way to implement it.
wxsBSD commented Mar 3, 2026

I updated it to replace dot-writer with ascii_tree. I also removed the "reverse dependencies" option for now until I can come up with a nice way to implement it.

plusvic added 2 commits March 3, 2026 09:53
Remove the nodes `modules` and `rule` and put everything under the rule node. Add the `mod: ` prefix to distinguish module dependencies.
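
The thread does not show the merged command's output, so the following Python sketch is only a guess at what a tree with everything under the rule node and a `mod: ` prefix could look like. It does not use the ascii_tree crate; `DEPS`, `MODULES`, `label` and `tree_lines` are hypothetical names for this illustration.

```python
# Hypothetical edge map for the example rules (rule -> direct dependencies).
DEPS = {
    "a": ["pe"],
    "b": ["a"],
    "c": ["b"],
    "d": [],
}
MODULES = {"pe"}


def label(name, modules):
    """Prefix module names so they can be told apart from rules."""
    return f"mod: {name}" if name in modules else name


def tree_lines(name, deps, modules):
    """Return the lines of an ASCII tree rooted at `name`."""
    lines = [label(name, modules)]
    children = deps.get(name, [])
    for i, child in enumerate(children):
        last = i == len(children) - 1
        branch = "└─ " if last else "├─ "
        indent = "   " if last else "│  "
        sub = tree_lines(child, deps, modules)
        lines.append(branch + sub[0])
        lines.extend(indent + line for line in sub[1:])
    return lines


print("\n".join(tree_lines("c", DEPS, MODULES)))
```

Each rule is the root of its own tree, with its rule and module dependencies nested directly beneath it.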
@plusvic plusvic enabled auto-merge (squash) March 3, 2026 10:28
@plusvic plusvic merged commit b0486ff into VirusTotal:main Mar 3, 2026
14 checks passed
@wxsBSD wxsBSD deleted the deps branch March 3, 2026 11:23