Added functionality of parallel maximal independent set #145
pnijhara wants to merge 19 commits into networkx:main
Conversation
@dschult please review it.
dschult
left a comment
Thanks for this!
Some formatting suggestions before diving into the code/tests:
- we typically don't put error messages on the "assert" command. The pytest report of the assert failure is sufficient.
- we also typically don't include a comment line at the beginning of the test when the test function's name is sufficient. If the test involves something special or unexpected, then a couple-line docstring (or comment) is used. We aren't building docs for the tests.
- instead of building a sequential version of the code inside nx-parallel, does it make sense to just "fall back" to the NetworkX version?
I tried this approach; however, the tests were failing because of nx.maximal_independent_set. Because of this, I attempted to implement the sequential MIS approach myself.
OK... I understand. That makes sense. But the NetworkX backend system has a way to flag functions so that they should not run under certain conditions. The backend provides a "should_run" function which is called by NetworkX before it tries to run the backend version of the function. In nx-parallel we have a set of utility functions to help us set that up. They are currently used like this:

    @nxp._configure_if_nx_active(should_run=nxp.should_run_if_sparse(threshold=0.3))
    def harmonic_centrality(...

That should_run function checks the density of the graph against a threshold. There is another function provided in utils:

    @nxp._configure_if_nx_active(should_run=nxp.should_run_if_large(50000))

Unfortunately, while the function has been created, it currently has no way to pass in the size of the threshold -- it is set to 200 nodes and currently not used anywhere. You can still use it here though -- it will ignore the input and use 200. And you can get this code working with that cutoff. If you are up for it, you could update the utility function to accept a threshold. If you'd prefer not to mess with that, let me know and I'll fix it.
I tried as suggested; I am not very sure whether I have done this correctly. Now, instead of duplicating NetworkX's sequential MIS code, we fall back to their implementation for small graphs (< 50,000 nodes). What I did:
- Modified to work in two modes:
- Also updated all policy signatures to accept
- Added manual
dschult
left a comment
I haven't worked through the actual code yet, but looked at the tests and the docs. The tests seem like maybe some of them are already covered when we run the networkx tests using the environment variable set for nx-parallel. In that case, each networkx test is run with the nx-parallel code being called when it is supported. And we run that in the nx-parallel CI tests.
- can you see whether any of the nx-parallel tests are already tested by the networkx tests -- and remove the duplicates?
- I have a couple other comments/questions below.
I hope to look at the code itself soon.
:)
nx_parallel/algorithms/mis.py
Outdated
    from networkx.algorithms.mis import maximal_independent_set as _nx_mis_dispatcher
    _nx_mis = inspect.unwrap(_nx_mis_dispatcher)
I think you can use nx.maximal_independent_set._orig_func as the way to call the original networkx function. Can you check if that works? It is possible I haven't fully understood how that works. That way we don't have to use inspect.
I checked all the tests of nx and nx-parallel and found most of the tests are already covered. Some other tests that I can think of are:
Let me know if these add value. I can think of implementing some of them.
I fixed this one by:

    from networkx.algorithms.mis import maximal_independent_set as _nx_mis_dispatcher
    _nx_mis = _nx_mis_dispatcher.orig_func
    # If nodes_threshold is a graph-like object, it's being used as a direct should_run
    # function instead of a factory. Use default threshold.
I would prefer the function signature here be: should_run_if_large(G, nodes_threshold=200, *_, **__)
So can you briefly explain where this might be used as a factory? The previous version doesn't seem to deal with that case. Of course, it didn't deal with keyword args either... ;}
nx_parallel/algorithms/mis.py
Outdated
    # Validate directed graph
    if G.is_directed():
        raise nx.NetworkXNotImplemented("Not implemented for directed graphs.")
Does the networkx version work with directed graphs?
If the networkx version uses the not_implemented_for decorator, then I think that decorator runs before the backend is called. You can check whether this is ever called by changing the message (locally) and trying it to see which method gets shown.
nx_parallel/algorithms/mis.py
Outdated
    # Convert seed to Random object if needed (for fallback and parallel execution)
    import random

    if seed is not None:
        if hasattr(seed, 'random'):
            # It's already a RandomState/Random object
            rng = seed
        else:
            # It's a seed value
            rng = random.Random(seed)
    else:
        rng = random.Random()
Similarly, doesn't the seed input get handled by the decorators in the networkx code before it ever gets to the backend? Check...
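For context, networkx's `py_random_state` decorator normalizes the seed argument on the front end, so a backend typically receives a ready-to-use RNG. Roughly, the normalization behaves like this pure-stdlib sketch (an approximation, not the real decorator):

```python
import random


def normalize_seed(seed=None):
    # Approximate sketch of py_random_state's seed handling:
    # a Random instance passes through; None or an int becomes a fresh Random.
    if isinstance(seed, random.Random):
        return seed
    return random.Random(seed)  # random.Random(None) seeds from OS entropy/time
```

If the decorator has already run, the manual `hasattr(seed, 'random')` branch in the backend never sees a raw integer seed.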
nx_parallel/algorithms/mis.py
Outdated
    # Check if we should run parallel version
    # This is needed when backend is explicitly specified
    should_run_result = maximal_independent_set.should_run(G, nodes, seed)
    if should_run_result is not True:
        # Fall back to NetworkX sequential (unwrapped version needs Random object)
        return _nx_mis(G, nodes=nodes, seed=rng)
This definitely gets called by the backend machinery. This code should never be reached.
nx_parallel/algorithms/mis.py
Outdated
    # Parallel strategy: Run complete MIS algorithm on node chunks independently
    # Then merge results by resolving conflicts
    all_nodes = list(G.nodes())
Suggested change:

    - all_nodes = list(G.nodes())
    + all_nodes = list(G)
nx_parallel/algorithms/mis.py
Outdated
    if nodes_set:
        available = set(all_nodes) - nodes_set
        for node in nodes_set:
            available.discard(node)
Don't these nodes already get removed two lines up?
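A quick check confirms the redundancy the comment points at: the set difference already removes the members of `nodes_set`, so the follow-up `discard` loop is a no-op.

```python
all_nodes = [0, 1, 2, 3, 4]
nodes_set = {1, 3}
available = set(all_nodes) - nodes_set  # 1 and 3 are already excluded here
snapshot = set(available)
for node in nodes_set:
    available.discard(node)  # redundant: nothing left to discard
```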
nx_parallel/algorithms/mis.py
Outdated
    if nodes_set:
        for node in nodes_set:
            excluded.update(adj_dict[node])
Didn't we do this already with available?
@dschult thanks for the detailed review. I made the suggested changes and also tried to defend myself in some cases :p
@dschult |
dschult
left a comment
It looks like the style test is not passing.
I have two more import related comments.
I think the tests should focus on aspects of the nx-parallel implementation. All the NetworkX tests are run on this function -- so don't duplicate those. You can add tests to the networkx tests if you like. :) but not in this PR. :)
And I think you still haven't responded to the comment about code that is inside a "should_run" check. The should_run gets called by the networkx dispatcher so we should never get to the check here in the backend.
nx_parallel/algorithms/mis.py
Outdated
    # Import the actual NetworkX implementation
    from networkx.algorithms.mis import maximal_independent_set as _nx_mis_dispatcher
    _nx_mis = _nx_mis_dispatcher.orig_func
I think this can be simpler -- not even an import. And since we only use it once, maybe we can just use the expression directly without the abbreviation.
# abbreviation for the actual NetworkX implementation
_nx_mis = nx.maximal_independent_set.orig_func
nx_parallel/algorithms/mis.py
Outdated
    @@ -0,0 +1,189 @@
    from typing import Any
@dschult tests are failing because the parallel algorithm produces a valid MIS, [4, 3, 2, 0, 1], but in a different order than the sequential [1, 0, 3, 2, 4]. This is expected behavior for parallel execution. The question is: should the parallel backend match sequential ordering exactly, or is correctness (independent + maximal) sufficient? We can either accept the test failure as expected (parallel produces correct but different results) or make small graphs fall back to sequential, even when
We should be testing correctness. So we probably need a small change to the NetworkX test. I think that test should be updated in NetworkX to accept any ordering -- probably by comparing the set of the output to an expected value set. There may be a way for backends to avoid NetworkX tests that give different results, but I don't know it. @eriknw, is there a way to e.g. mark a networkx test as xfail when running it on a backend? In this case, though, I think a small PR to change line 15 of the test would do it:

    assert set(nx.maximal_independent_set(G, seed=1)) == {1, 0, 3, 2, 4}

Also, there is even talk in NetworkX about whether mis should return a list or a set... But it'd be good to have this changed.
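Since the suggestion is to test correctness rather than ordering, a helper like the following could validate any returned MIS. This is a sketch over a plain adjacency-dict representation; `is_valid_mis` is a hypothetical name, not a NetworkX API:

```python
def is_valid_mis(adj, mis):
    """Check independence (no two members adjacent) and maximality
    (every non-member has a neighbor in the set) for a candidate MIS."""
    mis = set(mis)
    independent = all(adj[u].isdisjoint(mis - {u}) for u in mis)
    maximal = all(u in mis or adj[u] & mis for u in adj)
    return independent and maximal


# path graph 0-1-2-3-4 as adjacency sets
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

Both the sequential and the parallel orderings from the failing test would pass such a check, since they describe the same set.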
Thanks for the ping!
Yeah, for easy changes, I favor tweaking tests in NetworkX. Moreover, if a backend adds new tests to fill gaps in NX tests, I prefer to upstream those so NetworkX and all backends benefit. One challenge with this approach, though, is that updated tests are part of the next NetworkX release, and these tests may still fail now.
That doc should also probably refer to https://docs.pytest.org/en/stable/reference/reference.html#std-hook-pytest_collection_modifyitems You can see how
@dschult let me know what decision we are taking for this. Ideally, we should update the NetworkX tests. However, those will take time to get into the final release. Three things we can do now:
    @staticmethod
    def on_start_tests(items):
        """Modify pytest items after tests have been collected.

        This is called during the pytest_collection_modifyitems phase.
        """
        try:
            import pytest
        except ModuleNotFoundError:
            return

        def key(testpath):
            """Parse testpath into (test_name, keywords) tuple."""
            filename, path = testpath.split(":")
            *names, testname = path.split(".")
            if names:
                [classname] = names
                return (testname, frozenset({classname, filename}))
            return (testname, frozenset({filename}))

        # Define tests that should be marked as xfail
        xfail = {
            key("test_mis.py:TestMaximalIndependentSet.test_random_order"):
                "Parallel MIS produces different valid results than sequential",
            key("test_mis.py:TestMaximalIndependentSet.test_seed_deterministic"):
                "Parallel MIS with chunks produces different order than sequential",
        }

        # Mark tests
        for item in items:
            kset = set(item.keywords)
            for (test_name_val, keywords), reason in xfail.items():
                if item.name == test_name_val and keywords.issubset(kset):
                    item.add_marker(pytest.mark.xfail(reason=reason))
Nice! Let's do numbers 2 and 3. So, work on 3. And if by the time we've finished that here we still need to do 2, we can do that then.
Pull request overview
This PR adds a parallel backend implementation of maximal_independent_set to nx_parallel, wires it into the backend interface/exports, and extends the should_run_if_large policy to support being used as a threshold factory (e.g., should_run_if_large(50000)).
Changes:
- Added nx_parallel.algorithms.mis.maximal_independent_set and registered it in the backend interface and package exports.
- Updated should_run_if_large to support both direct graph checks and “factory usage” (threshold-first) call patterns.
- Added MIS-focused tests and updated chunking-related smoke tests to account for the new algorithm.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| nx_parallel/utils/should_run_policies.py | Extends the should_run_if_large API to support threshold factory usage and broadens should_run signatures. |
| nx_parallel/algorithms/mis.py | Introduces the new parallel MIS implementation and should_run policy thresholding. |
| nx_parallel/interface.py | Registers maximal_independent_set in the backend algorithm list and adds a test-time xfail hook. |
| nx_parallel/algorithms/__init__.py | Exports MIS algorithm(s) via from .mis import *. |
| nx_parallel/algorithms/tests/test_mis.py | Adds unit tests for should_run, chunking, and seeded determinism. |
| nx_parallel/tests/test_get_chunks.py | Excludes maximal_independent_set from the generic get_chunks equivalence smoke test. |
| _nx_parallel/__init__.py | Adds get_info() metadata/docs entry for maximal_independent_set. |
@dschult, I just implemented and tested 3. Looks like it is working fine. If this is good to go, I can open a new issue at NetworkX for solving 2. PS: Please ignore the reviews by Copilot. I didn't know it could review PRs. I got it today with my education pack. Seems like too much to me.
@dschult, can you review and merge this? I think the tests are passing. We can revert these tests once we get the new release of NetworkX.
What does this PR change?
This PR implements the parallel maximal independent set via Luby's random permutation mechanism (solving issue #144)
What is your approach, and why did you choose it?
A standard parallel MIS algorithm assigns a random priority to every vertex once per round, then selects all locally maximal vertices at the same time.
Why this is parallel:
This matches the PRAM formulation and is widely used in CPU and GPU environments.
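The round structure described above can be sketched sequentially; this is a minimal illustration of Luby's random-priority scheme, not the PR's chunked implementation:

```python
import random


def luby_mis(adj, seed=None):
    """One common form of Luby's algorithm over an adjacency-dict graph.

    Each round, every remaining vertex draws a random priority; a vertex joins
    the MIS if its priority beats all of its remaining neighbors' priorities.
    Winners and their neighbors are then removed. In a parallel setting the
    per-vertex draws and local-minimum checks within a round are independent,
    which is what makes the method parallelizable.
    """
    rng = random.Random(seed)
    remaining = set(adj)
    mis = set()
    while remaining:
        priority = {u: rng.random() for u in remaining}
        winners = {
            u
            for u in remaining
            if all(priority[u] < priority[v] for v in adj[u] if v in remaining)
        }
        mis |= winners
        removed = set(winners)
        for u in winners:
            removed |= adj[u]  # neighbors of winners leave the graph too
        remaining -= removed
    return mis
```

Each round removes at least the globally smallest-priority remaining vertex, so the loop always terminates, and the result is independent (two adjacent vertices can never both be local minima) and maximal (a vertex only leaves via membership or a neighbor's membership).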
Brief summary of changes (1–2 lines)