Added functionality of parallel maximal independent set #145
pnijhara wants to merge 19 commits into networkx:main
Conversation
@dschult please review it.
dschult
left a comment
Thanks for this!
Some formatting suggestions before diving into the code/tests:
- we typically don't put error messages on the "assert" command. The pytest report of the assert failure is sufficient.
- we also typically don't include a comment line at the beginning of the test when the test function's name is sufficient. If the test involves something special or unexpected, then a couple-line docstring (or comment) is used. We aren't building docs for the tests.
- instead of building a sequential version of the code inside nx-parallel, does it make sense to just "fall back" to the NetworkX version?
I tried this approach; however, the tests were failing because of nx.maximal_independent_set. Because of this, I attempted to implement the sequential MIS approach myself.
OK... I understand. That makes sense. But the NetworkX backend system has a way to flag functions so that they should not run under certain conditions. The backend provides a "should_run" function which is called by NetworkX before it tries to run the backend version of the function. In nx-parallel we have a set of utility functions to help us set that up. They are currently used like this:

    @nxp._configure_if_nx_active(should_run=nxp.should_run_if_sparse(threshold=0.3))
    def harmonic_centrality(...

That should_run function checks the density of the graph against a threshold. There is another function provided in utils:

    @nxp._configure_if_nx_active(should_run=nxp.should_run_if_large(50000))

Unfortunately, while the function has been created, it currently has no way to pass in the size of the threshold -- it is set to 200 nodes and currently not used anywhere. You can still use it here though -- it will ignore the input and use 200. And you can get this code working with that cutoff. If you are up for it, you could update the utility function to accept a threshold. If you'd prefer not to mess with that, let me know and I'll fix it.
I tried as suggested; I am not very sure whether I have done this correctly. Now, instead of duplicating NetworkX's sequential MIS code, we fall back to their implementation for small graphs (< 50,000 nodes). What I did:
- Modified to work in two modes:
- Also updated all policy signatures to accept
- Added manual
dschult
left a comment
I haven't worked through the actual code yet, but looked at the tests and the docs. The tests seem like maybe some of them are already covered when we run the networkx tests using the environment variable set for nx-parallel. In that case, each networkx test is run with the nx-parallel code being called when it is supported. And we run that in the nx-parallel CI tests.
- can you see whether any of the nx-parallel tests are already tested by the networkx tests -- and remove the duplicates?
- I have a couple other comments/questions below.
I hope to look at the code itself soon.
:)
nx_parallel/algorithms/mis.py
Outdated
    from networkx.algorithms.mis import maximal_independent_set as _nx_mis_dispatcher
    _nx_mis = inspect.unwrap(_nx_mis_dispatcher)
I think you can use nx.maximal_independent_set._orig_func as the way to call the original networkx function. Can you check if that works? It is possible I haven't fully understood how that works. That way we don't have to use inspect.
I checked all the tests of nx and nx-parallel and found most of the tests are already covered. Some other tests that I can think of are:
Let me know if these add value. I can think of implementing some of them.
I fixed this one by:

    from networkx.algorithms.mis import maximal_independent_set as _nx_mis_dispatcher
    _nx_mis = _nx_mis_dispatcher.orig_func
    # If nodes_threshold is a graph-like object, it's being used as a direct should_run
    # function instead of a factory. Use default threshold.
I would prefer the function signature here be: should_run_if_large(G, nodes_threshold=200, *_, **__)
So can you briefly explain where this might be used as a factory? The previous version doesn't seem to deal with that case. Of course, it didn't deal with keyword args either... ;}
nx_parallel/algorithms/mis.py
Outdated
    # Validate directed graph
    if G.is_directed():
        raise nx.NetworkXNotImplemented("Not implemented for directed graphs.")
Does the networkx version work with directed graphs?
If the networkx version uses the not_implemented_for decorator, then I think that decorator runs before the backend is called. You can check whether this is ever called by changing the message (locally) and trying it to see which method gets shown.
nx_parallel/algorithms/mis.py
Outdated
    # Convert seed to Random object if needed (for fallback and parallel execution)
    import random

    if seed is not None:
        if hasattr(seed, 'random'):
            # It's already a RandomState/Random object
            rng = seed
        else:
            # It's a seed value
            rng = random.Random(seed)
    else:
        rng = random.Random()
Similarly, doesn't the seed input get handled by the decorators in the networkx code before it ever gets to the backend? Check...
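For context, networkx's `py_random_state` decorator normalizes the seed argument on the front end, so a backend typically receives a ready-to-use RNG. Roughly, the normalization behaves like this pure-stdlib sketch (an approximation, not the real decorator):

```python
import random


def normalize_seed(seed=None):
    # Approximate sketch of py_random_state's seed handling:
    # a Random instance passes through; None or an int becomes a fresh Random.
    if isinstance(seed, random.Random):
        return seed
    return random.Random(seed)  # random.Random(None) seeds from OS entropy/time
```

If the decorator has already run, the manual `hasattr(seed, 'random')` branch in the backend never sees a raw integer seed.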
nx_parallel/algorithms/mis.py
Outdated
    # Check if we should run parallel version
    # This is needed when backend is explicitly specified
    should_run_result = maximal_independent_set.should_run(G, nodes, seed)
    if should_run_result is not True:
        # Fall back to NetworkX sequential (unwrapped version needs Random object)
        return _nx_mis(G, nodes=nodes, seed=rng)
This definitely gets called by the backend machinery. This code should never be reached.
nx_parallel/algorithms/mis.py
Outdated
    # Parallel strategy: Run complete MIS algorithm on node chunks independently
    # Then merge results by resolving conflicts
    all_nodes = list(G.nodes())
Suggested change:

    - all_nodes = list(G.nodes())
    + all_nodes = list(G)
nx_parallel/algorithms/mis.py
Outdated
    if nodes_set:
        available = set(all_nodes) - nodes_set
        for node in nodes_set:
            available.discard(node)
Don't these nodes already get removed two lines up?
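A quick check confirms the redundancy the comment points at: the set difference already removes the members of `nodes_set`, so the follow-up `discard` loop is a no-op.

```python
all_nodes = [0, 1, 2, 3, 4]
nodes_set = {1, 3}
available = set(all_nodes) - nodes_set  # 1 and 3 are already excluded here
snapshot = set(available)
for node in nodes_set:
    available.discard(node)  # redundant: nothing left to discard
```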
nx_parallel/algorithms/mis.py
Outdated
    if nodes_set:
        for node in nodes_set:
            excluded.update(adj_dict[node])
Didn't we do this already with available?
@dschult thanks for the detailed review. I made the suggested changes and also tried to defend myself in some cases :p
@dschult |
dschult
left a comment
It looks like the style test is not passing.
I have two more import related comments.
I think the tests should focus on aspects of the nx-parallel implementation. All the NetworkX tests are run on this function -- so don't duplicate those. You can add tests to the networkx tests if you like. :) but not in this PR. :)
And I think you still haven't responded to the comment about code that is inside a "should_run" check. The should_run gets called by the networkx dispatcher so we should never get to the check here in the backend.
nx_parallel/algorithms/mis.py
Outdated
    # Import the actual NetworkX implementation
    from networkx.algorithms.mis import maximal_independent_set as _nx_mis_dispatcher
    _nx_mis = _nx_mis_dispatcher.orig_func
I think this can be simpler -- not even an import. And since we only use it once, maybe we can just use the expression directly without the abbreviation.
# abbreviation for the actual NetworkX implementation
_nx_mis = nx.maximal_independent_set.orig_func
nx_parallel/algorithms/mis.py
Outdated
    @@ -0,0 +1,189 @@
    from typing import Any
@dschult tests are failing because the parallel algorithm produces a valid MIS, [4, 3, 2, 0, 1], but in a different order than the sequential [1, 0, 3, 2, 4]. This is expected behavior for parallel execution. The question is: should the parallel backend match sequential ordering exactly, or is correctness (independent + maximal) sufficient? We can either accept the test failure as expected (parallel produces correct but different results) or make small graphs fall back to sequential, even when
We should be testing correctness. So we probably need a small change to the NetworkX test. I think that test should be updated in NetworkX to accept any ordering -- probably by comparing the set of the output to an expected value set. There may be a way for backends to avoid NetworkX tests that give different results, but I don't know it. @eriknw, is there a way to e.g. mark a networkx test as xfail when running it on a backend? In this case, though, I think a small PR to change line 15 of the test would do it:

    assert set(nx.maximal_independent_set(G, seed=1)) == {1, 0, 3, 2, 4}

Also, there is even talk in NetworkX about whether mis should return a list or a set... But it'd be good to have this changed.
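Since the suggestion is to test correctness rather than ordering, a helper like the following could validate any returned MIS. This is a sketch over a plain adjacency-dict representation; `is_valid_mis` is a hypothetical name, not a NetworkX API:

```python
def is_valid_mis(adj, mis):
    """Check independence (no two members adjacent) and maximality
    (every non-member has a neighbor in the set) for a candidate MIS."""
    mis = set(mis)
    independent = all(adj[u].isdisjoint(mis - {u}) for u in mis)
    maximal = all(u in mis or adj[u] & mis for u in adj)
    return independent and maximal


# path graph 0-1-2-3-4 as adjacency sets
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

Both the sequential and the parallel orderings from the failing test would pass such a check, since they describe the same set.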
Thanks for the ping!
Yeah, for easy changes, I favor tweaking tests in NetworkX. Moreover, if a backend adds new tests to fill gaps in NX tests, I prefer to upstream those so NetworkX and all backends benefit. One challenge with this approach, though, is that updated tests are part of the next NetworkX release, and these tests may still fail now.
That doc should also probably refer to https://docs.pytest.org/en/stable/reference/reference.html#std-hook-pytest_collection_modifyitems You can see how
@dschult let me know what decision we are taking for this. Ideally, we should update the NetworkX tests. However, those will take time to get into the final release. Three things we can do now:
    @staticmethod
    def on_start_tests(items):
        """Modify pytest items after tests have been collected.

        This is called during the pytest_collection_modifyitems phase.
        """
        try:
            import pytest
        except ModuleNotFoundError:
            return

        def key(testpath):
            """Parse testpath into (test_name, keywords) tuple."""
            filename, path = testpath.split(":")
            *names, testname = path.split(".")
            if names:
                [classname] = names
                return (testname, frozenset({classname, filename}))
            return (testname, frozenset({filename}))

        # Define tests that should be marked as xfail
        xfail = {
            key("test_mis.py:TestMaximalIndependentSet.test_random_order"):
                "Parallel MIS produces different valid results than sequential",
            key("test_mis.py:TestMaximalIndependentSet.test_seed_deterministic"):
                "Parallel MIS with chunks produces different order than sequential",
        }

        # Mark tests
        for item in items:
            kset = set(item.keywords)
            for (test_name_val, keywords), reason in xfail.items():
                if item.name == test_name_val and keywords.issubset(kset):
                    item.add_marker(pytest.mark.xfail(reason=reason))
Nice! Let's do numbers 2 and 3. So, work on 3. And if by the time we've finished that here we still need to do 2, we can do that then.
Pull request overview
This PR adds a parallel backend implementation of maximal_independent_set to nx_parallel, wires it into the backend interface/exports, and extends the should_run_if_large policy to support being used as a threshold factory (e.g., should_run_if_large(50000)).
Changes:
- Added nx_parallel.algorithms.mis.maximal_independent_set and registered it in the backend interface and package exports.
- Updated should_run_if_large to support both direct graph checks and “factory usage” (threshold-first) call patterns.
- Added MIS-focused tests and updated chunking-related smoke tests to account for the new algorithm.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| nx_parallel/utils/should_run_policies.py | Extends the should_run_if_large API to support threshold factory usage and broadens should_run signatures. |
| nx_parallel/algorithms/mis.py | Introduces the new parallel MIS implementation and should_run policy thresholding. |
| nx_parallel/interface.py | Registers maximal_independent_set in the backend algorithm list and adds a test-time xfail hook. |
| nx_parallel/algorithms/__init__.py | Exports MIS algorithm(s) via from .mis import *. |
| nx_parallel/algorithms/tests/test_mis.py | Adds unit tests for should_run, chunking, and seeded determinism. |
| nx_parallel/tests/test_get_chunks.py | Excludes maximal_independent_set from the generic get_chunks equivalence smoke test. |
| _nx_parallel/__init__.py | Adds get_info() metadata/docs entry for maximal_independent_set. |
@dschult, I just implemented and tested 3. Looks like it is working fine. If this is good to go, I can open a new issue at NetworkX for solving 2. PS: Please ignore the reviews by Copilot. I didn't know it could review PRs. I got it today with my education pack. Seems like too much to me.
@dschult, can you review and merge this? I think the tests are passing. We can revert these tests once we get the new release of NetworkX.
What does this PR change?
This PR implements the parallel maximal independent set via Luby's random permutation mechanism (solving issue #144)
What is your approach, and why did you choose it?
A standard parallel MIS algorithm assigns a random priority to every vertex once per round, then selects all locally maximal vertices at the same time.
Why this is parallel:
This matches the PRAM formulation and is widely used in CPU and GPU environments.
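The round structure described above can be sketched sequentially; this is a minimal illustration of Luby's random-priority scheme, not the PR's chunked implementation:

```python
import random


def luby_mis(adj, seed=None):
    """One common form of Luby's algorithm over an adjacency-dict graph.

    Each round, every remaining vertex draws a random priority; a vertex joins
    the MIS if its priority beats all of its remaining neighbors' priorities.
    Winners and their neighbors are then removed. In a parallel setting the
    per-vertex draws and local-minimum checks within a round are independent,
    which is what makes the method parallelizable.
    """
    rng = random.Random(seed)
    remaining = set(adj)
    mis = set()
    while remaining:
        priority = {u: rng.random() for u in remaining}
        winners = {
            u
            for u in remaining
            if all(priority[u] < priority[v] for v in adj[u] if v in remaining)
        }
        mis |= winners
        removed = set(winners)
        for u in winners:
            removed |= adj[u]  # neighbors of winners leave the graph too
        remaining -= removed
    return mis
```

Each round removes at least the globally smallest-priority remaining vertex, so the loop always terminates, and the result is independent (two adjacent vertices can never both be local minima) and maximal (a vertex only leaves via membership or a neighbor's membership).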
Brief summary of changes (1–2 lines)