@AydenMeng

Description

Problem

When the system's file descriptor limit is low (e.g., ulimit -n 1024) and the build is configured to use a high number of threads (e.g., 512), the build process can exhaust available file descriptors. This occurs because each multiprocessing worker typically consumes multiple file descriptors (for pipes, queues, semaphores, etc.).

As a result:

  • Some threads fail to create necessary IPC resources.
  • The build hangs or stalls indefinitely.
  • No clear error message is shown to the user, making debugging extremely difficult.

This issue can be confirmed via strace, which reveals failures like:

pipe2(0x7fff56711148, O_CLOEXEC) = -1 EMFILE (Too many open files)
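
The same failure can be reproduced outside the build with a short Python sketch. It assumes a Linux or other POSIX host, because the `resource` module is POSIX-only. It lowers the soft RLIMIT_NOFILE, as `ulimit -Sn` does, and then opens pipes until the kernel refuses. On Linux, CPython's `os.pipe()` goes through `pipe2(..., O_CLOEXEC)`, which is the call in the strace line above. The helper name `pipes_until_emfile` is hypothetical and not part of BaseTools.

```python
from __future__ import annotations

import errno
import os
import resource  # POSIX-only module


def pipes_until_emfile(soft_limit: int) -> tuple[int, int]:
    """Lower the soft RLIMIT_NOFILE, then open pipes until the kernel refuses.

    Returns (pipes_created, errno_of_failure). Every descriptor opened here is
    closed and the original limit is restored before returning.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    opened: list[int] = []
    failure_errno = 0
    try:
        # Keep the hard limit; only the soft limit is lowered, like `ulimit -Sn`.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
        while True:
            try:
                read_fd, write_fd = os.pipe()  # two descriptors per pipe
            except OSError as exc:
                failure_errno = exc.errno  # expected: errno.EMFILE
                break
            opened.extend((read_fd, write_fd))
    finally:
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
    return len(opened) // 2, failure_errno


if __name__ == "__main__":
    created, err = pipes_until_emfile(64)
    print(f"created {created} pipes, then failed with {errno.errorcode[err]}")
```

The `finally` block closes every descriptor and restores the original limit, so the calling process is left as it was.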

Solution

This PR introduces two complementary fixes:

  • Commit 31d5b3:

    Explicitly catches OSError exceptions caused by file descriptor exhaustion during worker initialization or execution, logs a clear diagnostic message, and terminates the build gracefully instead of hanging silently.

  • Commit 456393:

    Dynamically computes a safe upper bound for the number of concurrent build threads based on the current file descriptor limit. The calculation assumes ~4 FDs per worker (3 for multiprocessing pipes + 1 for POSIX named semaphores) and caps the thread count accordingly.

  • Breaking change?

  • Impacts security?

  • Includes tests?
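
The following example uses the Problem section's figures with the ~4 FDs-per-worker estimate for commit 456393. It shows why the build runs out of descriptors and where the cap would land. Here N is the requested thread count and L_fd is the descriptor limit. This is illustrative arithmetic only. It ignores descriptors the main process already holds, and the commit message for this change uses a divisor of 3 instead.

```latex
\begin{aligned}
N \times 4 &= 512 \times 4 = 2048 > 1024 = L_{\mathrm{fd}} \\
N_{\mathrm{safe}} &= \left\lfloor L_{\mathrm{fd}} / 4 \right\rfloor = \lfloor 1024 / 4 \rfloor = 256 \\
N_{\mathrm{used}} &= \min(N, N_{\mathrm{safe}}) = \min(512, 256) = 256
\end{aligned}
```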

How This Was Tested

To reproduce the issue, apply one of the following soft limits before starting the build (replace [THREAD_COUNT] with the actual number of threads used by your build, e.g., 16):

ulimit -Sn [THREAD_COUNT]      # e.g., ulimit -Sn 16
ulimit -Sn [THREAD_COUNT * 2]  # e.g., ulimit -Sn 32
ulimit -Sn [THREAD_COUNT * 4]  # e.g., ulimit -Sn 64

With the first two settings ([THREAD_COUNT] and [THREAD_COUNT * 2]), the build will likely fail or hang indefinitely due to file descriptor exhaustion (e.g., EMFILE: Too many open files), often without a clear error message.

The third setting ([THREAD_COUNT * 4]) usually provides enough file descriptors for the build to succeed.
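
The three settings can also be scripted, so that each run gets its own limit without changing the calling shell. The following is a minimal Python sketch that assumes a POSIX host. It uses `preexec_fn` to lower the soft RLIMIT_NOFILE in the child just before the command starts. The helper name `run_with_fd_soft_limit` is hypothetical.

```python
from __future__ import annotations

import resource  # POSIX-only module
import subprocess
import sys
from typing import Sequence


def run_with_fd_soft_limit(cmd: Sequence[str], soft_limit: int) -> subprocess.CompletedProcess:
    """Run `cmd` with its soft RLIMIT_NOFILE lowered to `soft_limit`.

    Same effect as `ulimit -Sn <soft_limit>` in a subshell before launching
    the command; the calling process keeps its own limit.
    """
    def lower_limit() -> None:
        # Runs in the child between fork() and exec(); the hard limit is kept.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    return subprocess.run(cmd, preexec_fn=lower_limit, capture_output=True, text=True)


if __name__ == "__main__":
    # Sanity check: a child Python reports the soft limit it was started with.
    probe = [sys.executable, "-c",
             "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"]
    print(run_with_fd_soft_limit(probe, 64).stdout.strip())  # prints 64
```

For a THREAD_COUNT of 16, calling the helper with soft limits of 16, 32, and 64 reproduces the three cases above. The build command line passed as `cmd` is left as a placeholder. The Python documentation warns that `preexec_fn` is not safe in the presence of threads, so this helper belongs in a small single-threaded driver script.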

After Applying the Patch

Under low FD limits (e.g., ulimit -Sn 16), the build will either:

  • Fail fast with a clear error message, or
  • Automatically reduce concurrency and complete successfully.

Integration Instructions

N/A

Previously, when file descriptors were exhausted in high-concurrency
builds (e.g., 512 threads with a 1024 FD limit), the build would hang or
fail silently without a clear indication of the root cause.

This change catches relevant OSError instances and terminates the build,
ensuring failures due to resource limits are explicit.

Signed-off-by: Ayden Meng <[email protected]>
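
The fail-fast behavior described in this commit can be sketched as a context manager that converts descriptor-exhaustion errors into an explicit exit. This is an illustration under stated assumptions, not the patch itself:

  • The name `fail_fast_on_fd_exhaustion` is hypothetical.
  • Treating `ENFILE` (system-wide table full) the same as `EMFILE` is an added assumption.
  • `SystemExit` stands in for whatever logging and error path BaseTools actually uses.

```python
import errno
from contextlib import contextmanager
from typing import Iterator

# Descriptor-exhaustion errnos: per-process table full (EMFILE) and
# system-wide table full (ENFILE). Including ENFILE is an assumption.
FD_EXHAUSTION_ERRNOS = (errno.EMFILE, errno.ENFILE)


@contextmanager
def fail_fast_on_fd_exhaustion(stage: str) -> Iterator[None]:
    """Turn FD-exhaustion OSErrors raised inside the block into a clean exit.

    Other OSErrors propagate unchanged, so unrelated failures keep their
    original tracebacks.
    """
    try:
        yield
    except OSError as exc:
        if exc.errno not in FD_EXHAUSTION_ERRNOS:
            raise
        message = (
            f"{stage}: out of file descriptors ({exc.strerror}). "
            "Raise the limit (ulimit -n) or lower the thread count."
        )
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(message) from exc
```

The sketch can wrap the code that starts worker processes, e.g. `with fail_fast_on_fd_exhaustion("AutoGen worker startup"): ...`. An EMFILE raised in that block then becomes a one-line diagnostic and exit status 1, while unrelated `OSError`s keep their original tracebacks.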
AydenMeng force-pushed the master branch 2 times, most recently from 3dd9779 to 7fe81bb, on November 28, 2025 07:51
AydenMeng marked this pull request as ready for review on November 28, 2025 10:12

When the number of build threads multiplied by per-thread file
descriptor usage exceeds the system's open file descriptor limit,
some threads may fail to acquire necessary resources (e.g., pipes
or semaphores), leading to deadlocks or hangs during parallel builds.

To prevent this, compute a safe upper bound on concurrency by dividing
the system's maximum file descriptor limit by 3 (an empirical value
that balances performance overhead against the theoretical number of
file descriptors consumed per thread). The actual thread count is then
clamped to this safe value.

Other uses of ThreadNum(), such as during actual compilation or log
queue creation, do not contribute significantly to file descriptor
consumption. Adjusting ThreadNum() globally would therefore be
unwarranted, as it could unnecessarily restrict parallelism in stages
that are not FD-bound.

This ensures stable parallel builds even under constrained resource
limits.

Signed-off-by: Ayden Meng <[email protected]>
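
The clamping this commit describes can be sketched as follows. The sketch assumes the relevant limit is the soft RLIMIT_NOFILE, because that is the limit the kernel enforces when it returns EMFILE. The names `safe_thread_count`, `autogen_worker_count`, and `FDS_PER_WORKER` are hypothetical. The divisor of 3 comes from the commit message. Only the AutoGen worker count is clamped, which matches the decision not to adjust ThreadNum() globally.

```python
import resource  # POSIX-only module

# Divisor from the commit message: an empirical per-thread descriptor estimate.
FDS_PER_WORKER = 3


def safe_thread_count(requested: int, fd_limit: int, fds_per_worker: int = FDS_PER_WORKER) -> int:
    """Clamp `requested` to at most fd_limit // fds_per_worker (minimum 1)."""
    safe = fd_limit // fds_per_worker
    # Never drop below one worker, so the stage can still make progress.
    return max(1, min(requested, safe))


def autogen_worker_count(requested: int) -> int:
    """Worker count for the FD-bound AutoGen stage only.

    The global thread setting (ThreadNum()) is deliberately left alone, so
    compilation and log-queue setup keep their full parallelism.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return requested  # no per-process descriptor cap to respect
    return safe_thread_count(requested, soft)
```

With the figures from the Problem section (512 threads requested under a 1024 descriptor limit), `safe_thread_count(512, 1024)` returns 341. Passing `fds_per_worker=4` applies the PR description's estimate and returns 256.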
@mikebeaton
Member

mikebeaton commented Nov 28, 2025

I think two final ones from me: can you change the second commit subject line to 'BaseTools: Cap AutoGen thread count to avoid file descriptor exhaustion'; and can you change the newly added .info message to .verbose, after all - now that I've understood that it's not affecting the 'main' purpose of -n, I don't think we need to surface it at .info level.
