
fix: replace setInterval with recursive setTimeout in Master.ts to pr…#2869

Merged
evanpelle merged 8 commits into openfrontio:main from Codimow:fix/memory-leak on Jan 14, 2026

Conversation

Contributor

@Codimow Codimow commented Jan 12, 2026

If this PR fixes an issue, link it below. If not, delete these two lines.
Resolves #2868

Description:

This PR addresses a critical memory leak in the Master server process (causing ~30GB RAM usage).

The issue was caused by setInterval calling fetchLobbies() every 100ms. When fetchLobbies took longer than 100ms to complete (due to network latency or load), requests would pile up indefinitely, creating a massive queue of pending Promises and open sockets.

I have refactored the polling logic into a generic startPolling utility (in src/server/PollingLoop.ts) that uses a recursive setTimeout pattern. This ensures that the next fetchLobbies call is only scheduled after the previous one has completed (successfully or failed), preventing any request pile-up.
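For illustration, the pattern can be sketched as below. This is a minimal sketch, not the PR's exact code: the real `src/server/PollingLoop.ts` logs through the project's `logger.child({ comp: "polling" })`, while `console.error` and the returned cancel function here are stand-ins/assumptions.

```typescript
// Sketch of the recursive-setTimeout polling pattern. The next run is
// scheduled only after the current task settles, so calls can never
// overlap or pile up the way setInterval allows.
export function startPolling(
  task: () => Promise<void>,
  intervalMs: number,
): () => void {
  let stopped = false;
  let timeoutId: ReturnType<typeof setTimeout> | null = null;

  const runLoop = () => {
    if (stopped) return;
    task()
      .catch((error) => {
        // Stand-in for the project logger.
        console.error("Error in polling loop:", error);
      })
      .finally(() => {
        // Reschedule only after the task settled (success or failure).
        if (!stopped) timeoutId = setTimeout(runLoop, intervalMs);
      });
  };

  runLoop();

  // Illustrative cancel function for graceful shutdown.
  return () => {
    stopped = true;
    if (timeoutId !== null) clearTimeout(timeoutId);
  };
}
```

Because each iteration awaits the previous one, a slow `fetchLobbies` simply stretches the cycle instead of queuing more requests.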

Please complete the following:

  • I have added screenshots for all UI updates (N/A - backend only)
  • I process any text displayed to the user through translateText() and I've added it to the en.json file (N/A - no user facing text)
  • I have added relevant tests to the test directory (tests/PollingLoop.test.ts)
  • I confirm I have thoroughly tested these changes and take full responsibility for any bugs introduced

Please put your Discord username so you can be contacted if a bug or regression is found:

codimo

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@coderabbitai
Contributor

coderabbitai Bot commented Jan 12, 2026

Warning

Rate limit exceeded

@Codimow has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 6 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 7918d0e and a3284b5.

📒 Files selected for processing (2)
  • src/server/PrivilegeRefresher.ts
  • src/server/Worker.ts

Walkthrough

Replaces interval-based polling with a recursive, non-overlapping polling utility (startPolling) and integrates it into Master, Worker, and PrivilegeRefresher. Adds PollingLoop implementation and tests; updates polling call sites to await tasks before scheduling the next run and adds logging/error handling.

Changes

Cohort / File(s) — Summary

  • Polling utility — src/server/PollingLoop.ts: new startPolling(task: () => Promise<void>, intervalMs: number) implementing a recursive, await-before-next iteration loop with logger-backed error handling.
  • Master integration — src/server/Master.ts: replaces setInterval with startPolling(fetchLobbies, ...); preserves the scheduling logic when the lobby count is zero and adds polling error handling.
  • Worker matchmaking — src/server/Worker.ts: replaces inline polling with startPolling via a new internal startMatchmakingPolling(gm) wrapper; centralizes timeout/AbortController usage and logging, and includes instanceId in check-ins.
  • Privilege refresh — src/server/PrivilegeRefresher.ts: replaces jittered timers with startPolling to repeatedly call loadPrivilegeChecker at refreshInterval.
  • Tests — tests/server/PollingLoop.test.ts: new tests using fake timers and a mocked logger to verify non-overlapping execution, continued polling after errors, and correct scheduling.
  • Metadata — manifest_file, package.json: minor manifest/package updates (metadata/dependency adjustments).

Sequence Diagram(s)

sequenceDiagram
  participant PollingLoop
  participant Component as Master/Worker/PrivilegeRefresher
  participant Task as fetchLobbies / matchmaking checkin / loadPrivilegeChecker
  participant Scheduler as LobbyScheduler or GameCreator
  participant Logger

  PollingLoop->>Component: invoke task wrapper
  Component->>Task: call async task()
  Task-->>Component: resolves (data or assignment) / rejects (error)
  Component->>Scheduler: (if condition met) scheduleLobbies / create game
  Component->>Logger: log results or errors
  Component-->>PollingLoop: task settled
  PollingLoop->>PollingLoop: wait intervalMs then repeat

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🔁 Timers now queue, each job takes its turn,
No piles of promises left to burn.
A steady loop, logs standing guard,
Errors noted, no overlapping shard.
Servers breathe easy — polling is calm.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)
  • Title check ✅ — the title clearly describes the main change: replacing setInterval with recursive setTimeout in Master.ts to fix a memory leak.
  • Description check ✅ — the description is well detailed and directly related to the changeset, explaining the memory leak and the startPolling solution.
  • Linked Issues check ✅ — the PR addresses all coding requirements from issue #2868: replaces setInterval with recursive setTimeout, prevents request pile-up, ensures the previous operation completes before scheduling the next, and applies the pattern across Master.ts, PrivilegeRefresher.ts, and Worker.ts.
  • Out of Scope Changes check ✅ — all changes are in scope: PollingLoop.ts is a new generic utility supporting the fix, PrivilegeRefresher.ts and Worker.ts adopt the pattern for consistency, tests verify polling behavior, and package.json updates are minimal.




Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/server/Master.ts (1)

163-177: Good fix for the memory leak using recursive setTimeout.

The pattern correctly ensures sequential execution: the next fetchLobbies() call is scheduled only after the previous one finishes (success or failure). This prevents the request pile-up that caused the ~30GB memory growth.

A couple of observations:

  1. No cancellation mechanism: There is no way to stop this polling loop (e.g., for graceful shutdown). Consider returning a cleanup function or using an AbortController if you need to stop the loop later.

  2. Unhandled rejection from scheduleLobbies(): Inside .then(), if scheduleLobbies() throws, it becomes an unhandled promise rejection because the error does not flow into your .catch().

🔧 Optional: Handle errors from scheduleLobbies and add cleanup support
-        const runLoop = () => {
-          fetchLobbies()
-            .then((lobbies) => {
+        let stopped = false;
+        const runLoop = async () => {
+          if (stopped) return;
+          try {
+            const lobbies = await fetchLobbies();
             if (lobbies === 0) {
-              scheduleLobbies();
+              await scheduleLobbies();
             }
-            })
-            .catch((error) => {
+          } catch (error) {
             log.error("Error in check loop:", error);
-            })
-            .finally(() => {
-              setTimeout(runLoop, 100);
-            });
+          } finally {
+            if (!stopped) {
+              setTimeout(runLoop, 100);
+            }
+          }
         };
         runLoop();
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e3ced3 and ae9c981.

📒 Files selected for processing (1)
  • src/server/Master.ts

coderabbitai Bot previously approved these changes Jan 12, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @src/server/PollingLoop.ts:
- Around line 1-4: Prettier flagged formatting in PollingLoop.ts; run a
formatter (npx prettier --write src/server/PollingLoop.ts) or reformat the
import and const lines to match project Prettier rules (ensure consistent
spacing, semicolons, and line endings around the import of logger and the const
log = logger.child({ comp: "polling" }) statement).

In @tests/PollingLoop.test.ts:
- Around line 1-3: The test file tests/PollingLoop.test.ts has Prettier
formatting issues; run the formatter (e.g., npx prettier --write
tests/PollingLoop.test.ts) or apply Prettier rules to reformat the imports and
file so it matches project style; ensure the imports and any trailing/leading
whitespace around the startPolling import and the vitest imports are fixed to
satisfy CI.
- Around line 18-20: The afterEach hook currently only calls
vi.restoreAllMocks(), which leaves fake timers active; update the afterEach in
PollingLoop.test.ts to also call vi.useRealTimers() (in addition to
vi.restoreAllMocks()) so real timers are restored and fake timers don't leak
into other tests; locate the afterEach block that references
vi.restoreAllMocks() and add vi.useRealTimers() there.
🧹 Nitpick comments (2)
src/server/PollingLoop.ts (1)

14-25: Consider returning a cancel function.

The implementation correctly prevents overlapping executions. For better testability and potential graceful shutdown support, consider returning a function to stop the loop.

♻️ Optional: Add cancellation support
-export function startPolling(task: () => Promise<void>, intervalMs: number) {
+export function startPolling(task: () => Promise<void>, intervalMs: number): () => void {
+    let timeoutId: ReturnType<typeof setTimeout> | null = null;
+    let stopped = false;
+
     const runLoop = () => {
+        if (stopped) return;
         task()
             .catch((error) => {
                 log.error("Error in polling loop:", error);
             })
             .finally(() => {
-                setTimeout(runLoop, intervalMs);
+                if (!stopped) {
+                    timeoutId = setTimeout(runLoop, intervalMs);
+                }
             });
     };
     runLoop();
+
+    return () => {
+        stopped = true;
+        if (timeoutId !== null) {
+            clearTimeout(timeoutId);
+        }
+    };
 }
tests/PollingLoop.test.ts (1)

54-77: Good test for error resilience.

Test correctly verifies polling continues after a task failure. Consider also asserting that the error was logged to ensure the catch block executed.

♻️ Optional: Verify error logging
+import { logger } from '../src/server/Logger';
+
 // After the error task runs, you could verify:
+const mockLogger = logger.child({});
+expect(mockLogger.error).toHaveBeenCalledWith("Error in polling loop:", expect.any(Error));

Note: This requires capturing the mocked logger instance in the test setup.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae9c981 and 2abac27.

📒 Files selected for processing (3)
  • src/server/Master.ts
  • src/server/PollingLoop.ts
  • tests/PollingLoop.test.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-09T06:10:44.499Z
Learnt from: plazmaezio
Repo: openfrontio/OpenFrontIO PR: 2716
File: tests/AllianceAcceptNukes.test.ts:85-87
Timestamp: 2026-01-09T06:10:44.499Z
Learning: In tests under tests/**, objects may stack on the same tile coordinates (e.g., MissileSilo). When writing tests that involve building/placing structures or cooldowns, assume multiple items can share coordinates; adjust assertions to account for stacking, and explicitly manage or reset the stack state where necessary.

Applied to files:

  • tests/PollingLoop.test.ts
🧬 Code graph analysis (3)
src/server/PollingLoop.ts (2)
src/server/Logger.ts (1)
  • logger (74-74)
tests/pathfinding/playground/public/client.js (1)
  • error (769-769)
tests/PollingLoop.test.ts (1)
src/server/PollingLoop.ts (1)
  • startPolling (14-25)
src/server/Master.ts (1)
src/server/PollingLoop.ts (1)
  • startPolling (14-25)
🪛 GitHub Actions: 🧪 CI
src/server/PollingLoop.ts

[warning] 1-1: Prettier formatting issue detected. Run 'npx prettier --write src/server/PollingLoop.ts' to fix.

tests/PollingLoop.test.ts

[warning] 1-1: Prettier formatting issue detected. Run 'npx prettier --write tests/PollingLoop.test.ts' to fix.

🔇 Additional comments (4)
src/server/Master.ts (2)

14-14: LGTM!

Clean import of the new polling utility.


164-169: Good fix for the memory leak.

The recursive setTimeout pattern ensures fetchLobbies completes before scheduling the next call. This prevents the unbounded Promise accumulation that caused the ~30GB memory issue.

One note: there is no way to stop the polling loop once started. For a long-running server process this is fine, but if graceful shutdown is ever needed, consider returning a cancel function from startPolling.

tests/PollingLoop.test.ts (2)

5-11: LGTM!

Logger mock correctly matches the expected usage pattern.


22-52: Good test coverage for non-overlapping behavior.

The test correctly verifies that time advancement alone does not trigger the next call - the promise must resolve first. Nice use of controllable promises and process.nextTick for microtask flushing.

coderabbitai Bot previously approved these changes Jan 13, 2026
@Codimow
Contributor Author

Codimow commented Jan 13, 2026

@ryanbarlow97 can you review and merge this?

@ryanbarlow97
Contributor

@ryanbarlow97 can you review and merge this?

Hmm, I'd personally want Evan to review this - I'm still getting used to the server-side stuff; UI is generally more my area

I had a look and it seems okay though

@Codimow
Contributor Author

Codimow commented Jan 13, 2026

@evanpelle

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/server/PrivilegeRefresher.ts (1)

59-62: Double-logging of errors.

loadPrivilegeChecker logs the error at line 60 and then re-throws it. The startPolling utility also catches and logs errors (see PollingLoop.ts line 15-16). This means failures get logged twice.

Consider removing the throw error; since the error is already logged here, or remove the local logging and let startPolling handle it.

🔧 Option A: Remove the re-throw (let startPolling handle continuation)
     } catch (error) {
       this.log.error(`Failed to fetch cosmetics from ${this.endpoint}:`, error);
-      throw error;
     }
🔧 Option B: Remove local logging (let startPolling log it)
     } catch (error) {
-      this.log.error(`Failed to fetch cosmetics from ${this.endpoint}:`, error);
       throw error;
     }
src/server/Worker.ts (1)

44-58: Variable gm is referenced before it is defined.

The setTimeout callback at line 46 references gm, but gm is declared at line 58. This works because the callback runs after 1-3 seconds, by which time gm exists. However, this ordering is confusing and fragile.

Consider moving the setTimeout block after gm is initialized, or move gm creation earlier.

🔧 Suggested fix: Move setTimeout after gm initialization
   const gm = new GameManager(config, log);

+  setTimeout(
+    () => {
+      startMatchmakingPolling(gm);
+    },
+    1000 + Math.random() * 2000,
+  );
+
   if (config.otelEnabled()) {
     initWorkerMetrics(gm);
   }
-
-  setTimeout(
-    () => {
-      startMatchmakingPolling(gm);
-    },
-    1000 + Math.random() * 2000,
-  );
🤖 Fix all issues with AI agents
In `@src/server/Worker.ts`:
- Around line 504-508: Replace the ad-hoc console.log in the setTimeout callback
with the module logger so logs remain consistent and captured by structured
logging; locate the setTimeout(...) block where console.log(`Starting game
${gameId}`) is called (near game.start()) and change that call to use the
existing log.info (or appropriate log level) with the same message and gameId
interpolation so it matches the rest of Worker.ts logging.
🧹 Nitpick comments (1)
tests/server/PollingLoop.test.ts (1)

53-76: Consider a more robust microtask flush approach.

The double process.nextTick on lines 68-69 works but is somewhat fragile. If internal promise chaining changes, this could break. A small helper or await vi.runAllTimersAsync() pattern might be clearer.

That said, the test correctly verifies error resilience, which is important for production stability.

💡 Optional: Use a helper for microtask flushing
+  // Helper to flush all pending microtasks
+  const flushMicrotasks = () => vi.waitFor(() => Promise.resolve());

   it("should continue polling even if a task fails", async () => {
     // ...
-    await new Promise(process.nextTick);
-    await new Promise(process.nextTick);
+    await flushMicrotasks();
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f378d04 and eaf0dbe.

📒 Files selected for processing (3)
  • src/server/PrivilegeRefresher.ts
  • src/server/Worker.ts
  • tests/server/PollingLoop.test.ts
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: woodydrn
Repo: openfrontio/OpenFrontIO PR: 1836
File: src/client/Main.ts:482-482
Timestamp: 2025-08-17T20:48:49.411Z
Learning: In PR `#1836`, user woodydrn prefers to keep changes minimal and focused on the specific issue (clientID persistence) rather than refactoring redundant code in JoinLobbyEvent dispatchers. They want to avoid scope creep in focused bug fix PRs.
📚 Learning: 2026-01-12T21:37:01.156Z
Learnt from: FloPinguin
Repo: openfrontio/OpenFrontIO PR: 2874
File: src/server/MapLandTiles.ts:7-11
Timestamp: 2026-01-12T21:37:01.156Z
Learning: In this repository's OpenFrontIO deployment, inter-service HTTP calls to the master should target http://localhost:3000 (master at port 3000) as the canonical address. Apply this as the standard for all server-side TypeScript code that communicates with the master. Avoid hardcoding non-master URLs; centralize the master address (e.g., via config or env) when possible, and ensure internal service communication uses localhost:3000 in this architecture.

Applied to files:

  • src/server/PrivilegeRefresher.ts
  • src/server/Worker.ts
  • tests/server/PollingLoop.test.ts
📚 Learning: 2026-01-09T06:10:44.499Z
Learnt from: plazmaezio
Repo: openfrontio/OpenFrontIO PR: 2716
File: tests/AllianceAcceptNukes.test.ts:85-87
Timestamp: 2026-01-09T06:10:44.499Z
Learning: In tests under tests/**, objects may stack on the same tile coordinates (e.g., MissileSilo). When writing tests that involve building/placing structures or cooldowns, assume multiple items can share coordinates; adjust assertions to account for stacking, and explicitly manage or reset the stack state where necessary.

Applied to files:

  • tests/server/PollingLoop.test.ts
🧬 Code graph analysis (3)
src/server/PrivilegeRefresher.ts (1)
src/server/PollingLoop.ts (1)
  • startPolling (13-24)
src/server/Worker.ts (1)
src/server/PollingLoop.ts (1)
  • startPolling (13-24)
tests/server/PollingLoop.test.ts (1)
src/server/PollingLoop.ts (1)
  • startPolling (13-24)
🔇 Additional comments (6)
tests/server/PollingLoop.test.ts (3)

1-10: Clean test setup with proper mocking.

The import and mock setup looks good. The mock structure correctly matches how PollingLoop.ts uses the logger (logger.child(...).error(...)).


12-19: Good use of fake timers with proper cleanup.

Using vi.useFakeTimers() in beforeEach and vi.restoreAllMocks() in afterEach ensures test isolation. This is the right pattern for testing time-dependent code.


21-51: Solid test for the core serialization behavior.

This test correctly verifies the main fix: the next poll does not start until the previous one completes. The flow is clear:

  1. Start polling, verify initial call
  2. Advance time past interval, verify no new call (task still pending)
  3. Resolve task, flush microtasks, advance time
  4. Verify second call happens
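The controllable-promise technique this flow relies on can be sketched as a small "deferred" helper. This is illustrative only - the actual test file may structure it differently, and the names here are assumptions:

```typescript
// A minimal deferred helper: exposes resolve/reject so a test can decide
// exactly when a polled task settles, then flush microtasks and advance
// fake timers to observe the next scheduled call.
function deferred<T = void>() {
  let resolve!: (value: T) => void;
  let reject!: (error: unknown) => void;
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Usage sketch: hand the pending promise to the polled task, advance timers,
// assert no second call occurred, then resolve and flush microtasks.
const pending = deferred<void>();
const task = () => pending.promise;
```

Keeping the task's completion in the test's hands is what makes the non-overlap assertion deterministic under fake timers.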
src/server/PrivilegeRefresher.ts (1)

28-33: Good simplification using the new polling utility.

The refactor from manual jittered timers to startPolling makes this code cleaner and aligns with the new pattern across the codebase.

Note: The old implementation had an initial jitter delay before the first fetch. Now, loadPrivilegeChecker runs immediately on start(). This is likely fine (faster initial load), but verify this change in behavior is acceptable.

src/server/Worker.ts (2)

462-514: Good refactor to use startPolling for matchmaking.

The new startMatchmakingPolling function properly uses the startPolling utility. The internal try/catch handles errors without re-throwing, which avoids double-logging. The abort controller with timeout (line 472-473) is a nice touch for request resilience.


513-513: Random jitter in polling interval.

The interval 5000 + Math.random() * 1000 adds jitter, which is good for avoiding thundering herd problems. However, Math.random() is evaluated once when startPolling is called, so all subsequent polls use the same fixed interval.

If you want jitter on each poll cycle, the jitter logic would need to be inside the task or the polling utility would need to accept a function for interval. Current behavior is fine for most cases.
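One hypothetical shape for per-cycle jitter is to let the utility accept an interval function instead of a number. This is not what the PR implements - startPolling takes a fixed intervalMs - and the names below are illustrative:

```typescript
// Hypothetical variant: the interval is a function, so jitter is drawn
// fresh before every scheduled run rather than once at startup.
function startPollingJittered(
  task: () => Promise<void>,
  nextIntervalMs: () => number,
): () => void {
  let stopped = false;
  const runLoop = () => {
    if (stopped) return;
    task()
      .catch((error) => console.error("Error in polling loop:", error))
      .finally(() => {
        if (!stopped) setTimeout(runLoop, nextIntervalMs());
      });
  };
  runLoop();
  return () => {
    stopped = true;
  };
}

// Each cycle now draws fresh jitter (checkin is a placeholder task):
// startPollingJittered(checkin, () => 5000 + Math.random() * 1000);
```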

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

coderabbitai Bot previously approved these changes Jan 14, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/server/Worker.ts (1)

487-565: Critical: Duplicate code blocks and syntax error break compilation.

The startMatchmakingPolling function contains overlapping code blocks that appear to be a merge conflict or copy-paste error:

  1. Lines 497-511: First AbortController + fetch call
  2. Lines 512-527: Second duplicate AbortController + fetch call
  3. Lines 529-550: First response handling block
  4. Lines 553-564: Second response handling block
  5. Line 565: Incomplete statement , 5000 + Math.random() * 1000); causes parse error

The pipeline confirms this fails to compile (TS1128).

🔧 Suggested fix — clean up the function
 async function startMatchmakingPolling(gm: GameManager) {
   startPolling(async () => {
     try {
       const url = `${config.jwtIssuer() + "/matchmaking/checkin"}`;
       const gameId = generateGameIdForWorker();
       if (gameId === null) {
         log.warn(`Failed to generate game ID for worker ${workerId}`);
         return;
       }
 
       const controller = new AbortController();
       const timeoutId = setTimeout(() => controller.abort(), 20000);
       const response = await fetch(url, {
         method: "POST",
         headers: {
           "Content-Type": "application/json",
           "x-api-key": config.apiKey(),
         },
         body: JSON.stringify({
           id: workerId,
           gameId: gameId,
           ccu: gm.activeClients(),
+          instanceId: process.env.INSTANCE_ID,
         }),
         signal: controller.signal,
       });
-    const controller = new AbortController();
-    const timeoutId = setTimeout(() => controller.abort(), 20000);
-    const response = await fetch(url, {
-      method: "POST",
-      headers: {
-        "Content-Type": "application/json",
-        "x-api-key": config.apiKey(),
-      },
-      body: JSON.stringify({
-        id: workerId,
-        gameId: gameId,
-        ccu: gm.activeClients(),
-        instanceId: process.env.INSTANCE_ID,
-      }),
-      signal: controller.signal,
-    });
 
       clearTimeout(timeoutId);
 
       if (!response.ok) {
         log.warn(
           `Failed to poll lobby: ${response.status} ${response.statusText}`,
         );
         return;
       }
 
       const data = await response.json();
       log.info(`Lobby poll successful:`, data);
 
       if (data.assignment) {
         const gameConfig = await playlist.gameConfig();
         gameConfig.gameMapSize = GameMapSize.Compact;
         const game = gm.createGame(gameId, gameConfig);
         setTimeout(() => {
           // Wait a few seconds to allow clients to connect.
-          console.log(`Starting game ${gameId}`);
+          log.info(`Starting game ${gameId}`);
           game.start();
         }, 5000);
       }
     } catch (error) {
       log.error(`Error polling lobby:`, error);
-    const data = await response.json();
-    log.info(`Lobby poll successful:`, data);
-
-    if (data.assignment) {
-      const gameConfig = playlist.get1v1Config();
-      const game = gm.createGame(gameId, gameConfig);
-      setTimeout(() => {
-        // Wait a few seconds to allow clients to connect.
-        console.log(`Starting game ${gameId}`);
-        game.start();
-      }, 5000);
     }
-  }, 5000 + Math.random() * 1000);
+  }, 5000 + Math.random() * 1000);
 }

Also note: Line 557 references playlist.get1v1Config() which does not appear in the provided context. The other block uses playlist.gameConfig() — verify which method actually exists.

🤖 Fix all issues with AI agents
In `@src/server/Worker.ts`:
- Around line 45-50: The setTimeout callback calls startMatchmakingPolling(gm)
before gm is declared, causing a TDZ runtime error; fix by
declaring/initializing the gm variable before the setTimeout (move the gm
declaration above the setTimeout) or alternatively move the entire setTimeout
block to after the existing gm declaration so startMatchmakingPolling(gm) is
only invoked once gm exists; update the code around the gm variable and the
setTimeout invocation to ensure gm is in scope when passed to
startMatchmakingPolling.
♻️ Duplicate comments (1)
src/server/Worker.ts (1)

547-548: Use log.info instead of console.log.

Lines 547 and 561 use console.log while the rest of the file uses the structured log logger. This breaks consistency and may cause log messages to be missed in production monitoring.

Also applies to: 561-562

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eaf0dbe and 33cee36.

📒 Files selected for processing (2)
  • src/server/Master.ts
  • src/server/Worker.ts
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: woodydrn
Repo: openfrontio/OpenFrontIO PR: 1836
File: src/client/Main.ts:482-482
Timestamp: 2025-08-17T20:48:49.411Z
Learning: In PR `#1836`, user woodydrn prefers to keep changes minimal and focused on the specific issue (clientID persistence) rather than refactoring redundant code in JoinLobbyEvent dispatchers. They want to avoid scope creep in focused bug fix PRs.
📚 Learning: 2025-06-09T02:20:43.637Z
Learnt from: VariableVince
Repo: openfrontio/OpenFrontIO PR: 1110
File: src/client/Main.ts:293-295
Timestamp: 2025-06-09T02:20:43.637Z
Learning: In src/client/Main.ts, during game start in the handleJoinLobby callback, UI elements are hidden using direct DOM manipulation with classList.add("hidden") for consistency. This includes modals, buttons, and error divs. The codebase follows this pattern rather than using component APIs for hiding elements during game transitions.

Applied to files:

  • src/server/Master.ts
  • src/server/Worker.ts
📚 Learning: 2026-01-12T21:37:01.156Z
Learnt from: FloPinguin
Repo: openfrontio/OpenFrontIO PR: 2874
File: src/server/MapLandTiles.ts:7-11
Timestamp: 2026-01-12T21:37:01.156Z
Learning: In this repository's OpenFrontIO deployment, inter-service HTTP calls to the master should target http://localhost:3000 (master at port 3000) as the canonical address. Apply this as the standard for all server-side TypeScript code that communicates with the master. Avoid hardcoding non-master URLs; centralize the master address (e.g., via config or env) when possible, and ensure internal service communication uses localhost:3000 in this architecture.

Applied to files:

  • src/server/Master.ts
  • src/server/Worker.ts
📚 Learning: 2025-12-13T14:58:29.645Z
Learnt from: scamiv
Repo: openfrontio/OpenFrontIO PR: 2607
File: src/core/execution/PlayerExecution.ts:271-295
Timestamp: 2025-12-13T14:58:29.645Z
Learning: In src/core/execution/PlayerExecution.ts surroundedBySamePlayer(), the `as Player` cast on `mg.playerBySmallID(scan.enemyId)` is intentional. Since scan.enemyId comes from ownerID() on an owned tile and playerBySmallID() only returns Player or undefined, the cast expresses a known invariant. The maintainers prefer loud failures (runtime errors) over silent masking (early returns with guards) for corrupted game state scenarios at trusted call sites.

Applied to files:

  • src/server/Worker.ts
🧬 Code graph analysis (2)
src/server/Master.ts (1)
src/server/PollingLoop.ts (1)
  • startPolling (13-24)
src/server/Worker.ts (3)
src/server/GameManager.ts (2)
  • GameManager (15-136)
  • game (25-27)
src/server/PollingLoop.ts (1)
  • startPolling (13-24)
src/server/MapPlaylist.ts (1)
  • gameConfig (91-142)
🪛 Biome (2.1.2)
src/server/Worker.ts

[error] 510-510: This variable is used before its declaration.

The variable is declared here:

(lint/correctness/noInvalidUseBeforeDeclaration)


[error] 565-565: Expected a statement but instead found ', 5000 + Math.random() * 1000)'.

Expected a statement here.

(parse)

🪛 GitHub Actions: 🧪 CI
src/server/Worker.ts

[error] 565-565: TS1128: Declaration or statement expected.

🪛 GitHub Check: 🔍 ESLint
src/server/Worker.ts

[failure] 565-565:
Parsing error: Declaration or statement expected.

🔇 Additional comments (2)
src/server/Master.ts (2)

15-15: LGTM!

Import for startPolling is correctly added to support the new polling pattern.


180-185: Good fix for the memory leak.

The change from setInterval to startPolling correctly addresses the root cause of the memory leak. Now the next poll is scheduled only after fetchLobbies() completes (success or failure), preventing request pile-up when fetchLobbies takes longer than 100ms.

The logic is preserved: fetch lobbies, and if none exist, schedule a new public game.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

coderabbitai Bot previously approved these changes Jan 14, 2026
Collaborator

@evanpelle evanpelle left a comment


thanks!

@evanpelle evanpelle added this to the v29 milestone Jan 14, 2026
@evanpelle evanpelle merged commit e1d31ef into openfrontio:main Jan 14, 2026
7 of 9 checks passed
ryanbarlow97 pushed a commit that referenced this pull request Jan 14, 2026
fix: replace setInterval with recursive setTimeout in Master.ts to pr… (#2869)


Development

Successfully merging this pull request may close these issues.

Memory Leak in Master Process due to Unbounded setInterval Pile-up

4 participants