ECS Backend: Infinite retry loop when container fails to start with minimum-seats-available: 1 #567

@stefanlinner

Description

Thank you for the recent release and the improved logging behavior for the ECS backend!

I've encountered an issue when using the ShinyProxy ECS backend with container pre-initialization. Note that this issue was not introduced with the recent ShinyProxy 3.2.0 release; I had run into this behavior before but couldn't pinpoint the exact root cause at the time. When minimum-seats-available is set to 1 and the allocated resources (container-cpu-request / container-memory-request) are insufficient for the application to start, the ECS cluster enters an infinite retry loop, continuously attempting to spin up the failing container.

Problem Details

Expected Behavior: Failed containers should eventually stop retrying or have a reasonable backoff/failure threshold.

Actual Behavior: The ECS cluster continuously attempts to start the failing task indefinitely, even after:

  • Correcting the resource allocation in the configuration
  • Completely removing the problematic spec entry
  • Updating the ShinyProxy ECS service with "Force new deployment"

Workaround: The only solution I found was to completely destroy and recreate the infrastructure using tofu destroy and tofu apply.

Question: Is there another way to resolve this infinite retry loop without having to destroy and recreate the entire setup?
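For reference, what I would have expected to work is stopping the stuck task directly with the AWS CLI, but maybe I'm missing a step. A sketch, assuming the pre-initialized seat is an ordinary ECS task and that the corrected configuration has been deployed first (otherwise ShinyProxy presumably just starts a replacement seat):

CLUSTER=${CLUSTER_NAME}

# Find the task that keeps being (re)started
aws ecs list-tasks --cluster "$CLUSTER" --desired-status RUNNING

# Stop it explicitly; the task ARN comes from the output above
aws ecs stop-task --cluster "$CLUSTER" \
  --task <task-arn> \
  --reason "failing pre-initialized seat"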

Reproduction Steps

The following minimal example should reproduce this issue:

application.yml:

proxy:
  containerBackend: ecs
  ecs:
    name: ${CLUSTER_NAME}
    region: ${AWS_REGION}
    subnets:
      - ${SUBNET_0}
      - ${SUBNET_1}
    security-groups: ${SECURITY_GROUP}
    enable-cloud-watch: true
  specs:
    - id: dummy_app
      display-name: Test App
      description: Test App
      container-cmd: ["R", "-e", "shiny::runApp()"]
      container-image: dummy_app_image
      ecs-execution-role: arn:aws:iam::app-execution-role
      container-cpu-request: 512  # Insufficient for the app requirements
      container-memory-request: 4096
      minimum-seats-available: 1  # This triggers the infinite retry behavior

app.R (Shiny application that requires >1 CPU):

library(shiny)
library(future)

# This will fail if insufficient CPU cores are available
plan(multisession, workers = 2)

ui <- fluidPage(
  titlePanel("Simple Test App"),
  h3("App started successfully!")
)

server <- function(input, output, session) {
  # Empty server - app just needs to start
}

shinyApp(ui = ui, server = server)

Root Cause

The dummy application attempts to set up a future::multisession plan with 2 workers, which requires more than one CPU core. With only 512 CPU units allocated (0.5 vCPU), the container fails to start, triggering the infinite retry behavior.
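
The core-availability claim can be checked locally by running the same image with a 0.5-CPU limit and asking R how many cores it sees. A sketch; it assumes the image contains R with the parallelly package, which future uses for core detection:

# Run the app image with the same 0.5 vCPU that 512 CPU units grant on ECS
docker run --rm --cpus 0.5 dummy_app_image \
  Rscript -e 'cat("cores seen:", parallelly::availableCores(), "\n")'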

Additional Question

While I have the opportunity, I'd like to ask about another behavior I've observed (not directly related to this issue):

When I update the ShinyProxy configuration and redeploy the ShinyProxy service with "Force new deployment", the configuration update itself works correctly. However, the previously pre-initialized instances keep running and have to be stopped manually. Is there a way to stop the tasks belonging to the previous ShinyProxy deployment automatically during an update? And more generally, are there plans for a smoother configuration-update experience on ECS, comparable to the ShinyProxy Operator? For now, my manual cleanup looks roughly like the sketch below.
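
A sketch of that manual cleanup, assuming nothing else shares the cluster (I haven't confirmed how ShinyProxy names or tags the seat tasks, so this simply inspects and stops every running task):

CLUSTER=${CLUSTER_NAME}

# WARNING: this iterates over every running task in the cluster,
# including the new ShinyProxy service task, so filter the list
# first if anything else runs here.
for task in $(aws ecs list-tasks --cluster "$CLUSTER" \
    --desired-status RUNNING --query 'taskArns[]' --output text); do
  # Inspect before stopping
  aws ecs describe-tasks --cluster "$CLUSTER" --tasks "$task" \
    --query 'tasks[].{group: group, image: containers[0].image}'
  aws ecs stop-task --cluster "$CLUSTER" --task "$task" \
    --reason "stale pre-initialized seat from previous deployment"
done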
