Read Response Retry Regression #12428

@moonchen

Description

We noticed that some requests were generating HTTP 502 errors. The cause is that the origin server closed an HTTP/1 keep-alive connection before the request reached it. The sequence is as follows.

  1. An origin connection is released from the session pool.
  2. ATS buffers a request.
  3. Immediately after, ATS enters state_read_server_response_header.
  4. Somewhere between steps 1 and 5, the server decides to close the connection, probably due to a keep-alive timeout, before the request arrives.
  5. ATS reads from the socket and gets an EOS.
  6. The EOS is handled as follows:
    t_state.current.retry_attempts.maximize(t_state.configured_connect_attempts_max_retries());

Note that the EOS case falls through, and retries are disabled from this point onwards. This behavior was introduced in PR #9366 (Http2 to origin). The issue can be mitigated by reducing the keep-alive timeout in ATS so that it is lower than the origin's keep-alive timeout.

I think ATS should provide an option to retry a request if an invalid response has been received.
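
To illustrate the effect, here is a minimal, self-contained sketch of the retry accounting. It is not the ATS code: `RetryAttempts`, `can_retry`, `retry_allowed_after_eos`, and the `retry_on_eos` flag are hypothetical names, and the sketch assumes that `maximize` raises the attempt counter to the configured limit, which is what leaves no retries available.

```cpp
#include <algorithm>

// Hypothetical stand-in for the retry counter quoted above; NOT the ATS class.
struct RetryAttempts {
  unsigned attempts = 0;

  // Assumption: maximize() raises the counter to at least `limit`.
  void maximize(unsigned limit) { attempts = std::max(attempts, limit); }

  // A retry is allowed only while the counter is below the configured limit.
  bool can_retry(unsigned limit) const { return attempts < limit; }
};

// Returns true if a request may still be retried after an EOS on the first read.
// `retry_on_eos` models the proposed (hypothetical) option: when it is set,
// the EOS does not exhaust the retry budget.
bool retry_allowed_after_eos(unsigned max_retries, bool retry_on_eos)
{
  RetryAttempts retries;
  if (!retry_on_eos) {
    // Behavior described in this issue: the EOS maxes out the counter.
    retries.maximize(max_retries);
  }
  return retries.can_retry(max_retries);
}
```

With a limit of 3, `retry_allowed_after_eos(3, false)` returns false, matching the 502 we observe, while `retry_allowed_after_eos(3, true)` returns true, so the request could be retried on a fresh connection.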
