Commit 8b0083c

Respond with useful error codes when Content-Length header/s are invalid (#19212)
Related to #17035, when Synapse receives a request that is larger than the maximum size allowed, it aborts the connection without ever sending back an HTTP response.

I dug into our usage of twisted and how best to try and report such an error and this is what I came up with. It would be ideal to be able to report the status from within `handleContentChunk` but that is called too early on in the twisted http handling code, before things have been set up enough to be able to properly write a response.

I tested this change out locally (both with C-S and S-S apis) and they do receive a 413 response now in addition to the connection being closed.

Hopefully this will aid in being able to quickly detect when #17035 is occurring as the current situation makes it very hard to narrow things down to that specific issue without making a lot of assumptions.

This PR also responds with more meaningful error codes now in the case of:
- multiple `Content-Length` headers
- invalid `Content-Length` header value
- request content size being larger than the `Content-Length` value

### Pull Request Checklist

<!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request -->

* [X] Pull request is based on the develop branch
* [X] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should:
  - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
  - Use markdown where necessary, mostly for `code blocks`.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
  - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
* [X] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))

---------

Co-authored-by: Eric Eastwood <[email protected]>
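
The rules described in the commit message can be summarized as a small decision function. This is a minimal standalone sketch rather than the code from the commit: `classify_content_length` and its parameters are hypothetical names, and `HTTPStatus.OK` is used only as a stand-in for "passed validation".

```python
from http import HTTPStatus
from typing import Optional, Sequence


def classify_content_length(
    raw_headers: Optional[Sequence[bytes]],
    actual_size: int,
    max_size: int,
) -> HTTPStatus:
    """Hypothetical helper: map a request onto the status codes named above."""
    if raw_headers is None:
        # No Content-Length header at all: nothing to validate here.
        return HTTPStatus.OK
    if len(raw_headers) != 1:
        # Never guess which of several headers is the "right" one.
        return HTTPStatus.BAD_REQUEST
    try:
        declared = int(raw_headers[0])
    except (ValueError, TypeError):
        return HTTPStatus.BAD_REQUEST
    if declared > max_size:
        return HTTPStatus.REQUEST_ENTITY_TOO_LARGE
    if declared != actual_size:
        return HTTPStatus.BAD_REQUEST
    return HTTPStatus.OK
```

The size check comes before the mismatch check, so an oversized request is reported as 413 even if its body was cut short.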
1 parent 09fd264 commit 8b0083c

10 files changed: +336 -50 lines changed

changelog.d/19212.misc

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Respond with useful error codes with `Content-Length` header/s are invalid.

synapse/api/constants.py

Lines changed: 13 additions & 0 deletions
@@ -29,6 +29,19 @@
 # the max size of a (canonical-json-encoded) event
 MAX_PDU_SIZE = 65536
 
+# The maximum allowed size of an HTTP request.
+# Other than media uploads, the biggest request we expect to see is a fully-loaded
+# /federation/v1/send request.
+#
+# The main thing in such a request is up to 50 PDUs, and up to 100 EDUs. PDUs are
+# limited to 65536 bytes (possibly slightly more if the sender didn't use canonical
+# json encoding); there is no specced limit to EDUs (see
+# https://github.com/matrix-org/matrix-doc/issues/3121).
+#
+# in short, we somewhat arbitrarily limit requests to 200 * 64K (about 12.5M)
+#
+MAX_REQUEST_SIZE = 200 * MAX_PDU_SIZE
+
 # Max/min size of ints in canonical JSON
 CANONICALJSON_MAX_INT = (2**53) - 1
 CANONICALJSON_MIN_INT = -CANONICALJSON_MAX_INT
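
As a quick check of the "about 12.5M" figure in the comment above, the arithmetic can be worked through directly. This is only an illustrative sketch; the two constants are copied from the diff and nothing here is imported from Synapse.

```python
# Values copied from synapse/api/constants.py in the diff above.
MAX_PDU_SIZE = 65536  # bytes (64 KiB)
MAX_REQUEST_SIZE = 200 * MAX_PDU_SIZE

print(MAX_REQUEST_SIZE)                  # 13107200 bytes
print(MAX_REQUEST_SIZE / (1024 * 1024))  # 12.5 (MiB)
```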

synapse/app/_base.py

Lines changed: 3 additions & 12 deletions
@@ -59,7 +59,7 @@
 from twisted.web.resource import Resource
 
 import synapse.util.caches
-from synapse.api.constants import MAX_PDU_SIZE
+from synapse.api.constants import MAX_REQUEST_SIZE
 from synapse.app import check_bind_error
 from synapse.config import ConfigError
 from synapse.config._base import format_config_error
@@ -895,17 +895,8 @@ def sdnotify(state: bytes) -> None:
 def max_request_body_size(config: HomeServerConfig) -> int:
     """Get a suitable maximum size for incoming HTTP requests"""
 
-    # Other than media uploads, the biggest request we expect to see is a fully-loaded
-    # /federation/v1/send request.
-    #
-    # The main thing in such a request is up to 50 PDUs, and up to 100 EDUs. PDUs are
-    # limited to 65536 bytes (possibly slightly more if the sender didn't use canonical
-    # json encoding); there is no specced limit to EDUs (see
-    # https://github.com/matrix-org/matrix-doc/issues/3121).
-    #
-    # in short, we somewhat arbitrarily limit requests to 200 * 64K (about 12.5M)
-    #
-    max_request_size = 200 * MAX_PDU_SIZE
+    # Baseline default for any request that isn't configured in the homeserver config
+    max_request_size = MAX_REQUEST_SIZE
 
     # if we have a media repo enabled, we may need to allow larger uploads than that
     if config.media.can_load_media_repo:

synapse/http/site.py

Lines changed: 139 additions & 19 deletions
@@ -19,6 +19,7 @@
 #
 #
 import contextlib
+import json
 import logging
 import time
 from http import HTTPStatus
@@ -36,6 +37,7 @@
 from twisted.web.resource import IResource, Resource
 from twisted.web.server import Request
 
+from synapse.api.errors import Codes, SynapseError
 from synapse.config.server import ListenerConfig
 from synapse.http import get_request_user_agent, redact_uri
 from synapse.http.proxy import ProxySite
@@ -59,6 +61,10 @@
 _next_request_seq = 0
 
 
+class ContentLengthError(SynapseError):
+    """Raised when content-length validation fails."""
+
+
 class SynapseRequest(Request):
     """Class which encapsulates an HTTP request to synapse.
 
@@ -144,36 +150,150 @@ def __repr__(self) -> str:
             self.synapse_site.site_tag,
         )
 
+    def _respond_with_error(self, synapse_error: SynapseError) -> None:
+        """Send an error response and close the connection."""
+        self.setResponseCode(synapse_error.code)
+        error_response_bytes = json.dumps(synapse_error.error_dict(None)).encode()
+
+        self.responseHeaders.setRawHeaders(b"Content-Type", [b"application/json"])
+        self.responseHeaders.setRawHeaders(
+            b"Content-Length", [f"{len(error_response_bytes)}"]
+        )
+        self.write(error_response_bytes)
+        self.loseConnection()
+
+    def _get_content_length_from_headers(self) -> int | None:
+        """Attempts to obtain the `Content-Length` value from the request's headers.
+
+        Returns:
+            Content length as `int` if present. Otherwise `None`.
+
+        Raises:
+            ContentLengthError: if multiple `Content-Length` headers are present or the
+                value is not an `int`.
+        """
+        content_length_headers = self.requestHeaders.getRawHeaders(b"Content-Length")
+        if content_length_headers is None:
+            return None
+
+        # If there are multiple `Content-Length` headers return an error.
+        # We don't want to even try to pick the right one if there are multiple
+        # as we could run into problems similar to request smuggling vulnerabilities
+        # which rely on the mismatch of how different systems interpret information.
+        if len(content_length_headers) != 1:
+            raise ContentLengthError(
+                HTTPStatus.BAD_REQUEST,
+                "Multiple Content-Length headers received",
+                Codes.UNKNOWN,
+            )
+
+        try:
+            return int(content_length_headers[0])
+        except (ValueError, TypeError):
+            raise ContentLengthError(
+                HTTPStatus.BAD_REQUEST,
+                "Content-Length header value is not a valid integer",
+                Codes.UNKNOWN,
+            )
+
+    def _validate_content_length(self) -> None:
+        """Validate Content-Length header and actual content size.
+
+        Raises:
+            ContentLengthError: If validation fails.
+        """
+        # we should have a `content` by now.
+        assert self.content, "_validate_content_length() called before gotLength()"
+        content_length = self._get_content_length_from_headers()
+
+        if content_length is None:
+            return
+
+        actual_content_length = self.content.tell()
+
+        if content_length > self._max_request_body_size:
+            logger.info(
+                "Rejecting request from %s because Content-Length %d exceeds maximum size %d: %s %s",
+                self.client,
+                content_length,
+                self._max_request_body_size,
+                self.get_method(),
+                self.get_redacted_uri(),
+            )
+            raise ContentLengthError(
+                HTTPStatus.REQUEST_ENTITY_TOO_LARGE,
+                f"Request content is too large (>{self._max_request_body_size})",
+                Codes.TOO_LARGE,
+            )
+
+        if content_length != actual_content_length:
+            comparison = (
+                "smaller" if content_length < actual_content_length else "larger"
+            )
+            logger.info(
+                "Rejecting request from %s because Content-Length %d is %s than the request content size %d: %s %s",
+                self.client,
+                content_length,
+                comparison,
+                actual_content_length,
+                self.get_method(),
+                self.get_redacted_uri(),
+            )
+            raise ContentLengthError(
+                HTTPStatus.BAD_REQUEST,
+                f"Rejecting request as the Content-Length header value {content_length} "
+                f"is {comparison} than the actual request content size {actual_content_length}",
+                Codes.UNKNOWN,
+            )
+
     # Twisted machinery: this method is called by the Channel once the full request has
     # been received, to dispatch the request to a resource.
-    #
-    # We're patching Twisted to bail/abort early when we see someone trying to upload
-    # `multipart/form-data` so we can avoid Twisted parsing the entire request body into
-    # in-memory (specific problem of this specific `Content-Type`). This protects us
-    # from an attacker uploading something bigger than the available RAM and crashing
-    # the server with a `MemoryError`, or carefully block just enough resources to cause
-    # all other requests to fail.
-    #
-    # FIXME: This can be removed once we Twisted releases a fix and we update to a
-    # version that is patched
     def requestReceived(self, command: bytes, path: bytes, version: bytes) -> None:
+        # In the case of a Content-Length header being present, and it's value being too
+        # large, throw a proper error to make debugging issues due to overly large requests much
+        # easier. Currently we handle such cases in `handleContentChunk` and abort the
+        # connection without providing a proper HTTP response.
+        #
+        # Attempting to write an HTTP response from within `handleContentChunk` does not
+        # work, so the code here has been added to at least provide a response in the
+        # case of the Content-Length header being present.
+        self.method, self.uri = command, path
+        self.clientproto = version
+
+        try:
+            self._validate_content_length()
+        except ContentLengthError as e:
+            self._respond_with_error(e)
+            return
+
+        # We're patching Twisted to bail/abort early when we see someone trying to upload
+        # `multipart/form-data` so we can avoid Twisted parsing the entire request body into
+        # in-memory (specific problem of this specific `Content-Type`). This protects us
+        # from an attacker uploading something bigger than the available RAM and crashing
+        # the server with a `MemoryError`, or carefully block just enough resources to cause
+        # all other requests to fail.
+        #
+        # FIXME: This can be removed once Twisted releases a fix and we update to a
+        # version that is patched
+        # See: https://github.com/element-hq/synapse/security/advisories/GHSA-rfq8-j7rh-8hf2
         if command == b"POST":
             ctype = self.requestHeaders.getRawHeaders(b"content-type")
             if ctype and b"multipart/form-data" in ctype[0]:
-                self.method, self.uri = command, path
-                self.clientproto = version
+                logger.warning(
+                    "Aborting connection from %s because `content-type: multipart/form-data` is unsupported: %s %s",
+                    self.client,
+                    self.get_method(),
+                    self.get_redacted_uri(),
+                )
+
                 self.code = HTTPStatus.UNSUPPORTED_MEDIA_TYPE.value
                 self.code_message = bytes(
                     HTTPStatus.UNSUPPORTED_MEDIA_TYPE.phrase, "ascii"
                 )
-                self.responseHeaders.setRawHeaders(b"content-length", [b"0"])
 
-                logger.warning(
-                    "Aborting connection from %s because `content-type: multipart/form-data` is unsupported: %s %s",
-                    self.client,
-                    command,
-                    path,
-                )
+                # FIXME: Return a better error response here similar to the
+                # `error_response_json` returned in other code paths here.
+                self.responseHeaders.setRawHeaders(b"Content-Length", [b"0"])
                 self.write(b"")
                 self.loseConnection()
                 return
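
To make the effect of `_respond_with_error` more concrete, the following sketch builds roughly what such a response could look like on the wire. It is an illustration under assumptions, not Synapse or Twisted code: `build_error_response` is a hypothetical helper, and the JSON shape (`errcode` and `error` keys) is assumed from the `M_TOO_LARGE` assertion in the tests rather than taken from `SynapseError.error_dict` itself.

```python
import json
from http import HTTPStatus


def build_error_response(status: HTTPStatus, errcode: str, error: str) -> bytes:
    """Hypothetical helper: serialize a JSON error as a raw HTTP/1.1 response."""
    # Assumed body shape; the real one comes from SynapseError.error_dict().
    body = json.dumps({"errcode": errcode, "error": error}).encode("utf-8")
    head = (
        f"HTTP/1.1 {status.value} {status.phrase}\r\n"
        "Content-Type: application/json\r\n"
        # Content-Length must describe the encoded body, not the str length.
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


response = build_error_response(
    HTTPStatus.REQUEST_ENTITY_TOO_LARGE,
    "M_TOO_LARGE",
    "Request content is too large",
)
```

Setting an explicit `Content-Length` on the error body lets a client read the JSON in full before it observes the connection closing.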

tests/http/test_site.py

Lines changed: 102 additions & 0 deletions
@@ -22,6 +22,7 @@
 from twisted.internet.address import IPv6Address
 from twisted.internet.testing import MemoryReactor, StringTransport
 
+from synapse.app._base import max_request_body_size
 from synapse.app.homeserver import SynapseHomeServer
 from synapse.server import HomeServer
 from synapse.util.clock import Clock
@@ -143,3 +144,104 @@ def test_content_type_multipart(self) -> None:
 
         # we should get a 415
         self.assertRegex(transport.value().decode(), r"^HTTP/1\.1 415 ")
+
+    def test_content_length_too_large(self) -> None:
+        """HTTP requests with Content-Length exceeding max size should be rejected with 413"""
+        self.hs.start_listening()
+
+        # find the HTTP server which is configured to listen on port 0
+        (port, factory, _backlog, interface) = self.reactor.tcpServers[0]
+        self.assertEqual(interface, "::")
+        self.assertEqual(port, 0)
+
+        # complete the connection and wire it up to a fake transport
+        client_address = IPv6Address("TCP", "::1", 2345)
+        protocol = factory.buildProtocol(client_address)
+        transport = StringTransport()
+        protocol.makeConnection(transport)
+
+        # Send a request with Content-Length header that exceeds the limit.
+        # Default max is 50MB (from media max_upload_size), so send something larger.
+        oversized_length = 1 + max_request_body_size(self.hs.config)
+        protocol.dataReceived(
+            b"POST / HTTP/1.1\r\n"
+            b"Connection: close\r\n"
+            b"Content-Length: " + str(oversized_length).encode() + b"\r\n"
+            b"\r\n"
+            b"" + b"x" * oversized_length + b"\r\n"
+            b"\r\n"
+        )
+
+        # Advance the reactor to process the request
+        while not transport.disconnecting:
+            self.reactor.advance(1)
+
+        # We should get a 413 Content Too Large
+        response = transport.value().decode()
+        self.assertRegex(response, r"^HTTP/1\.1 413 ")
+        self.assertSubstring("M_TOO_LARGE", response)
+
+    def test_too_many_content_length_headers(self) -> None:
+        """HTTP requests with multiple Content-Length headers should be rejected with 400"""
+        self.hs.start_listening()
+
+        # find the HTTP server which is configured to listen on port 0
+        (port, factory, _backlog, interface) = self.reactor.tcpServers[0]
+        self.assertEqual(interface, "::")
+        self.assertEqual(port, 0)
+
+        # complete the connection and wire it up to a fake transport
+        client_address = IPv6Address("TCP", "::1", 2345)
+        protocol = factory.buildProtocol(client_address)
+        transport = StringTransport()
+        protocol.makeConnection(transport)
+
+        protocol.dataReceived(
+            b"POST / HTTP/1.1\r\n"
+            b"Connection: close\r\n"
+            b"Content-Length: " + str(5).encode() + b"\r\n"
+            b"Content-Length: " + str(5).encode() + b"\r\n"
+            b"\r\n"
+            b"" + b"xxxxx" + b"\r\n"
+            b"\r\n"
+        )
+
+        # Advance the reactor to process the request
+        while not transport.disconnecting:
+            self.reactor.advance(1)
+
+        # We should get a 400
+        response = transport.value().decode()
+        self.assertRegex(response, r"^HTTP/1\.1 400 ")
+
+    def test_invalid_content_length_headers(self) -> None:
+        """HTTP requests with invalid Content-Length header should be rejected with 400"""
+        self.hs.start_listening()
+
+        # find the HTTP server which is configured to listen on port 0
+        (port, factory, _backlog, interface) = self.reactor.tcpServers[0]
+        self.assertEqual(interface, "::")
+        self.assertEqual(port, 0)
+
+        # complete the connection and wire it up to a fake transport
+        client_address = IPv6Address("TCP", "::1", 2345)
+        protocol = factory.buildProtocol(client_address)
+        transport = StringTransport()
+        protocol.makeConnection(transport)
+
+        protocol.dataReceived(
+            b"POST / HTTP/1.1\r\n"
+            b"Connection: close\r\n"
+            b"Content-Length: eight\r\n"
+            b"\r\n"
+            b"" + b"xxxxx" + b"\r\n"
+            b"\r\n"
+        )
+
+        # Advance the reactor to process the request
+        while not transport.disconnecting:
+            self.reactor.advance(1)
+
+        # We should get a 400
+        response = transport.value().decode()
+        self.assertRegex(response, r"^HTTP/1\.1 400 ")
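
The three tests above assemble very similar raw requests by hand. One possible way to express that shared shape is a small builder such as the one below. This is a hypothetical helper (`build_raw_request` does not exist in the commit), shown only to illustrate how the request bytes are laid out; unlike the tests, it does not append a trailing `\r\n\r\n` after the body.

```python
from typing import Iterable


def build_raw_request(content_length_values: Iterable[str], body: bytes) -> bytes:
    """Hypothetical helper: build a raw POST with zero or more Content-Length headers."""
    lines = [b"POST / HTTP/1.1", b"Connection: close"]
    for value in content_length_values:
        # Values are passed through verbatim so invalid ones (e.g. "eight") can be sent.
        lines.append(b"Content-Length: " + value.encode("ascii"))
    # A blank line separates the header block from the body.
    return b"\r\n".join(lines) + b"\r\n\r\n" + body


duplicate_headers = build_raw_request(["5", "5"], b"xxxxx")
invalid_header = build_raw_request(["eight"], b"xxxxx")
```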

tests/rest/client/test_login.py

Lines changed: 0 additions & 6 deletions
@@ -1728,9 +1728,6 @@ def test_username_picker_use_displayname_avatar_and_email(self) -> None:
             content_is_form=True,
             custom_headers=[
                 ("Cookie", "username_mapping_session=" + session_id),
-                # old versions of twisted don't do form-parsing without a valid
-                # content-length header.
-                ("Content-Length", str(len(content))),
             ],
         )
         self.assertEqual(chan.code, 302, chan.result)
@@ -1818,9 +1815,6 @@ def test_username_picker_dont_use_displayname_avatar_or_email(self) -> None:
             content_is_form=True,
             custom_headers=[
                 ("Cookie", "username_mapping_session=" + session_id),
-                # old versions of twisted don't do form-parsing without a valid
-                # content-length header.
-                ("Content-Length", str(len(content))),
             ],
         )
         self.assertEqual(chan.code, 302, chan.result)
