Skip to content

Commit 7dc1b3c

Browse files
committed
dpif-netlink: Fix probing for broken meters on Linux v5.10+.
If someone creates a lot of meters (>1017) in the kernel datapath and then re-starts ovs-vswitchd, OVS fails to probe meter support and refuses to install any meters afterwards: dpif_netlink|INFO|The kernel module has a broken meter implementation. The reason is that probing for broken meters relies on creating two meters with high meter IDs. Inside the kernel, however, the meter table is not a real hash table since v5.10 and commit: c7c4c44c9a95 ("net: openvswitch: expand the meters supported number") Instead, it's just an array with meter IDs mapped to indexes with a simple modulo operation. This array can expand, but only when it is full. There is no handling of collisions, so if the meter at ID % size is not the right one, then the lookup just fails. This is fine as long as userspace creates meters with densely packed IDs, which is the case for ovs-vswitchd... except for probing. While probing, we attempt to create meters 54545401 and 54545402. Without expanding the table size, these map onto 1017 and 1018. So, creation of these meters fails if there are already meters with IDs 1017 or 1018. At this point OVS declares meter implementation broken. Ideally, we should make the "hash" table in the kernel handle collisions and otherwise be a more or less proper hash table. But we can also improve probing in userspace and avoid this issue by choosing lower numbered meter IDs and trying to get them from the kernel first before trying to create. Choosing high values at the top of the 0-1023 range, so they are guaranteed to fit into the minimal size table in the kernel (1024). If one of these already exists and has a proper ID, then meters are likely working fine and we don't need to install new ones for probing. If these meters are not in the kernel, then we can try to create and check. This logic should work fine for older or future kernels with a proper hash table as well. There is no Fixes tag here as the check was correct at the moment it was introduced. It's the kernel change that broke it. Reported-at: openvswitch/ovs-issues#337 Acked-by: Aaron Conole <[email protected]> Signed-off-by: Ilya Maximets <[email protected]>
1 parent 500a37e commit 7dc1b3c

File tree

2 files changed

+66
-9
lines changed

2 files changed

+66
-9
lines changed

lib/dpif-netlink.c

Lines changed: 38 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -4280,15 +4280,23 @@ dpif_netlink_meter_get_stats(const struct dpif *dpif_,
42804280
ARRAY_SIZE(ovs_meter_stats_policy));
42814281
if (error) {
42824282
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
4283-
VLOG_INFO_RL(&rl, "dpif_netlink_meter_transact %s failed",
4284-
command == OVS_METER_CMD_GET ? "get" : "del");
4283+
VLOG_RL(&rl, error == ENOENT ? VLL_DBG : VLL_WARN,
4284+
"dpif_netlink_meter_transact %s failed: %s",
4285+
command == OVS_METER_CMD_GET ? "get" : "del",
4286+
ovs_strerror(error));
42854287
return error;
42864288
}
42874289

4288-
if (stats
4289-
&& a[OVS_METER_ATTR_ID]
4290-
&& a[OVS_METER_ATTR_STATS]
4291-
&& nl_attr_get_u32(a[OVS_METER_ATTR_ID]) == meter_id.uint32) {
4290+
if (a[OVS_METER_ATTR_ID]
4291+
&& nl_attr_get_u32(a[OVS_METER_ATTR_ID]) != meter_id.uint32) {
4292+
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
4293+
VLOG_INFO_RL(&rl,
4294+
"Kernel returned a different meter id than requested");
4295+
ofpbuf_delete(msg);
4296+
return EINVAL;
4297+
}
4298+
4299+
if (stats && a[OVS_METER_ATTR_STATS]) {
42924300
/* return stats */
42934301
const struct ovs_flow_stats *stat;
42944302
const struct nlattr *nla;
@@ -4366,13 +4374,34 @@ probe_broken_meters__(struct dpif *dpif)
43664374
{
43674375
/* This test is destructive if a probe occurs while ovs-vswitchd is
43684376
* running (e.g., an ovs-dpctl meter command is called), so choose a
4369-
* random high meter id to make this less likely to occur. */
4370-
ofproto_meter_id id1 = { 54545401 };
4371-
ofproto_meter_id id2 = { 54545402 };
4377+
* high meter id to make this less likely to occur.
4378+
*
4379+
* In Linux kernel v5.10+ meters are stored in a table that is not
4380+
* a real hash table. It's just an array with 'meter_id % size' used
4381+
* as an index. The numbers are chosen to fit into the minimal table
4382+
* size (1024) without wrapping, so these IDs are guaranteed to be
4383+
* found under normal conditions in the meter table, if such meters
4384+
* exist. It's possible to break this check by creating some meters
4385+
* in the kernel manually with different IDs that map onto the same
4386+
* indexes, but that should not be a big problem since ovs-vswitchd
4387+
* always allocates densely packed meter IDs with an id-pool.
4388+
*
4389+
* These IDs will also work in cases where the table in the kernel is
4390+
* a proper hash table. */
4391+
ofproto_meter_id id1 = { 1021 };
4392+
ofproto_meter_id id2 = { 1022 };
43724393
struct ofputil_meter_band band = {OFPMBT13_DROP, 0, 1, 0};
43734394
struct ofputil_meter_config config1 = { 1, OFPMF13_KBPS, 1, &band};
43744395
struct ofputil_meter_config config2 = { 2, OFPMF13_KBPS, 1, &band};
43754396

4397+
/* First check if these meters are already in the kernel. If we get
4398+
* a proper response from the kernel with all the good meter IDs, then
4399+
* meters are likley supported correctly. */
4400+
if (!dpif_netlink_meter_get(dpif, id1, NULL, 0)
4401+
|| !dpif_netlink_meter_get(dpif, id2, NULL, 0)) {
4402+
return false;
4403+
}
4404+
43764405
/* Try adding two meters and make sure that they both come back with
43774406
* the proper meter id. Use the "__" version so that we don't cause
43784407
* a recurve deadlock. */

tests/system-traffic.at

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2813,6 +2813,34 @@ OVS_WAIT_UNTIL([test $(ovs-pcap p1.pcap | grep -c "${packet}") -eq 5])
28132813
OVS_TRAFFIC_VSWITCHD_STOP
28142814
AT_CLEANUP
28152815

2816+
AT_BANNER([Meters])
2817+
2818+
AT_SETUP([meters - datapath probe])
2819+
dnl Linux kernel 4.18+ should properly support meters.
2820+
OVS_CHECK_MIN_KERNEL(4, 18)
2821+
OVS_TRAFFIC_VSWITCHD_START()
2822+
2823+
dnl Create a lot of meters.
2824+
for i in $(seq 1023); do
2825+
AT_CHECK([ovs-ofctl -OOpenFlow13 add-meter br0 "meter=$i kbps band=type=drop rate=100"])
2826+
done
2827+
2828+
dnl Kill the process to avoid any potential datapath cleanups.
2829+
pid=$(cat ovs-vswitchd.pid)
2830+
AT_CHECK([kill $pid])
2831+
OVS_WAIT_WHILE([kill -0 $pid])
2832+
2833+
dnl Start it back.
2834+
AT_CHECK([ovs-vswitchd --detach --no-chdir --pidfile --log-file], [0], [], [stderr])
2835+
on_exit "kill_ovs_vswitchd"
2836+
2837+
dnl It should still be possible to create meters.
2838+
AT_CHECK([ovs-ofctl -OOpenFlow13 add-meter br0 "meter=1 kbps band=type=drop rate=100"])
2839+
AT_CHECK([grep -q "broken meter implementation" ovs-vswitchd.log], [1])
2840+
2841+
OVS_TRAFFIC_VSWITCHD_STOP(['/terminating with signal/d'])
2842+
AT_CLEANUP
2843+
28162844
AT_BANNER([MPLS])
28172845

28182846
AT_SETUP([mpls - encap header dp-support])

0 commit comments

Comments
 (0)