Commit 989a7df

rebalance: introduce go-carbon health check with better sync rate control
The original sync mechanism is too simple: the only control it offers is the worker count, which makes it hard to balance efficiency and reliability. With too few workers, rebalancing is too slow; with too many, buckyd can consume too many resources. To meet both goals, this change introduces a go-carbon health check and automatic sync rate adjustment based on metrics per second per node.

    bucky rebalance -f -offload -ignore404 \
        -graphite-ip-to-hostname \
        -graphite-metrics-prefix carbon.minutely.buckytool.rebalance.$cluster.dst.$to_location.src.$from_location \
        -graphite-endpoint 127.0.0.1:3002 \
        -go-carbon-health-check=$enable_go_carbon_health_check \
        -go-carbon-health-check-interval 5 \
        -go-carbon-port 8080 \
        -go-carbon-protocol http \
        -go-carbon-cache-threshold 0.5 \
        -sync-speed-up-interval $sync_speed_up_interval \
        -metrics-per-second $sync_metrics_per_second \
        -h ${seed_node}:4242 \
        -workers $workers \
        -timeout $timeout

Using the above example for explanation:

-go-carbon-health-check=true: asks buckytools to check go-carbon cache usage, which indicates whether there are capacity issues.

-sync-speed-up-interval: increases the sync rate (metrics per second) once every specified number of seconds.

-metrics-per-second: the initial sync rate. When used with -sync-speed-up-interval, it should start at a low value such as 5 - 10.

What's more, with the go-carbon health check enabled, buckytools also does automatic sync rate easing. It could be disabled by `-no-random-easing=false`. The full list of flags can be found with `bucky rebalance -help`.

Besides the go-carbon health check, this commit also exports some internal sync rate metrics to graphite for monitoring. This can be enabled with -graphite-metrics-prefix.

By default -go-carbon-health-check is disabled, which means buckytools falls back to the original sync behaviour.

Relevant go-carbon PR: go-graphite/go-carbon#433
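The interplay between the initial rate, the speed-up interval, and the cache threshold can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `rateController` type, its fields, the fixed `step` increment, the halving on back-off, and the `>=` comparison against the threshold are all assumptions made for the example. Only the ideas of an initial metrics-per-second value, a periodic speed-up, and a go-carbon cache usage threshold come from the commit message.

```go
package main

import "fmt"

// rateController is a hypothetical type for illustration only; it is not
// taken from buckytools.
type rateController struct {
	rate           int     // current sync rate in metrics per second
	minRate        int     // floor so the sync never stalls completely (assumed)
	step           int     // amount added per speed-up interval (assumed)
	cacheThreshold float64 // cache usage ratio, e.g. 0.5 as in -go-carbon-cache-threshold
}

// tick would be called once per speed-up interval with the most recent
// go-carbon cache usage (0.0 to 1.0) and returns the new sync rate.
func (c *rateController) tick(cacheUsage float64) int {
	if cacheUsage >= c.cacheThreshold {
		// The destination looks overloaded: ease off. Halving is an
		// assumption; the commit only says the rate is eased.
		c.rate /= 2
		if c.rate < c.minRate {
			c.rate = c.minRate
		}
		return c.rate
	}
	// The destination is healthy: speed up by one step.
	c.rate += c.step
	return c.rate
}

func main() {
	// Start low, as the commit message suggests for -metrics-per-second.
	c := &rateController{rate: 5, minRate: 1, step: 5, cacheThreshold: 0.5}
	for _, usage := range []float64{0.1, 0.2, 0.7, 0.3} {
		fmt.Printf("cache usage %.1f -> %d metrics/s\n", usage, c.tick(usage))
	}
}
```

Starting from a low rate and raising it only while the health check passes means that an overly optimistic setting corrects itself, instead of requiring the operator to guess a safe worker count up front.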
1 parent ee866ed commit 989a7df

7 files changed

+715
-66
lines changed

Makefile

Lines changed: 9 additions & 0 deletions
@@ -37,6 +37,15 @@ test: clean bucky buckyd
 	go run -mod vendor testing/rebalance.go $(REBALANCE_FLAGS)
 	go run -mod vendor testing/copy.go $(COPY_FLAGS)
 
+# only works on linux
+test_rebalance_health_check_setup:
+	sudo ip addr add 10.0.1.7 dev lo
+	sudo ip addr add 10.0.1.8 dev lo
+	sudo ip addr add 10.0.1.9 dev lo
+
+test_rebalance_health_check: clean bucky buckyd
+	# go run -mod vendor testing/rebalance_health_check.go $(REBALANCE_FLAGS)
+
 clean_test:
 	rm -rf testdata_rebalance_*
 	rm -rf testdata_copy_*

cmd/bucky/common.go

Lines changed: 3 additions & 1 deletion
@@ -157,7 +157,9 @@ func DeleteMetric(server, metric string) error {
 
 	switch resp.StatusCode {
 	case 200:
-		log.Printf("DELETED: %s", metric)
+		if msFlags.printDeletedMetrics {
+			log.Printf("DELETED: %s", metric)
+		}
 	case 404:
 		log.Printf("Not found / Not deleted: %s", metric)
 		return fmt.Errorf("Metric not found.")
