Skip to content

Conversation

@pierugo-dfinity
Copy link
Contributor

No description provided.

@github-actions github-actions bot added the feat label Jan 30, 2026
started: Arc<AtomicBool>,
cancelled: Arc<AtomicBool>,
init_time: Time,
latest_certified_time: Arc<RwLock<Option<Time>>>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually do not need to store the Time. It would suffice to store a boolean (also see fn has_replicated_all_versions_certified_before_init). Though, I thought it could bring some added debugging value to be able to observe this value in metrics and dashboard.

// certified after the replicator was started.
let latest_certified_time = Arc::new(RwLock::new(
(latest_certified_time > init_time).then_some(latest_certified_time),
));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to favor simplicity over pure performance is to keep initialize_local_store returning nothing and initializing self.latest_certified_time with Arc::new(RwLock::new(None)). Then the next call to poll will actually set self.latest_certified_time accordingly.

Writing this, I'm now actually quite convinced I'd prefer this simpler solution just described.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, especially since initialize_local_store should return early anyway on nodes that have already registered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in daea5d8

pub poll_duration: Histogram,
pub poll_count: IntCounterVec,
pub registry_version: IntGauge,
pub latest_certified_time: GenericGauge<AtomicU64>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IntGauge is signed, so I preferred using an AtomicU64 to avoid conversions when setting the metric.

.inc();

return Err(OrchestratorError::UpgradeError(format!(
"Not upgrading to recalled replica version {} at regitry version {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Not upgrading to recalled replica version {} at regitry version {}",
"Not upgrading to recalled replica version {} at registry version {}",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_version: RegistryVersion,
) -> Result<RegistryVersion, String> {
let (records, _, _) = registry_canister
) -> Result<(RegistryVersion, RegistryVersion, Time), String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always find tuples where two or more fields have the same type a bit confusing. Maybe we can return a struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done in 0c2367b

// certified after the replicator was started.
let latest_certified_time = Arc::new(RwLock::new(
(latest_certified_time > init_time).then_some(latest_certified_time),
));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, especially since initialize_local_store should return early anyway on nodes that have already registered.

Comment on lines +403 to +408
*latest_certified_time.write().unwrap() = maybe_certified_time;
if let Some(certified_time) = maybe_certified_time {
metrics
.latest_certified_time
.set(certified_time.as_nanos_since_unix_epoch());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe like this to avoid taking the lock if there are no changes

Suggested change
*latest_certified_time.write().unwrap() = maybe_certified_time;
if let Some(certified_time) = maybe_certified_time {
metrics
.latest_certified_time
.set(certified_time.as_nanos_since_unix_epoch());
}
if let Some(certified_time) = maybe_certified_time {
metrics
.latest_certified_time
.set(certified_time.as_nanos_since_unix_epoch());
*latest_certified_time.write().unwrap() = Some(certified_time);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there are no changes

Do you mean no changes in the latest_certified_time Option, i.e. from None to None? Indeed, your suggestion could save us from taking the lock for the first few seconds/minute until getting up-to-date.

Because even if there are no changes in the canister (including the certified time), the function will return a Some so we would grab the lock anyways.

Don't you think it would actually read more easily to always update latest_certified_time? The dashboard would better reflect the actual value of the variable, in case we ever change some logic in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants