Skip to content

Improve handling of trusted certificates#6826

Open
minglumlu wants to merge 20 commits intoxapi-project:feature/trusted-certsfrom
minglumlu:private/mingl/trusted-cert
Open

Improve handling of trusted certificates#6826
minglumlu wants to merge 20 commits intoxapi-project:feature/trusted-certsfrom
minglumlu:private/mingl/trusted-cert

Conversation

@minglumlu
Copy link
Member

@minglumlu minglumlu commented Jan 9, 2026

The design is https://github.com/xapi-project/xen-api/blob/master/doc/content/design/trusted-certificates.md
(It's a little bit out-of-date. Will update it later. And remaining changes will be raised in following PRs)

Manual tests are done with a new purpose testing (change is not included in this PR) like:

xe pool-install-trusted-certificate filename=root.pem ca=true purpose=licensing,testing
xe pool-install-trusted-certificate filename=root.pem ca=true purpose=licensing,testing # will report error
xe pool-install-trusted-certificate filename=root.pem ca=true
xe pool-install-trusted-certificate filename=leaf.pem ca=false purpose=licensing,testing
xe pool-install-trusted-certificate filename=leaf.pem ca=false # will report error
xe certificate-list
xe pool-uninstall-trusted-certificate certificate-uuid=<UUID>
xe pool-certificate-sync
xe pool-crl-install filename=crl01.pem
xe pool-crl-uninstall name=crl01.pem

And pool.join and pool.eject for exchanging trusted certificates and CRLs.

The certificate stores under /etc/trusted-certs/ on coordinator and supporter hosts' dom0 filesystem.

@minglumlu minglumlu force-pushed the private/mingl/trusted-cert branch from f490581 to ac35717 Compare January 9, 2026 12:55
let certificate_purpose =
Enum
( "certificate_purpose"
, [("licensing", "Trusted certificates that are for licensing purpose.")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"that are for" could be simplified to "Trusted certificates for licensing"

let install_trusted_certificate =
call ~flags:[`Session] ~pool_internal:true ~hide_from_docs:true
~name:"install_trusted_certificate"
~doc:"Install a TLS trusted certificate on this host."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear how TLS and trusted are related: is it a TLS-trusted certificate or a trusted TLS certificate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the trusted certificates used to verify peer identify when establishing TLS connection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would use trusted TLS certificate: it is a TLS certificate that we trust for this purpose.


let install_trusted_certificate =
call ~name:"install_trusted_certificate"
~doc:"Install a TLS trusted certificate, pool-wide."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above for the ambiguity about how trust and TLS are related.

Xapi_stdext_threads.Threadext.Mutex.execute m (fun () ->
let (_ : string), (_ : string) =
Forkhelpers.execute_command_get_output
"/opt/xensource/bin/update-ca-bundle.sh" [cert_dir; bundle_path]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script so far does not take arguments and I don't see it being changed but maybe I missed the change. How is this supposed to work and does this mean other invocations of this script need to be updated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script seems can be replaced with https://github.com/xapi-project/xen-api/blob/master/ocaml/xapi/cert_distrib.ml#L45
I didn't find revoking from other components so far. But if there are, they should change as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is in

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This PR is changing it.

Copy link
Member

@psafont psafont Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cert_distrib does not replace update-ca-bundle.sh, they are for different purposes:
-cert_distrib distributes certificates across the pool, and update-ca-bundle creates the bundles from the distributed certificates.

Note that cert_distrib calls Helpers.update_ca_bundle, which runs update-ca-bundle.sh.

]
| Pinned _ ->
[
(0, Api_messages.pool_ca_certificate_expired)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the alerts about the pinned certificates reuse the same message as the CA ones?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add new messages for pool_pinned_certificate_...

all_certificates
|> List.filter (fun (cert, record) ->
record.API.certificate_type <> `ca
&& record.API.certificate_type <> `pinned
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's now tempting to change the condition to match host or host_internal certificates instead of the ones that are not that..

)
; ( "pool-install-trusted-certificate"
, {
reqd= ["filename"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if ca was a required parameter, there's nothing indicating what is the default mode to the user.

Comment on lines 1933 to 1939
Option.bind (List.assoc_opt "purpose" params) (fun ss ->
String.split_on_char ',' ss
|> List.map Record_util.certificate_purpose_of_string
|> Option.some
)
|> Option.to_list
|> List.flatten
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the parameter allows more than one values, separated by comas, it should be documented in cli_frontend so users can see it.

Code here also looks more complex than necessary:

Suggested change
Option.bind (List.assoc_opt "purpose" params) (fun ss ->
String.split_on_char ',' ss
|> List.map Record_util.certificate_purpose_of_string
|> Option.some
)
|> Option.to_list
|> List.flatten
List.assoc_opt "purpose" params
|> Option.map (fun ss ->
String.split_on_char ',' ss
|> List.map Record_util.certificate_purpose_of_string
)
|> function Some purposes -> purposes | None -> []

let not_allowed_when_ntp_is_enabled =
add_error "NOT_ALLOWED_WHEN_NTP_IS_ENABLED"

let not_trusted_certificate = add_error "NOT_TRUSTED_CERTIFICTE"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let not_trusted_certificate = add_error "NOT_TRUSTED_CERTIFICTE"
let not_trusted_certificate = add_error "NOT_TRUSTED_CERTIFICATE"

* Note that the bundles (e) and (f) are generated automatically using the contents of (c) and (d) respectively *)

type t_trusted = CA_Certificate | CRL
type t_trusted = CA_Certificate | CRL | Root_CA | Leaf_Pinned
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear the difference between CA_Certificate and Root_CA just by looking at the names. Since be don't need to maintain the names here (it's not exposed in the API) we can change the names to:

type t_trusted = Root | Peer | Root_legacy | CRL

The comment before the type should also be updated with the new directories

Also the types in the functions further down seem to imply that the legacy certs are never linked with a purpose, but the new ones might be, I recommend using types to enforce this:

module Trusted = sig
  type t =  Root of string list | Peer of string list | Root_legacy | CRL
end

This way the functions lose the purpose parameter while allowing existing ones to disregard the purpose if they don't need it:

let library_path = function
  | Root_legacy ->
      !Xapi_globs.trusted_certs_dir
  | CRL ->
      Stunnel.crl_path
  | Root _ | Peer _ ->
      !Xapi_globs.trusted_certs_by_purpose_dir


let all_trusted_kinds =
  let all_purposes =
    [] :: List.map (fun x -> [x]) API.certificate_purpose__all
  in
  List.concat [
    List.map (fun p -> Root p) all_purposes
  ; List.map (fun p -> Peer p) all_purposes
  ; [Root_legacy; CRL]
  ]

let trusted_cert_dirs kind =
  let parent = library_path kind in
  match kind with
  | CA_Certificate | CRL ->
      [parent]
  | Root_CA p ->
      [parent // ps "ca-%s" (string_of_purpose p)]
  | Leaf_Pinned p ->
      [parent // ps "pinned-%s" (string_of_purpose p)]

let trusted_bundle_location kind = ...

let with_cert_store kind f = ...

let rehash () =
  let ( let* ) l f = List.iter f l in
  let* kind = all_trusted_kinds in
  with_cert_store kind
  @@ fun ~cert_dir ~bundle_dir:_ ~bundle_name:_ ->
  mkdir_cert_path kind ;
  rehash' cert_dir

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great idea!

; ( "host_internal"
, "Certificate that identifies a single host to other pool members"
)
; ("pinned", "Pinned leaf certificate that is trusted by the whole pool")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we used a name that signifies the purpose of the certificate rather than the method used to trust it. This means using peer. One way to think about it is that a peer certificate is going to be pinned to a service with a particular purpose.

Copy link
Member Author

@minglumlu minglumlu Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The confusion of using peer comes from the self-signed certificate. It can be either a root CA, or a leaf certificate for certificate pinning. We hope when a user or a client manipulates the certificates, it's clear for them to know the difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The confusion of using peer comes from the self-signed certificate. It can be either a root CA, or a leaf certificate for certificate pinning.

I don't follow the reasoning, I understand that a self-signed certificate can be used in many roles, but the enum makes clear in which role it's being used, as a ca certificate, or a peer certificate. Maybe I don't see the confusion because a peer certificate for me is the certificate that identifies the other peer in a TLS handshake, and is completely separate from the certificate chain, the latter includes the root or CA certificates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did lead to confusion with several people involved. If "peer" generally means just the certificate of the remote entity, then it does not clearly imply (in this type field) how to use it in TLS verification, that is verify the chain and hostname or just the "pinned" certificate. If we call it pinned, then it is unambiguous.

Copy link
Member

@psafont psafont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, github is asking me add a global comment instead of allowing me to simply comment on pieces of code

Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
@minglumlu minglumlu force-pushed the private/mingl/trusted-cert branch from dde36b8 to b2124dc Compare January 19, 2026 08:59
@minglumlu minglumlu marked this pull request as ready for review January 19, 2026 09:08
Comment on lines 513 to 521
|> List.map (fun kind ->
mkdir_cert_path kind ;
trusted_store_locations kind
|> List.map (fun store ->
list_dir store.cert_dir |> List.map (fun n -> (n, store.purpose))
)
|> List.flatten
)
|> List.flatten
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
|> List.map (fun kind ->
mkdir_cert_path kind ;
trusted_store_locations kind
|> List.map (fun store ->
list_dir store.cert_dir |> List.map (fun n -> (n, store.purpose))
)
|> List.flatten
)
|> List.flatten
|> List.flat_map (fun kind ->
mkdir_cert_path kind ;
trusted_store_locations kind
|> List.flat_map (fun store ->
list_dir store.cert_dir |> List.map (fun n -> (n, store.purpose))
)
)

@psafont
Copy link
Member

psafont commented Jan 19, 2026

cert_distrib hasn't been changed in this PR, it's the module that handles copying (CA) certificates on pool join. Has it been tested?

In general, I ddidn't expect this amount of code being added to cope with the new certificates. Probably cert_distrib should be used instead, by adding there a couple of new functions to install CA certificates across the pool.

@minglumlu
Copy link
Member Author

minglumlu commented Jan 20, 2026

cert_distrib hasn't been changed in this PR, it's the module that handles copying (CA) certificates on pool join. Has it been tested?

In general, I ddidn't expect this amount of code being added to cope with the new certificates. Probably cert_distrib should be used instead, by adding there a couple of new functions to install CA certificates across the pool.

exchange will be in next PR. Are you suggesting to replace the host_install and pool-sync with cert-distrib?


Well, just found there are lots of duplications between certificates.ml and cert_distrib.ml... ...
And cert_distrib.ml depends on certificates.ml.

@psafont
Copy link
Member

psafont commented Jan 20, 2026

I simply don't understand why is so much code added to list certificates from the filesystem, with even API calls. It doesn't help at all that the commit message contains no reasoning as to why it's needed.

Normally to list certificates, only the database would be needed because the certificate lifecycle is controlled using the API. SO why is all this filesystem access needed in this case?

exchange will be in next PR. Are you suggesting to replace the host_install and pool-sync with cert-distrib?

I wanted to do this when we introduced cert_distrib, but never had the time to do it. Seeing that it needs to be modified anyway, I'd apreciate if certificates is only used for local operatons, and cert_distrib to distribute certificates.

@minglumlu
Copy link
Member Author

Normally to list certificates, only the database would be needed because the certificate lifecycle is controlled using the API. SO why is all this filesystem access needed in this case?

The data retrieved from database is the primary source of certificates.
The data retrieved from filesystem is the reality of the certificates on individual hosts, which may be corrupted.

When doing pool_sync, the logic is:

  1. remove the remnants which are not in database
  2. add new ones which are not in filesystem

This is done duplicatedly in both certificates.ml and cert_distrib.ml but with different approaches:

  1. in certificates.ml, database and filesystem are compared for individual certificates.
  2. in cert_distrib.ml, with Erase_old, the whole cert directory will be removed and recreated to match with the database.

Now I understand they need to be consolidated as cert_distrib only.

I wanted to do this when we introduced cert_distrib, but never had the time to do it. Seeing that it needs to be modified anyway, I'd apreciate if certificates is only used for local operatons, and cert_distrib to distribute certificates.

I didn't look into cert_distrib thoroughly before. Now I can eliminate the duplications.

@minglumlu minglumlu force-pushed the private/mingl/trusted-cert branch from b2124dc to ab810b7 Compare January 21, 2026 11:03
@minglumlu
Copy link
Member Author

Pushed the changes based on @psafont's comments. Not tested yet but I pushed them as I would like to see if it's on the right track. Will test tomorrrow if it is.

Copy link
Member

@psafont psafont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes, I think there's all the trusted certificates should be exchanged on pool join, currently they arent. Other than that it looks good

(diff |> StringSet.elements |> String.concat "; ") ;
not in_sync_with_remote
in
let ( let*? ) l f = List.map f l |> List.exists Fun.id in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this simply let ( let*? ) l f = List.exists f l in ?

I don't see a need to evaluate all the list in the users: they don't have visible side-effects and it seems safe to short-circuit.

Worker.local_write_cert_fs ~__context HostPoolCert Merge pool_certs ;
Worker.local_regen_bundle ~__context

let collect_ca_certs ~__context ~names =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is missing the exchange of CRL, RootCert and PinnedCert. This will make sure all the certificates are copied on pool join. Maybe it's worth changing the function to collect_trusted_certificates, and similar with the following function.

@minglumlu
Copy link
Member Author

Thanks for making the changes, I think there's all the trusted certificates should be exchanged on pool join, currently they arent. Other than that it looks good

Sure. Will append some commits for these missing pieces with the current approach. Thanks.

@minglumlu minglumlu force-pushed the private/mingl/trusted-cert branch 2 times, most recently from cbeab03 to fff055f Compare January 26, 2026 06:22
@minglumlu
Copy link
Member Author

Hi @lindig @psafont @robhoes
I think all missing parts have been done and I've tested the changes manually. Could you please look at the changes again? Thanks!

@minglumlu
Copy link
Member Author

minglumlu commented Jan 26, 2026

No. The API can work. But the SDK is complaining it. Probably due to the Map (String, Set String), which @robhoes raised a warning last week. (I moved some data out of the interface data structure already)
It would be annoying that in exchange on join, we would have to install the certificates of the host into the pool one by one, although usually there will not be too many certificates on the host.


Not too bad. import is a keyword in go.

@minglumlu minglumlu force-pushed the private/mingl/trusted-cert branch from fff055f to 8cb0d7c Compare January 26, 2026 08:41
Copy link
Contributor

@lindig lindig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will rely on @psafont to make sure we have the changes he suggests.

@robhoes
Copy link
Member

robhoes commented Jan 28, 2026

@minglumlu Could you squash the fixups please?

|> List.map WireProtocol.pair_of_certificate_file

(* This function is called on the pool that is incorporating a new host *)
let exchange_ca_certificates_with_joiner ~__context ~import ~export =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised the exchange of trusted certificates and CRLs between the joiner and the pool is not done here, and instead is done in xapi_pool. Are you sure it's working correctly? I would have renamed this function (exchange_ca_certificates_with_joiner) and import_joining_pool_ca_certificates to add the exchange of trusted certificates and CRLs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't merge these 2 (actually 3) functions as the type of import argument of their corresponding API functions are quite different. I would have to merge them as well if using a single Cert_distrib.exchange_ca_certificates_with_joiner. And I found the logic of Pool.exchange_trusted_certificates_on_join and Pool.exchange_crls_on_join seems very simple and different with Pool.exchange_ca_certificates_on_join also. For example, for merging logic,

  • legacy ca: filename is the primary info used for merging; but database is still needed to update;
  • trusted: purpose and fingerprint are the primary info used for merging;
  • crl: filename is the primary info used for merging, but no database at all.

So I didn't change the existing Cert_distrib.exchange_ca_certificates_with_joiner.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But let me have a try to see how it will look like ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, Pau, I attempted to merge these functions, but the differences among them are too extensive to get them merged effectively.
The current work flow is like below, so it works.

[host]                                                       [pool]
exchange_trusted_certificates
-> Client.Pool.exchange_trusted_certificates_on_join   -->   exchange_trusted_certificates_on_join
                                                             -> install_trusted_certificate
                                                                -> Certificates.host_install
                                                                -> Cert_distrib.copy_certs_to_all
                                                             -> Cert_distrib.collect_trusted_certs
                                                       <--
-> install_trusted_certificate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, can you explain this modification in the design doc? This changes the previous design regarding certificate exchanges (https://xapi-project.github.io/new-docs/design/pool-certificates/#u2-pool-join)

I need to check what's wrong with the mermaid thing, it used to work :/

Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
@minglumlu minglumlu force-pushed the private/mingl/trusted-cert branch from 450c76f to a6e390e Compare January 29, 2026 07:07
Copy link
Member

@psafont psafont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great thank, only one comment that should be easy to resolve

design_doc: true
revision: 2
status: released (22.6.0)
status: draft (update based on release 22.6.0 revision 2)
Copy link
Member

@psafont psafont Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather indicate when this was changed:

status: released (22.6.0), modified (26.2.0)

Or if we want to follow Rob's suggestion, and the one which I prefer, keep this as it is because it's a historic document; add the new diagram and other changes to trusted-certificates.md, and state here that another suggestion updates it:

status: released (22.6.0), modified by [trusted-certificates](trusted-certificates.md)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I updated in the new design trusted-certificates.md and placed a reference in the base design pool-certificates.md.

Signed-off-by: Ming Lu <ming.lu@cloud.com>
@minglumlu minglumlu force-pushed the private/mingl/trusted-cert branch from a11e52f to 36890b6 Compare February 4, 2026 07:33
@psafont
Copy link
Member

psafont commented Feb 4, 2026

the failing check can be ignored, it happens because ocaml 5.3.0 was removed from xs-opam for 5.4.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants