Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose SslBundle information via actuator metrics #42030

Closed
seschi98 opened this issue Aug 27, 2024 · 12 comments
Closed

Expose SslBundle information via actuator metrics #42030

seschi98 opened this issue Aug 27, 2024 · 12 comments
Assignees
Labels
type: enhancement A general enhancement
Milestone

Comments

@seschi98
Copy link

I saw that support for SSL bundles was added to the actuator info and health endpoints in #41205 and I think it would be really helpful to make that information available in the metrics endpoint as well. I would like to utilize this enhancement to set up an alarm in my monitoring software so that I can renew my certificates before they expire.

I would imagine something like this:

GET http://localhost:8080/actuator/metrics/ssl.bundle.expiry

{
  "name": "ssl.bundle.expiry",
  "baseUnit": "days",
  "measurements": [
    {
      "statistic": "VALUE",
      "value": 351.0
    }
  ],
  "availableTags": [
    {
      "tag": "cert_alias",
      "values": [
        "my-alias"
      ]
    },
    {
      "tag": "bundle_name",
      "values": [
        "mybundle"
      ]
    }
  ]
}
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Aug 27, 2024
@mhalbritter
Copy link
Contributor

That's an interesting idea. @jonatan-ivanov, what do you think?

@philwebb philwebb changed the title Expose SslBundle information via actuator metrics Expose SslBundle information via actuator metrics Aug 27, 2024
@philwebb philwebb added the status: waiting-for-internal-feedback An issue that needs input from a member or another Spring Team label Aug 27, 2024
@jonatan-ivanov
Copy link
Member

I think this is a great idea and we can create a MeterBinder that registers Gauges for every certificate chain. I think we should keep this on the chain level since we can probably uniquely identify it (assumption).
The part that concerns me a bit is that the chain can contain multiple certificates, we need to somehow aggregate the "days left" value and the validity status es of all of the certs in the chain. Also we need to face with corner cases, like what if one cert in the chain is not valid yet and another already expired (and so on)?

We also need to be able to somehow "refresh" the registered Gauges since the bundle can change runtime (because of reload).

So I think this is useful but could be trickier than it seems for the first sight.
Btw we can also provide counters about the number of chains by status (valid/expired/soon-to-be-expire/etc.) so that people can track them on a graph and also create alerts (e.g.: soon to be expire should be 0).

@philwebb philwebb added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged status: waiting-for-internal-feedback An issue that needs input from a member or another Spring Team labels Aug 28, 2024
@philwebb philwebb added this to the 3.x milestone Aug 28, 2024
@milaneuh
Copy link

milaneuh commented Sep 5, 2024

Just an assumption, but maybe we could first expose the SslBundle information on metric by using counters (ex : the second solution @jonatan-ivanov provided) on this issue. And then open another issue for the "bigger" enhancement with the MeterBinder registration ? Would allow us to provide a functionnal solution quicker and then enchancing it.

I'm not a huge open-source contributor, so please correct me if I am wrong, I'm want to learn.

@jonatan-ivanov
Copy link
Member

The two solutions I was talking about above ("days left" and "cert count by status") are not mutually exclusive, I think we should do both.

The MeterBinder component is "just" a place where you can register Meters, it makes things more structured, it does not add much to complexity. The complexity of the problem space is aggregating and re-registering Gauges for the "days left" values.
For "counting the certs by status" (which is still a Gauge since the value is non-monotonic), we don't have these problems but we still should use a MeterBinder to register them.

@milaneuh
Copy link

milaneuh commented Sep 6, 2024

Okay thanks !
I did a little deep dive inside the code base, looking at how SslInfo stores the certificate chains by the alias etc.

Correct me if I'm wrong, but, we could not aggregate all certificate for the Gauge ?
Maybe only aggregate certs that are currently valid, and excludes ones that are already expired or aren't valid yet ? Why do we need them (invalid certs) inside of the gauge ?

Couldn't we just refresh the gauge on reload ?

@wilkinsona
Copy link
Member

wilkinsona commented Sep 6, 2024

The part that concerns me a bit is that the chain can contain multiple certificates, we need to somehow aggregate the "days left" value and the validity status es of all of the certs in the chain. Also we need to face with corner cases, like what if one cert in the chain is not valid yet and another already expired (and so on)?

This feels a little bit like aggregating the health status where we use the worst case by default. For example, if one subsystem is down and everything else is up, the overall status will be down. I think a similar assume-the-worst approach makes sense here as a certificate chain is only as good as the "worst" certificate in that chain. For corner cases where there's an expired certificate and a not-yet-valid certificate, I would considered expired to be worse than not-yet-valid so the chain should be considered as having expired. My reasoning being that an expired cert cannot be fixed without someone doing something but a not-yet-valid certificate could, potentially, be fixed just by waiting.

@milaneuh
Copy link

milaneuh commented Sep 6, 2024

For reference, here's the worst case by default aggregation in the health status :

	if (containsOnlyValidCertificates(certificateChain)) {
					validCertificateChains.add(certificateChain);
				}
	else if (containsInvalidCertificate(certificateChain)) {
					invalidCertificateChains.add(certificateChain);
				}

As long as a chain contains 1 invalid certificate, the whole chain is added in the "invalidCertificateChains" list.

@mhalbritter mhalbritter self-assigned this Oct 22, 2024
@mhalbritter
Copy link
Contributor

I currently have something like this:

# HELP ssl_chain_expiry_seconds SSL chain expiry
# TYPE ssl_chain_expiry_seconds gauge
ssl_chain_expiry_seconds{bundle="ssldemo",certificate="7207ee6e",chain="spring-boot-ssl-sample"} -3.15682879E8
# HELP ssl_chains  
# TYPE ssl_chains gauge
ssl_chains{status="expired"} 1.0
ssl_chains{status="not-yet-valid"} 0.0
ssl_chains{status="valid"} 0.0
ssl_chains{status="will-expire-soon"} 0.0

ssl_chain_expiry_seconds is a TimeGauge which shows the time until expiry of the chain (the chain expires when the first certificate in it expires). The certificate is the serial number of the certificate which expires first.

ssl_chains is a Gauge which counts the chains by status. The status of a chain is the "worst" status of the contained certificate, from worst to best:

  • EXPIRED
  • NOT_YET_VALID
  • WILL_EXPIRE_SOON
  • VALID

So far this is working quite nicely. However, reload could be a bit trickier. Essentially we need to remove the gauges which track chains which no longer exist and update the existing ones.

The remove* methods on the MeterRegistry all have @Incubating on them - not sure if we should use them?

@jonatan-ivanov
Copy link
Member

Nice! Can you show me the code?
I guess we could remove @Incubating from remove, it's there for years. It can be misused with high cardinality tags but there are parts in Micrometer where we use it for similar purposes (e.g.: instrumentation for Kafka metrics).

/cc @shakuzen

@mhalbritter
Copy link
Contributor

Sure. I'll work a bit on it today, and then link the branch here.

@shakuzen
Copy link
Member

It sounds like MultiGauge could take care of removing for you.

@mhalbritter
Copy link
Contributor

mhalbritter commented Oct 23, 2024

@jonatan-ivanov: Code is here: https://github.com/mhalbritter/spring-boot/tree/mh/42030-expose-sslbundle-information-via-actuator-metrics

I've used the SampleTomcatSslApplication to test it in action: There's a background thread which fiddles with the Sslbundles after some delay.

@mhalbritter mhalbritter added the for: team-meeting An issue we'd like to discuss as a team to make progress label Dec 2, 2024
@philwebb philwebb removed the for: team-meeting An issue we'd like to discuss as a team to make progress label Dec 9, 2024
@mhalbritter mhalbritter modified the milestones: 3.x, 3.5.x Dec 19, 2024
@mhalbritter mhalbritter modified the milestones: 3.5.x, 3.5.0-M1 Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

8 participants