What if your Pods need to trust self-signed certificates?

jhugo · on June 28, 2023

Don’t bake trust stores into container images. At work we have a single place where trusted certs (commercial CAs plus our company root CA) are configured, and then those are mounted in the appropriate format (big PEM file for /etc/ssl, trust store for Java apps, etc) as a volume in deployments in all clusters.

This means that trust store updates are not dependent on the OS vendor, they’re consistent between native/Java/etc, and updates don’t require rebuilding container images.

andrewstuart2 · on June 28, 2023

I've built a small MutatingAdmissionWebhook controller [0] that handles this, via a pod annotation whose value is a secret with `ca.crt` inside, and it uses the (mostly) de facto standard openssl variables to configure the libraries, so that it works across pretty much everything I've tried it with off the shelf.

I build a bundle (though I may just move to trust-manager [1]) and replicate it into all namespaces with kubernetes-replicator [2], and then I can annotate any pod with

[0] https://github.com/microcumulus/ca-injector

[1] https://github.com/cert-manager/trust-manager

[2] https://github.com/mittwald/kubernetes-replicator

andrewstuart2 · on June 28, 2023

*...with the name of that secret and SSL works for almost everything out of the box. As I've found things that don't work (nodejs was one), I've just updated that code and it's made life pretty great for anywhere I'm running a CA in vault etc.

zrail · on June 28, 2023

This is essentially the pattern I've settled on for my homelab. I build a container image on a cron that contains all of my certificates. It has a small entrypoint script that copies the certs into a volume, and then that volume is mounted read-only into every other container that needs certs.

When certificates rotate the system builds a new image which gets pulled down by watchtower, which then in turn handles dependency management and restarts things as needed.

MereInterest · on June 28, 2023

What’s the purpose of the container image in this setup? It sounds like it exists just to make the certificate directory, so the container feels like overkill when compared to a tarball, or a self-extracting archive.

zrail · on June 28, 2023

You're right, it's a little weird. I wrote a short essay about my setup[1] but the tl;dr is that I wanted certificates distributed in the same way every other thing on my machines is distributed.

I wanted my homeprod setup to be as hands off as possible while still allowing easy management. Each physical host is running Alpine. During provisioning I install docker, Tailscale, and manually start a "root" container that runs[2] docker compose and then starts a cron daemon. The compose commands include one or more "stack" files and are generated based on a yaml file listing the stacks for each host. Watchtower runs with a 30 second cycle time to keep everything updated, including the root container. Adding or updating services means committing and pushing a change to the root container repo, then CI builds and pushes a new image. Watchtower picks up the new image and restarts the root container, which re-runs Compose which in turn starts, stops, modifies, etc anything that's changed.

For certificates, I tried a number of different things but ultimately settled on the method I described earlier. The purpose of the container image is to 1) transport the certificates and install them in the right spot and 2) be updatable automatically with Watchtower.

Certificate changes are very similar to the root container, except the git repo self-modifies upon renewals (yes I keep private keys committed to git, it's a homelab, it's really not a big deal).

[1]: https://www.petekeen.net/homeprod-management-with-docker

[2]: https://github.com/peterkeen/docker-compose-stack/blob/main/...

parlortricks · on June 28, 2023

do you have any doco for this or links to read over, i like the sound of this but too inexperienced in docker to set it up.

zrail · on June 28, 2023

Not right now. I'll write something up when I have a few minutes.

vngzs · on June 28, 2023

If you want good integrity controls, treating the CA bundles as secrets (e.g., storing them in k8s secrets or something like Vault) can help there. We don't actually care much about confidentiality controls, but any secret manager worth its salt is going to have integrity controls as well. This gives you easy updates and lets you reuse containers across environments while remaining defensive against attackers adding their own CA to the store.

tikkabhuna · on June 28, 2023

I've been thinking about doing exactly this at my place. I was thinking about Linux and Java. What other formats do we need to worry about? I'm thinking Windows?

mirekrusin · on June 28, 2023

Probably pfx.

sgtcodfish · on June 28, 2023

Plug (but it's open source and free - and mentioned in the article!): We've been trying to address this in Kubernetes with trust-manager. [1] Trust bundles need to be a runtime concern and they need to support trusting both the old a new version of a cert to safely allow for rotation. It's pretty simple but it seems to work well!

trust-manager also supports pulling in the Mozilla trust bundle which most Linux distros (and therefore most containers) use!

Handling trust of private [2] certificates is done poorly generally across many orgs and platforms, not just Kubernetes. There are lots of ways of shooting yourself in the foot - particularly when it comes to rotating CA certificates. I think there's a lot of space here for new solutions here!

[1] https://cert-manager.io/docs/projects/trust-manager/

[2] I try to avoid "self-signed" in this use case because its literal meaning is that the certificate signs itself using its own key, which is what root certificates do. The Let's Encrypt ISRG X1 root certificate is self-signed but it's definitely not what I'd call a 'private CA'; see https://letsencrypt.org/certificates/

rad_gruchalski · on June 28, 2023

Thanks for the tip re trust manager and kudos for cert-manager. Works really well.

adql · on June 28, 2023

That means you designed your architecture incorrectly; get an internal CA and automate it, then pod can just have long term root CA embedded in it.

SOLAR_FIELDS · on June 28, 2023

That’s kind of the same problem as mentioned here though, you still have to mount that root CA in all your pods.

An internal CA is also, while achievable with cert-manager these days, not exactly trivial to set up.

deathanatos · on June 28, 2023

> An internal CA is also, while achievable with cert-manager these days, not exactly trivial to set up.

…it's not exactly hard, though? Install cert-manager, and it's all of 3 resources? 1 self-signing Issuer, one Certificate (the CA cert), and then an Issuer that issues certs under that CA cert.

Then any service that wants an internal cert adds a Certificate issued by that Issuer.

SOLAR_FIELDS · on June 28, 2023

Simple if you’re already familiar with cert-manager.

The annoying part is getting it mounted in every pod that needs it and needs to consume it, which is the point the original article made. Node has its own way of doing it, Java has its own way of doing it, etc etc and they all have their own little ideosyncracies that you have to worry about.

deathanatos · on June 28, 2023

> The annoying part is getting it mounted in every pod that needs it and needs to consume it

Yes, I suppose, in that it does need to be available to the pod.

This is one of those problems you're solving regardless of Kubernetes. I think a few lines of YAML to say "pull this secret into this Pod" is pretty acceptable to trying to figure out how it ends up on a VM with Ansible, or do I pull it from SSM, etc.

> Node has its own way of doing it, Java has its own way of doing it, etc etc and they all have their own little ideosyncracies that you have to worry about.

Yes… all the world doesn't use the same HTTP/TLS libs. That's just dealing with encryption, period, though; you've got this regardless of whether the CA is internal, or you're on k8s, etc.

Java, though is particularly annoying, most of its ecosystem stubbornly refuses to use the same formats for key material as the rest of the world does, necessitating a translation into its format. That is annoying, but nonetheless that's a Java-ism.

SOLAR_FIELDS · on June 28, 2023

If you’re just running certs with a public CA you don’t have to worry about this though. It’s only if you’re rolling self-signed or an internal root CA.

I think there’s a use case to be made here for running internal services on public certs. Obviously it won’t work for every use case but if your services are low traffic I don’t really see an issue of doing that and could save end users a lot of headache as they don’t have to worry about anything and you as the operator only have to worry about automating the DNS challenge

diarrhea · on June 28, 2023

I’ve had good success with step-ca. It was easy enough to be called trivial actually!

SOLAR_FIELDS · on June 30, 2023

I actually used smallstep when I rolled my own CA as well. They have nice tooling.The most annoying part of the whole ordeal is dealing with cert formats IMO

pojzon · on June 28, 2023

Im surprised ppl dont do double tls termination with ebpf certs between pods.

Your customer sees one cert at the LB level. LB talks to ingress which also has own cert by CAM. And any other pod on the cluster uses mtls via ebpf.

SOLAR_FIELDS · on June 30, 2023

Do you have any more details on the latter half of the setup? The first half is trivial and almost every cloud provider offers that as a managed product. Does the internal one involve some sort of service mesh?

8organicbits · on June 28, 2023

I've been working on a different approach to try to make it as easy as possible to use certificates from public CAs for private networks. I think it's really hard to get all your trust stores updated with private CA certs; there's just too many of them. I've encountered devices that weren't properly configured at every company I've worked at. Any time you've got a BYOD policy, a variety of mobile devices, or engineers are spinning up their own VMs (which don't copy the trust store of the host) you're going to see certificate warning messages. Back when every computer ran Windows and was chained to the desk it was easier for IT departments to update trust stores, but the world has changed and it's a losing battle.

I think the better approach is to just use certificates from public CAs on private networks. There's some big issues to address: how does the public CA verify internal servers which don't permit external connections, is leaking internal subdomain names a security issue? But I think these are solvable.

getlocalcert.net[1] is my project to simplify the process. Currently you can register a subdomain and use the ACME DNS-01 protocol via my site to issue certificates from from providers like Let's Encrypt. All you need to issue a certificate is a getlocalcert API key and the ability to connect out-bound to the Let's Encrypt API and the getlocalcert API. It supports a couple ACME clients and it should support cert-manager, mentioned in the article, but I haven't had a chance to test and properly document it[2]. I'm not doing anything that other subdomain registrars or DNS providers can't do, but I'm trying to address the public-CA-on-private-server niche as best as possible. Longer term I'm looking to add bring-your-own-domain support and other improvements.

[1] https://www.getlocalcert.net/

[2] https://docs.getlocalcert.net/acme-clients/cert-manager/

nijave · on June 28, 2023

You can also do sidecar proxy pattern where the the sidecar handles TLS termination (incoming or outgoing). That's basically service mesh light.

You can also use MutatingWebhooks to inject initContainers or side cars to achieve that pattern.

MutatingWebhooks increase the runtime complexity since you're jamming more config in but they can handle vendor/3rd party software.

Depending on the CNI, insecure TLS flags might not actually be that bad. There are some CNIs floating around that can handle encryption. In addition, software defined networks like AWS might not offer encryption but they do guarantee messages are authenticated so you can trust the source IP

vbezhenar · on June 28, 2023

IMO that's ideal situation: when service does not encode any routing information and just talks to another service via fake DNS name. And actual routing is performed by infrastructure tools which can implement TLS, collect metrics, collect full HTTP logs if asked, implement retries, caching and so on. This is about outgoing connections and incoming connections.

It could also save on container size theoretically. Implementing TLS is a daunting task and requires many kilobytes of code. Implementing HTTP is easier task. So you can strip TLS implementation and achieve lighter containers.

adql · on June 28, 2023

You save shit all if every container comes with sidecar doing the actual TLS, it's extra component to debug and extra small latency added.

nijave · on June 29, 2023

In practice it's not that much. I think our Envoy sidecars take, maybe, 8mb of RAM when the apps are usually 500-1000MB. They usually add afaik <1ms latency. In addition, you can implement client side load balancing through Envoy so you can hit your services directly without intermediary load balancers.

It adds complexity with the side car but usually you get standardized RED metrics regardless of if the service implemented them.

You can also have the sidecar do connection pooling and maintain persistent or long lived TCP connections which ends up speeding things up without doing that at the app level.

pojzon · on June 28, 2023

Or you can leverage ebpf like cillium.

adql · on June 28, 2023

> You can also do sidecar proxy pattern where the the sidecar handles TLS termination (incoming or outgoing). That's basically service mesh light.

That's a horrid idea, why would you do that ?

otterley · on June 28, 2023

It’s an extremely common pattern. Decoupling TLS allows the developer to focus on the application and it makes for far easier development and testing.

flerchin · on June 28, 2023

In the age of cheap certs, it's not clear to me why we have these enterprises with self-signed certificates. It definitely is a thing, but it seems to be tech debt that we should fix.

adql · on June 28, 2023

Having to communicate with outside is kinda overkill if you just want to have container A talking to container B.

But the solution is internal CA, not self signed certs that defeat near-entire point of encrypting communication.

iso1631 · on June 28, 2023

I have one supplier that uses letsencrypt to generate a wildcard certificate, then distributes it as part of a software update to thousands of machines in hundreds of companies across the world. The exact same certificate and key. But it gets rid of the red X in a browser so tick

It's less secure than a self signed cert. OK you can't just sniff traffic and decode it thanks to Elliptic Curve, despite having the cert+key, but you can MITM just as much, and not throw an error.

With a self-signed if someone has accepted and cached the self-signed cert you can't MITM them without throwing an error.

flerchin · on June 28, 2023

Yes an internal CA is better than a self-signed-cert. However, I've come across, quite often, where an internal CA causes problems that are only at that enterprise. Using the established public CAs is generally more resilient, and there's _no reason to not_.

c00lio · on June 28, 2023

There are tons of reasons to not use an external CA.

First, cost. Any CA that issues unlimited certificates will charge tons of money. Free CAs like letsencrypt do have rate limits that we would frequently hit with autoscaling environments, CI jobs, and such.

Also, CAs require the use of certificate transparency logs. Which will expose your internal infrastructure data to the public. It will, by exposing autoscaling data, also expose financial data (at least in hints), e.g. by showing that last christmas, your scaling peak was far higher.

And external CAs are a security risk because you need to provide firewall exceptions and/or transfer mechanisms for certificates into your internal infrastructure that you would usually want to isolate.

Lastly, an external CA is an availability risk. Should your external CA be unreachable for some reason, you might not be able to run any CI jobs or auto-scale-up your infra.

michaelt · on June 28, 2023

There are tons of reasons to not use an internal CA.

First, cost. You're not just going to slap the root CA onto the network drive and let the developers have at it - you're going to need infrastructure to keep the key safe and handle automated cert provisioning and suchlike - that's going to need people to maintain it. And it'll be an important part of your infrastructure, so you'll need enough experts you can maintain round-the-clock support.

Second, it reduces your security because your users will inevitably learn to ignore certificate errors.

Thirdly, you'll never stop the certificate errors. Oh, you're going to set them up for both Chrome and Firefox and Edge and Safari on Windows, Mac, Linux? Oops, you forgot Android and iPhone. And your CI system. And Java and Docker and Git. And the network printer and the electronics team's network-enabled oscilloscope. Think you've covered everything? Surprise, Slack is distributed in a Snap now, it's generating certificate errors.

Just have your cloud provider take care of it. Not in the cloud? A wildcard cert on your load balancer will get you 99% of the value with 1% of the work.

electroly · on June 28, 2023

I think you've entirely misunderstood the discussion here. Your post seems to be about using internal certificates for an externally visible website (although I'm confused about the mention of network printers and Slack--are you thinking this is a MITM certificate? nobody had suggested using an internal CA for those things) but that's not what anyone here was talking about. They're talking about internal use for communication between your own backend services.

Recall the post that started this subthread:

> Having to communicate with outside is kinda overkill if you just want to have container A talking to container B.

The article here is about the same.

#2 and #3 in your post don't apply; we're not talking about browsers or end users at all here. #1 may apply but I think you're overstating it; Active Directory Certificate Services takes care of all that. Remember that you don't have to follow the CA Baseline Requirements as a private CA. It's harder to get rid of an ADCS PKI than to set it up.

michaelt · on June 28, 2023

> #2 and #3 in your post don't apply; we're not talking about browsers or end users at all here.

Y'all don't have internal tools implemented as webapps? Self-hosted version control servers? Nexus? SonarQube?

Oh, I'll agree that you can outsource all that stuff if you want to - but any business with that philosophy would surely also outsource their certificate provisioning. Especially considering how easy and cheap AWS make it.

> although I'm confused about the mention of network printers and Slack

Do you not want graceful handling of internal URLs when mentioned in slack? Such as previews, image unrolling etc? Do you not need a certificate for the internal file server your scans upload to, and so on?

electroly · on June 28, 2023

You're still talking about a completely unrelated use of CAs here. We're talking about how you get two k8s pods to communicate with each other securely, as an alternative to using self-signed certificates and without leaking details of your internal infrastructure to a CT log. Nobody suggested using self-signed certificates for any of the things you're talking about; we are talking about what you should replace your self-signed certificates with. That's what both the article and this thread are about. You're arguing against a point that nobody made. You'd never use a self-signed certificate for a user-facing website or service and nobody suggested that you would. It is specifically the situations where you'd use a self-signed certificate that this subthread is suggesting using an internal CA for instead.

Stated another way, I believe you are saying "don't use internal CAs for things you'd otherwise use public certificates for" but what we're saying is "use internal CAs for things you'd otherwise use self-signed certificates for". I believe both statements are correct but we weren't talking about the first thing at all until you brought it up.

c00lio · on June 28, 2023

So I'll route my CI integration tests that test communication from application to database through the loadbalancer? And I'll teach the loadbalancer to talk to itself to talk to the autoscaling web backends it needs to talk to because only the loadbalancer has a valid certificate? Nice brain-knot.

And talking about security risks, wildcard certs are especially dangerous and should be forbidden from ever existing. They just lead to "copy it everywhere"-keys that, sooner or later, will leak. And that won't be revoked or replaced, because of course everything will break at once.

Oh, and the certificate errors will also come with external CAs. Chain too long? Error in some browsers. ECC signature? Error in some browsers. Chain with different paths? Error in some browsers. 4096bit certificate somewhere? Error in some browsers. Two different valid roots? Error in some browsers.

jjav · on July 3, 2023

> A wildcard cert on your load balancer will get you 99% of the value with 1% of the work.

The goal is not to make cert errors go away.

The goal is to establish secure connections between parties, with the smallest possible trust boundary and attack surface.

tinus_hn · on June 28, 2023

The result is different when comparing to self-signed certificates. For instance, yes you should be careful with your key, but losing it makes your security as bad as when using self-signed certificates. Yes, you might be teaching some of your users to bypass the warnings. Just like when using self-signed certificates.

8organicbits · on June 28, 2023

> Free CAs like letsencrypt do have rate limits that we would frequently hit with autoscaling environments, CI jobs, and such.

You're expected to design around that. Deploying should never create a new certificate, those should live in secure storage and get deployed when needed.

The main rate limit also doesn't apply to renewals, so you could potentially issue N*50 domain names on your Nth week using Let's Encrypt.

"Renewals are treated specially: they don’t count against your Certificates per Registered Domain limit." - https://letsencrypt.org/docs/rate-limits/

If you need to issue certificates for new domains at a higher rate, you're very likely a large company that can afford to pay some money for any excess certificates you need. Failing over to ZeroSSL (zerossl.com) on rate limiting should be an easy engineering task since both use the ACME API.

jve · on June 28, 2023

Some more:

- Letsencrypt and friends don't give you whatever purpose certs you would want. I.e File Encryption, Code Signing and friends, individual client certs

flerchin · on June 28, 2023

ACM is effectively free. Cost is not an issue. None of your data is exposed. This is all FUD.

gruturo · on June 28, 2023

I'm sorry but this is simply not true. Certificate Transparency logs ARE (meta?)data which would be exposed. If the certificates were meant to be reached externally, you wouldn't indeed care at all - but if they're for internal flows (e.g. App server to DB, 2 steps behind a balancer and a set of web frontends) you are indeed publishing stuff you would have rather kept private.

Before this (d)evolves into a zero trust, security-by-obscurity discussion - some auditors won't certify you in some edge cases related to this, and you may be operating in a regulated sector where such a certification is necessary. Just because it doesn't impact your use case, doesn't mean this is the case for someone else.

omniglottal · on June 28, 2023

It you are not personally aware of the basis behind a security posture, please avoid denouncing it as FUD. Yout own ignorance, uncertainty, or doubt does not suffice to replace the informed advice of a professional.

madeofpalk · on June 28, 2023

Certificate Transparency Logs don't exist?

diarrhea · on June 28, 2023

- Certificate Transparency logs leak internal domain names, therefore potentially internal infrastructure or product still in stealth mode - HTTPS challenge requires opening the network to the entire internet - DNS challenge requires API access to the DNS provider. The DNS entries are the keys to the castle (MX mainly). Some names ever operators don’t offer sensible API access (none at all, or poor scoping (single token with global admin access))

czx4f4bd · on June 28, 2023

Unfortunately, enterprises are intensely risk averse. I worked at a place where we could only generate proper certs by manually submitting a ticket to IT and waiting probably days to get it back, giving us zero hope of applying meaningful automation. Getting certs from a proper CA was absolutely forbidden, despite our lobbying, so in a lot of cases self-signed certs were the only option if we wanted to automate.

geraldwhen · on June 28, 2023

Where I work it’s months, and fights, and maybe even several meetings. And that’s only if you fill out the correct form the exact right way. Any slight deviation and your ticket is closed weeks after you opened it with the field you got wrong.

mirekrusin · on June 28, 2023

Look at the bright side. If things like that worked we wouldn’t have startups.

NovemberWhiskey · on June 28, 2023

That's a hot take if ever I heard one.

There are many, many good reasons to have your own PKI that are not tech-debt related. PKI use-cases are not limited to internet-exposed server certificates.

crabbone · on June 28, 2023

From integrators' perspective, a lot of TLS stuff is often worthless, but cannot be avoided because that's how the program was written. Eg. you (an integrator) provide a service of installing and configuring Docker registry / Harbor for air-gapped systems. The intended use is to support locally deployed Kubernetes... but you have to deal with the nonsense of certificates, because removing that stuff completely from both Kubernetes and the registry is just too hard.

So, self-signed certificate is the easiest and fastest way to solve this problem.

PS. Also, custom CA certificates don't work well with browsers -- browsers like to cache them and won't let go of them easily. This has adverse effects on automated UI testing with Selenium. I.e. say, you need to modify something in your cert (usually something like a domain name) and then the browser will refuse to accept it because it remembers the old one somehow...

Another problem is that errors related to certificates are very hard to debug. The popular libraries (eg. essentially, OpenSSL) are trash when it comes to error reporting. You never get clear error messages, nor a suggestion on how to fix errors. The whole thing lives in obscurity, and CAs are even worse documented than the process of creating self-signed certificates. So, people looking to solve a problem choose a solution they can find, instead of the best solution.

8organicbits · on June 28, 2023

A reverse proxy is another good way to handle those. The proxy trusts the self-signed certificate and itself has a valid certificate from a CA the clients trust (either public or private). This way setup on the integrator's product is easy and client's aren't exposed to "hard to verify" certificates.