TLS Automation Isn't Optional Anymore. Here's What That Means for Your Stack.
Why manual TLS management is quietly breaking the web — and what the industry is doing about it.
Before We Start: What Problem Are We Actually Solving?
Picture this: it’s 2 AM. Your company’s e-commerce site starts throwing NET::ERR_CERT_DATE_INVALID errors in every browser. Customer support is flooded. Revenue is bleeding per minute. Your on-call engineer digs in, only to realize that the TLS certificate — the thing that keeps your site on HTTPS — quietly expired at midnight, and nobody set up an alert for it.

This isn’t a hypothetical. It happens to companies of every size, regularly. And the fix — going through your CA (Certificate Authority) portal, generating a new CSR, submitting it, downloading the cert, deploying it — can easily take hours if it’s not automated.
This is exactly the problem the Chrome Security team has been hammering on, and in October 2023, they made it official policy: Certificate Authorities that want to be in Chrome’s Root Store must support automated certificate issuance. Let’s dig into why this matters more than it might first seem.
Part 1: TLS in Plain Terms
TLS — Transport Layer Security, formerly known as SSL — is the security protocol that puts the “S” in HTTPS. It’s the underlying technology that establishes a secure channel between a web browser and the web server hosting the website a user is browsing.
When you open https://yourbankingapp.com, here’s what actually happens before you see a single pixel:
- Your browser connects to the server and says “here are the TLS versions and cipher suites I support.”
- The server responds with its TLS certificate and its preferred cipher suite.
- Your browser verifies the certificate against a list of trusted Certificate Authorities built into your OS/browser (the Root Store).
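- The two sides agree on session keys using asymmetric cryptography (in TLS 1.3, an ephemeral Diffie-Hellman exchange that the server authenticates by signing with the private key matching its certificate), and from there, everything is encrypted symmetrically.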
This whole dance is called the TLS Handshake, and it takes milliseconds but carries enormous security implications.
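To make the handshake steps concrete, here is a minimal sketch using Python's standard `ssl` module. The function names `build_verifying_context` and `describe_handshake` are illustrative, not from any particular tool, and the sketch assumes the host is reachable on port 443.

```python
import socket
import ssl


def build_verifying_context() -> ssl.SSLContext:
    """Create a client context that trusts the platform's root store."""
    # create_default_context() loads the system's trusted CA certificates
    # and turns on certificate and hostname verification.
    return ssl.create_default_context()


def describe_handshake(hostname: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Run a TLS handshake against hostname and report what was negotiated."""
    context = build_verifying_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        # wrap_socket() performs the handshake. It raises
        # ssl.SSLCertVerificationError if the certificate is untrusted,
        # expired, or issued for a different name.
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
            return {
                "protocol": tls.version(),   # negotiated TLS version
                "cipher": tls.cipher()[0],   # negotiated cipher suite name
                "not_after": cert["notAfter"],
            }
```

Calling `describe_handshake("example.com")` (a placeholder host) needs network access and returns the negotiated protocol version, the cipher suite, and the certificate's expiry time.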

The Three Pillars TLS Actually Guarantees
TLS provides three core security properties:
- Encryption: Ensures data being transmitted can’t be intercepted and understood by third parties.
- Authentication: Ensures the web server or application a web browser is connecting to is who it claims to be.
- Integrity: Ensures data has not been altered while in transit.
One thing people frequently get wrong: using TLS ensures web traffic is encrypted while in transit to or from the corresponding web server; it does not guarantee the safety or security of that content. TLS does not prevent phishing or malicious content like malware or viruses from being served to a website’s users. A phishing site can absolutely have a valid TLS certificate — it just means your connection to that phishing site is encrypted. This is why Chrome removed the lock icon in version 117, replacing it with a more neutral “tune” icon to reduce the misconception that a padlock means “safe.”
X.509 Certificates: The Identity Documents of the Web
X.509 certificates — sometimes referred to as “TLS certificates” or “server authentication certificates” — are an essential part of the TLS Handshake. They are issued by trusted entities called Certification Authorities (CAs) and are responsible for verifying and subsequently binding a domain name (e.g., google.com) with a corresponding public key.
Think of a TLS certificate as a notarized ID for your web server. The CA is the notary — a third party both you and the browser trust. When the CA signs your certificate, they’re essentially vouching: “Yes, we verified this is really google.com.”
Part 2: Certificate Validity — Why Shorter Is Actually Better
One key requirement is that a certificate’s maximum validity is no more than 398 days. But it didn’t start that way.
In just over ten years, the ecosystem has trended from unlimited certificate lifetime to 60 months (2012), to 39 months (2015), to 825 days (2018), to 398 days (2020).
Each of these reductions was controversial. Enterprise teams complained about the operational overhead of renewing certificates more frequently. But the security reasoning is sound: shorter validity windows limit blast radius.
Here’s an analogy. Imagine you use one password across all your accounts and it gets breached. If it’s a password you haven’t changed in five years, an attacker has potentially had access for five years before you notice. Now imagine if you rotated it every year — or every 90 days — via a password manager. Even if it’s breached, the window of exploitation is narrow.
Shortening certificate lifetimes protects users by reducing the impact of compromised certificate keys and by speeding up the replacement of insecure technologies and practices across the web.
Part 3: The Manual Management Problem — It Doesn’t Scale
Here’s the ugly reality: most teams managing more than a handful of certificates are doing it in a spreadsheet. Maybe it’s a shared Google Sheet or a Jira ticket with a “reminder” label. It works until it doesn’t.
The decreasing lifetime of certificates and the increasing number of certificates that organizations rely on have created a growing need for website operators to become more agile in managing certificates and corresponding infrastructure.
Let’s quantify how bad the failure rate is:
- When considering failed TLS connections observed in Chrome versions released within the last year (i.e., Chrome 106 and greater) on all platforms, over 22% of these resulted from certificates with an invalid validity date.
- A 2019 study found that 3.9% of all HTTPS sites have expired certificates.
22% of TLS failures are just… expired certificates. That’s an operational failure, not a security one. And it’s entirely preventable.
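As a rough illustration of how preventable this is, the sketch below computes the days remaining on a certificate from its `notAfter` field, in the format Python's `ssl.SSLSocket.getpeercert()` returns. The 30-day warning threshold and the function names are assumptions for the example, not an industry rule.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before the certificate's notAfter time (negative if expired).

    not_after uses the getpeercert() format, e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def needs_attention(not_after: str, warn_days: float = 30,
                    now: Optional[float] = None) -> bool:
    """True when the certificate is expired or inside the warning window."""
    return days_until_expiry(not_after, now) < warn_days
```

Run on a schedule against every endpoint you operate, a check like this turns a 2 AM outage into a routine alert. It is a safety net alongside automated renewal, not a substitute for it.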
Real world impact: In February 2020, Microsoft Teams went down globally for roughly three hours because of an expired certificate. Millions of users couldn’t access their primary work communication tool. The root cause was a certificate that wasn’t caught before expiry. Microsoft, a company with some of the world’s best SRE talent, got bitten by this.
Part 4: Certificate Automation — What It Actually Means
While there isn’t a one-size-fits-all definition of certificate automation, there is one shared element: the requirement for “hands-on” input from humans during initial certificate issuance and ongoing renewal is minimized or eliminated. Certificate automation simplifies the often complex and error-prone tasks associated with managing certificates, enhancing security and operational efficiency.
In the Web Public Key Infrastructure (Web PKI), there are two major categories of certificate automation solutions: open solutions relying on standards such as the Automatic Certificate Management Environment (ACME) protocol and solutions often relying on proprietary tools or protocols.
ACME: The Protocol That Changed Everything
ACME (RFC 8555) was pioneered by Let’s Encrypt and is now a standard. Here’s how it works at a high level:
- Your server runs an ACME client (like Certbot, acme.sh, or Lego).
- The client contacts the CA and says “I want a certificate for example.com.”
- The CA issues a challenge, typically an HTTP-01 challenge (place this specific file at this URL) or a DNS-01 challenge (add this TXT record to your DNS).
- The CA verifies the challenge, confirming you control the domain.
- The CA issues the certificate — the whole thing is done in seconds.
- The client is scheduled (via cron or a system daemon) to renew before expiry, fully automatically.
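The challenge step in the flow above is less mysterious than it sounds. The sketch below shows how the values a client must publish are derived, following RFC 8555 (sections 8.1, 8.3, 8.4) and the JWK thumbprint rules of RFC 7638. The token and account key in any call are placeholders you would supply; a real client also signs its requests and talks to the CA, which this sketch leaves out.

```python
import base64
import hashlib
import json

# JWK members that go into the thumbprint, per key type (RFC 7638, section 3.2).
_REQUIRED_MEMBERS = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """SHA-256 thumbprint of the account's public key in JWK form."""
    members = {name: jwk[name] for name in _REQUIRED_MEMBERS[jwk["kty"]]}
    # Canonical form: required members only, sorted keys, no whitespace.
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    """The string that proves the challenge was answered by this account."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


def http01_path(token: str) -> str:
    """URL path where the key authorization must be served for HTTP-01."""
    return f"/.well-known/acme-challenge/{token}"


def dns01_txt_value(token: str, account_jwk: dict) -> str:
    """TXT record value for _acme-challenge.<domain> in a DNS-01 challenge."""
    digest = hashlib.sha256(key_authorization(token, account_jwk).encode("ascii")).digest()
    return b64url(digest)
```

For HTTP-01 the client serves the key authorization string at the path from `http01_path`; for DNS-01 it publishes the digest from `dns01_txt_value` instead. Either way, the CA only has to fetch one value to confirm control of the domain.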
This isn’t magic — it’s just well-designed software. But the operational difference is enormous. A team managing 1,000 certificates manually versus via ACME isn’t just saving time — they’re eliminating an entire class of incident.
The Numbers Behind ACME Adoption
Survey data from Chrome’s Root Store CA owners, coupled with publicly available Certificate Transparency log data, suggests that 58% of the certificates issued by the Web PKI today rely on the ACME protocol. The set of CA owners that offer ACME support today and are included in the Chrome Root Store represents more than 95% of Web PKI’s certificate population.
70% of those corresponding CA owners self-reported increasing demand for ACME services, which Chrome’s team interpreted as a strong indicator of a healthy and growing ACME user population across the ecosystem.
More than 80% of the certificates issued by the Web PKI today are issued using some form of automation (which includes ACME).
Part 5: Three Case Studies That Made the Industry Wake Up
Case Study 1 — The Heartbleed Crisis (2014)
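This is arguably the scariest web security event in recent memory. In April 2014, a vulnerability known as “Heartbleed” was discovered in OpenSSL, a popular cryptographic software library used to secure the majority of servers on the Internet. The bug broke the security properties provided by TLS: because it could leak a server’s private key from memory, every certificate on an affected server had to be treated as potentially compromised.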
The scale was staggering: it was estimated that over 500,000 active publicly accessible server authentication certificates needed to be revoked and replaced.
But here’s the devastating part — the remediation was embarrassingly slow:
- Only 14% of affected websites completed the necessary remediation steps within a month of disclosure.
- About 33% of affected devices remained vulnerable nearly three years after disclosure.
Why so slow? The maximum certificate validity permitted by the Baseline Requirements at the time was five years. For some website operators, this meant the need to revisit the state of their TLS configuration was incorrectly assumed to be years away — which partly explains the observed remediation inaction.
The financial toll on CAs was also brutal: CAs who elected to revoke certificates faced significant costs related to hosting revocation information — estimated for one CA to be between $400,000 and $952,992.40 USD per month. Because the Baseline Requirements obligate CAs to host revocation information for the full validity of each certificate they issue, this could stretch for years.
The peer-reviewed research angle is particularly telling: research demonstrates that in response to the manual intervention necessitated by Heartbleed, system administrators who implemented automation were more prompt in performing certificate replacements when compared to those who did not.
Automation wasn’t just operationally convenient — it was the difference between being secure and remaining exposed.
Case Study 2 — Let’s Encrypt’s 3 Million Cert Bug (2020)
This one is instructive because it shows both failure and recovery. In 2020, Let’s Encrypt self-reported a bug that affected over 3 million certificates. In response to the incident, nearly 2 million certificates were revoked, meaning website operators needed to intervene and trigger replacement to avoid a potential outage.
That’s a nightmare at scale. But here’s what’s remarkable:
More than 1.7 million affected certificates were replaced in less than 48 hours, largely because the ACME protocol pioneered by and relied on by Let’s Encrypt allowed affected website operators to recover from the incident with limited manual effort. The incident also resulted in Let’s Encrypt’s commitment to developing and deploying a new protocol — ACME Renewal Information (ARI) — capable of improving response to future CA incidents such that certificate replacement can occur automatically without human intervention.
Let’s Encrypt announced a production deployment of ARI in March 2023. Google Trust Services also announced their production deployment of ARI in May 2023.
ARI (ACME Renewal Information) is genuinely clever. It lets a CA tell your ACME client: “Hey, renew this certificate between these specific dates,” even if the certificate isn’t near expiry. In a mass revocation scenario, instead of scrambling to notify operators, the CA can publish updated renewal windows for all affected certificates; clients pick them up the next time they check in, and the automation handles the rest.
Case Study 3 — SHA-1 Deprecation (2014–2017)
Cryptographic hash functions don’t stay secure forever. In 2005, researchers demonstrated the first weaknesses in the widely used SHA-1 hash function. In response, in 2014, Chrome announced a deprecation timeline, with the CA/Browser Forum ultimately prohibiting the issuance of certificates that used SHA-1 after January 1, 2016.
Unfortunately, this deprecation took years. Browsers had to wait for almost all affected certificates to be renewed — many of them manually — to avoid mass breakage.
The near-miss is jaw-dropping: in February 2017, researchers demonstrated a devastating vulnerability in SHA-1, barely avoiding a crisis because Chrome had finished removing support for affected certificates just weeks before.
Weeks. The security community essentially got lucky. Had SHA-1 certificates still been broadly in use, the attack would have been catastrophically practical.
Part 6: The Business Case for Automation (For Skeptical CTOs)
The security arguments are compelling, but let’s talk in terms that get budget approved.
Operational Efficiency
Automation reduces the time and resources required to manage certificates manually. Though there is an initial investment to automate, over time, team members have increased availability to focus on more strategic, value-adding activities.
For an organization with 50 certificates, manual management is annoying. At 500 certificates, it becomes a part-time job. At 5,000 certificates (think: CDN endpoints, microservices, staging environments, subdomains), it’s a full team.
Resilience Against Incidents
Automation coupled with monitoring protects against website outages due to certificate expiration that could result in a loss of traffic, reputation, or revenue.
The downtime math is simple: if your site generates $50,000/hour in revenue and goes down for three hours because of an expired certificate, you’ve lost $150,000. Automation tooling for certificate management costs a fraction of that.
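The same arithmetic as a tiny sketch, useful for plugging in your own numbers. The revenue figure is the one from the example above; the annual tooling cost in the usage note is a made-up placeholder, not a quote from any vendor.

```python
def outage_cost(revenue_per_hour: float, outage_hours: float) -> float:
    """Revenue lost while the site is unreachable."""
    return revenue_per_hour * outage_hours


def break_even_hours(annual_tooling_cost: float, revenue_per_hour: float) -> float:
    """Hours of avoided downtime per year that pay for the tooling."""
    return annual_tooling_cost / revenue_per_hour
```

Here `outage_cost(50_000, 3)` gives the $150,000 from the example. With a hypothetical $25,000 per year spent on tooling, `break_even_hours(25_000, 50_000)` comes to half an hour of avoided downtime.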
Agility for Cryptographic Transitions
Automation increases the speed at which the benefits of new security capabilities are realized.
When the next SHA-1 moment happens — and cryptographers widely agree it’s a matter of when, not if, for various currently-trusted algorithms — organizations with automation will be able to respond in hours. Those without will be scrambling for months.
Part 7: Shorter Certificates — Where the Industry Is Heading
Facebook’s approach is worth studying. Facebook has implemented a highly automated certificate issuance and management workflow to protect its network edge and corresponding devices with certificates that are used for just a few days.
Days. Not 398. Not 90. Days.
For Facebook’s internal infrastructure, this makes sense: they control both the CA and the servers, so they can run a fully automated pipeline that issues certificates with 3–7 day validity. The attack window for a compromised key becomes almost theoretical.
Some CAs already default to much shorter lifetimes: Fastly’s CA, “Certainly,” issues certificates valid for only 30 days. At 30-day validity, even a certificate compromise discovered weeks later has limited real-world impact, because the certificate is either expired or close to it.
The trajectory is clear: the industry is moving toward shorter and shorter certificate lifetimes. The only way that becomes operationally viable for the average company is full automation. You cannot ask a human to manually renew certificates every 30 days across hundreds of services.
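A back-of-the-envelope sketch shows how quickly the renewal workload grows as lifetimes shrink. It assumes each certificate is renewed once two thirds of its lifetime has elapsed, a common client default but an assumption here, and the fleet size of 500 is illustrative.

```python
import math


def renewals_per_year(cert_count: int, validity_days: int) -> int:
    """Estimate yearly renewals when renewing with one third of lifetime left."""
    # Integer number of days between renewals for a single certificate.
    renewal_interval_days = validity_days * 2 // 3
    return math.ceil(cert_count * 365 / renewal_interval_days)


for validity in (398, 90, 30):
    print(validity, renewals_per_year(500, validity))
# prints:
# 398 689
# 90 3042
# 30 9125
```

At 30-day validity, a fleet of 500 certificates means roughly 25 renewals every single day, which is only plausible as a machine task.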
Part 8: Chrome’s Policy Change — What It Actually Requires
One of the major focal points of Chrome Root Program policy Version 1.5 requires that applicants seeking inclusion in the Chrome Root Store must support automated certificate issuance and management.
This is significant because being in Chrome’s Root Store is effectively permission to operate as a trusted CA in the world’s most-used browser. If your CA isn’t in Chrome’s Root Store, your certificates will show scary red errors to Chrome users. This gives Chrome enormous leverage to shape CA behavior.
Some nuances that matter for implementation teams:
These new requirements do not prohibit Chrome Root Store applicants from supporting “non-automated” methods of certificate issuance and renewal, nor require website operators to only rely on automated solutions. The intent is to make automated certificate issuance an option for a CA owner’s customers.
So website operators aren’t being forced to automate — they’re just being ensured the option exists. But given the trajectory of certificate validity lifetimes, automation will become practically mandatory for anyone running more than a few services.
Chrome prefers ACME solutions over those that rely on proprietary protocols, specifically because of ACME’s widespread ecosystem support and adoption. ACME is open and benefits from continued innovation from a robust set of ecosystem participants. There is an extensive set of well-documented ACME client options spanning multiple languages and supported platforms.
Part 9: Practical Implementation — Getting Started with ACME
If you’re a developer or infrastructure engineer who wants to move toward automation today, here’s the realistic path.
For a Single Server (Nginx/Apache)
Certbot is the canonical starting point. On an Ubuntu server running Nginx:
sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d yourdomain.com -d www.yourdomain.com
Certbot installs a systemd timer or cron job that automatically renews certificates before they expire. That’s it. You’ve automated certificate management for this server.
For Kubernetes Environments
cert-manager is the de facto standard for Kubernetes certificate automation. It integrates with Let’s Encrypt (and other ACME CAs) via CRDs:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-team@yourcompany.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
Once a ClusterIssuer is configured, individual services can request certificates with a simple annotation on their Ingress resource. cert-manager handles the ACME challenge, issuance, and renewal automatically.
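As a sketch of what that annotation looks like in practice, the snippet below builds an Ingress manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. The service name, port, host, and the `<name>-tls` secret naming are placeholders; the `cert-manager.io/cluster-issuer` annotation refers to the `letsencrypt-prod` ClusterIssuer defined above.

```python
import json


def ingress_manifest(name: str, host: str, service_name: str, service_port: int,
                     cluster_issuer: str = "letsencrypt-prod") -> dict:
    """Build an Ingress that asks cert-manager for a certificate for host."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            # This annotation is what triggers cert-manager to request a cert.
            "annotations": {"cert-manager.io/cluster-issuer": cluster_issuer},
        },
        "spec": {
            "ingressClassName": "nginx",
            # cert-manager stores the issued certificate in this Secret.
            "tls": [{"hosts": [host], "secretName": f"{name}-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service_name,
                                            "port": {"number": service_port}}},
                }]},
            }],
        },
    }


print(json.dumps(ingress_manifest("web", "yourdomain.com", "web-svc", 80), indent=2))
```

Generating manifests from code is optional. The same structure written directly as YAML works just as well, and the annotation is the only part specific to cert-manager.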
For Modern Web Servers
Modern web server platform providers like Caddy help website operators configure TLS by default, as do many third-party hosting provider organizations.
Caddy’s entire value proposition is “HTTPS by default.” A minimal Caddyfile:
yourdomain.com {
    reverse_proxy localhost:3000
}
That’s it. Caddy automatically obtains and renews a certificate from Let’s Encrypt. You don’t configure TLS — it just happens.
For Multi-Cloud / Enterprise Environments
For organizations with complex, multi-cloud infrastructure, look at tools like:
- Vault by HashiCorp with its PKI secrets engine: useful for internal CAs and short-lived certificates within your own infrastructure.
- AWS Certificate Manager: not ACME-based, but ACM automatically renews the certificates it issues for use with integrated AWS services.
- Venafi or DigiCert CertCentral: enterprise-grade platforms with automation workflow integrations.
The key is: whatever tool you use, the goal is zero human involvement in the renewal loop.
Part 10: ACME Renewal Information (ARI) — The Next Leap
ARI deserves its own spotlight. Here’s the core problem it solves: traditional ACME clients renew when the certificate is N days from expiry (commonly 30 days before). This works for normal expirations but breaks down in incident scenarios where a CA needs to revoke millions of certificates immediately.
ARI adds a new endpoint to the ACME protocol that the CA can use to communicate a suggested renewal window to the client. Instead of “renew 30 days before expiry,” the client asks: “When do you want me to renew this specific certificate?” The CA can say “renew between 2:00 AM and 4:00 AM tomorrow” — spreading renewal load across a window, and enabling proactive replacement before an incident affects end users.
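The client side of this exchange can be sketched in a few lines. The example assumes the client has already fetched the CA's renewal information as JSON with a `suggestedWindow` object holding `start` and `end` timestamps, which is the shape the ARI specification describes. The function names are illustrative, and the HTTP request itself is left out.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_rfc3339(value: str) -> datetime:
    """Parse a timestamp such as 2025-01-02T02:00:00Z into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pick_renewal_time(renewal_info: dict, now: datetime,
                      rng: Optional[random.Random] = None) -> datetime:
    """Choose when to renew, based on the CA's suggested window."""
    rng = rng or random.Random()
    window = renewal_info["suggestedWindow"]
    start = parse_rfc3339(window["start"])
    end = parse_rfc3339(window["end"])
    # Pick a uniformly random moment inside the window so that clients
    # spread their renewals instead of all arriving at once.
    offset = rng.uniform(0, (end - start).total_seconds())
    chosen = start + timedelta(seconds=offset)
    # If that moment has already passed, renew right away.
    return max(chosen, now)
```

The random choice inside the window is what spreads load across clients. When a CA moves a window into the past, as it would during an incident, every client that checks in gets back the current time and renews immediately.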
For DevSecOps teams: if you’re implementing ACME, make sure your chosen client supports ARI. Certbot, Lego, and cert-manager have been adding ARI support, though the status varies by client and version, so check the release notes of the one you run. This is the difference between a CA incident causing hours of scrambling and it being silently resolved in the background.
Part 11: What This Means for the Future of Web PKI
Promoting broader ubiquity of automated certificate issuance and management will establish an important foundation for the next generation of the Web PKI. Increased use of automation will also unlock future opportunities for more modern and agile infrastructures where strengthened security properties can be realized — for example, where maximum certificate validity can be reduced with minimal downsides.
The roadmap the Chrome Security team is pointing toward:
- Maximum validity continues to shrink. The current 398-day maximum will likely drop to 90 days, then 30 days, then possibly single-digit days for high-security use cases. Each step is only feasible with automation.
- ARI becomes standard. All major ACME CAs will deploy it, enabling near-zero-touch incident response at the CA level.
- Multi-CA failover. There’s further opportunity related to improved fail-over, allowing a graceful transition to a new CA if the preferred provider is unavailable at the time of a request. This would mean your automation could fall back to an alternate CA if your primary one is having issues — a level of resilience that manual management could never achieve.
- ACME for Subdomains (RFC 9444). The Automated Certificate Management Environment for Subdomains standard aims to make it easier for popular server authentication use cases to be supported by ACME.
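The multi-CA failover idea from the roadmap above can be sketched as a simple loop over ACME directory URLs. Everything here is hypothetical scaffolding: `request_certificate` stands in for whatever your ACME client exposes, and `IssuanceError` is an exception type defined only for this example.

```python
from typing import Callable, List, Sequence, Tuple


class IssuanceError(Exception):
    """Raised when a CA cannot issue the requested certificate."""


def issue_with_failover(
    domain: str,
    directories: Sequence[str],
    request_certificate: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each CA in order and return (directory_url, certificate)."""
    errors: List[str] = []
    for directory_url in directories:
        try:
            return directory_url, request_certificate(directory_url, domain)
        except IssuanceError as exc:
            # Record the failure and fall through to the next CA.
            errors.append(f"{directory_url}: {exc}")
    raise IssuanceError("all CAs failed: " + "; ".join(errors))
```

The order of `directories` encodes the preference: list the primary CA first and alternates after it. Some ACME clients already offer a similar fallback, so check what yours supports before building your own.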
Closing Thoughts: The Real Cost of Inaction
If you’re still managing TLS certificates manually — downloading them from a CA portal, deploying via SSH, setting calendar reminders for renewals — you are accumulating risk that compounds over time.
The SHA-1 deprecation happened because cryptographic weaknesses eventually become practical attacks. The next deprecation (quantum-resistant algorithms are coming, and post-quantum TLS is already in draft) will happen faster, because the community has learned from past mistakes. Operators who are automated will transition in hours. Manual operators will face months of scrambling, potential site outages, and the reputational damage that comes with them.
The Chrome Security team’s policy change isn’t bureaucratic box-ticking. It’s a forcing function to drag the industry toward a more resilient baseline — one where the infrastructure that protects billions of HTTPS connections doesn’t depend on a human remembering to renew a certificate.
If you depend on a service provider or platform that doesn’t support ACME or another form of certificate automation, that’s a question worth raising with their product team. The trajectory of this industry is clear, and the time to build automation into your infrastructure is before the next incident — not during it.
Further Reading & Resources
- RFC 8555 — ACME Protocol — The authoritative specification.
- Let’s Encrypt: How It Works — Accessible explanation of the ACME flow.
- Certbot Documentation — The EFF’s ACME client, widely used for Apache and Nginx.
- ACME Renewal Information (ARI) Draft — The next evolution in automated incident response.
- Chrome Root Program Policy — The governance document behind Chrome’s trust decisions.
- Certificate Transparency Logs — Public, append-only ledger of all publicly-trusted certificates.
- crt.sh — Tool for exploring Certificate Transparency data.