"We're Tier III, so we have 99.98% uptime." We hear this sentence often. But a Tier rating describes component redundancy in the design; it says nothing about how often incidents occur or how they are handled. After five years of operation, your actual numbers will tell a very different story from your PR material.
Five things that influence uptime more than Tier
1. The distance between the diesel generator and the main breaker
On a power outage, the generator kicks in within 8–30 seconds, and the UPS covers the gap. If there's a long cable run between the UPS and the generator with 12 splices, one of the joints can fail under thermal shock during switchover. This doesn't happen in calm operation. It happens exactly at the moment of switchover, when you need the generator most.
A Tier rating doesn't grade the quality of joints. Commissioning tests do, but only if you actually run them. Many operators skip them because "the components are Tier-certified".
2. Filters in cooling units are replaced by calendar, not by differential pressure
A clogged filter raises energy consumption by 8–15%, reduces effective cooling capacity by 20–30%, and in extreme cases shuts down the cooling unit. Some data centers replace filters quarterly by calendar, so consumption and capacity fluctuate throughout the year depending on when the last change happened.
Best practice: every cooling unit has a differential pressure sensor across the filter that raises an alarm when a threshold is exceeded. The filter is changed when it's actually clogged: not earlier (waste), not later (risk).
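A minimal sketch of that alarm logic in Python. The `FilterReading` type, the unit names, and both thresholds are illustrative assumptions, not values from any vendor; real limits come from the filter datasheet.

```python
from dataclasses import dataclass

# Illustrative thresholds only (assumption); take real limits from the datasheet.
WARN_PA = 200.0   # plan a replacement soon
ALARM_PA = 250.0  # replace now


@dataclass
class FilterReading:
    unit_id: str        # hypothetical cooling-unit identifier
    delta_p_pa: float   # pressure drop across the filter, in pascals


def filter_status(reading: FilterReading) -> str:
    """Classify a filter by its pressure drop, not by calendar age."""
    if reading.delta_p_pa >= ALARM_PA:
        return "alarm"
    if reading.delta_p_pa >= WARN_PA:
        return "warn"
    return "ok"


readings = [
    FilterReading("CRAH-01", 120.0),
    FilterReading("CRAH-02", 215.0),
    FilterReading("CRAH-03", 260.0),
]
statuses = {r.unit_id: filter_status(r) for r in readings}
print(statuses)
```

The two-level threshold is a design choice in this sketch: a warning leaves time to schedule the change, while the alarm marks the point where waiting becomes a risk.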
3. Cable mapping during the installation phase is never finished
In a large data center, a cable from switch A to rack B is supposed to run along the cable tray specified in the design. In reality, four of five cables are laid per design during installation, and the fifth gets "routed wherever possible" because the tray was full.
Three years later, during a reconfiguration, nobody knows where the fifth cable runs. There's a good chance it gets cut when somebody drills through a wall. Tracing and restoring the link then costs 4–8 hours of downtime.
Solution: route labels every 5 m on every cable. Doesn't cost much. Saves years of pain.
4. Firmware updates get postponed until something fails
A switch in the data center is still running firmware from 2019. It works. Nobody has touched it because "if it works, don't fix it". Four years later a CVE is published, along with an exploit that attackers are already using. The firmware has to be updated urgently: during business hours, on a live system.
A planned night-time update with a rollback plan: 30 minutes of downtime, no problems. An urgent update during business hours, under pressure: 4–6 hours of downtime and a risk of data loss if the rollback procedure is poor.
The SLA doesn't state how many firmware-update windows are scheduled per year. It should.
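One way to make the backlog visible before a CVE forces the issue is a periodic inventory check. A minimal sketch, assuming a hypothetical inventory that maps device names to firmware release dates and an assumed policy limit of 365 days (neither comes from a real tool or standard):

```python
from datetime import date

MAX_AGE_DAYS = 365  # assumed policy: no firmware older than one year

# Hypothetical inventory: device name -> release date of the installed firmware.
inventory = {
    "core-sw-01": date(2019, 3, 1),
    "core-sw-02": date(2023, 1, 15),
}


def overdue(devices: dict, today: date, max_age_days: int = MAX_AGE_DAYS) -> list:
    """Return the names of devices whose firmware is older than the policy allows."""
    return sorted(
        name
        for name, released in devices.items()
        if (today - released).days > max_age_days
    )


stale = overdue(inventory, today=date(2023, 6, 1))
print(stale)
```

Each device on that list is a candidate for the next planned maintenance window, rather than for an emergency change later.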
5. Personnel, not infrastructure
At 4 AM on a weekend, a team of 20 engineers doesn't show up in the data center. A dispatcher and one shift technician do. Their decisions determine whether the SLA is met.
The most important questions when choosing a data center: What runbooks do the shift technicians have? What are the escalation channels? How fast does a second technician arrive during a complex incident? A Tier rating says nothing about this.
What to actually measure
- **MTTR (Mean Time To Recovery)**: average time to recover after an incident. A better metric than "uptime %", because one 6-hour incident and seventy-two 5-minute incidents produce the same uptime figure (360 minutes of downtime each) but have completely different business impact.
- **Incidents per quarter** for the last 4 quarters. If a data center can't give you this number, you're not getting an honest picture.
- **Last unplanned outage > 1 h**: when was the last unplanned outage longer than an hour? A recent date vs. "3 years ago" says a lot.
- **Vendor diversity**: how many different vendors supply UPS, generator, cooling, switches? Single-vendor = stronger integration, weaker resilience against vendor-wide problems.
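To illustrate why the uptime percentage alone hides the incident profile, here is a short worked example in Python. The function names and the one-year measurement period are assumptions made for illustration; the two outage profiles both total 360 minutes of downtime.

```python
YEAR_MINUTES = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability(outages_min: list, period_min: float = YEAR_MINUTES) -> float:
    """Fraction of the period during which the service was up."""
    return 1.0 - sum(outages_min) / period_min


def mttr(outages_min: list) -> float:
    """Mean time to recovery: average outage duration in minutes."""
    return sum(outages_min) / len(outages_min)


one_long = [360.0]        # a single 6-hour outage
many_short = [5.0] * 72   # seventy-two 5-minute outages, also 360 minutes

for label, outages in (("one long", one_long), ("many short", many_short)):
    print(
        f"{label}: availability={availability(outages):.4%}, "
        f"MTTR={mttr(outages):.0f} min, incidents={len(outages)}"
    )
```

Both profiles report the same availability of about 99.93%, while MTTR (360 minutes vs. 5 minutes) and the incident count (1 vs. 72) differ sharply. That is why the metrics above are worth requesting alongside the uptime figure.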
Conclusion: the SLA is a negotiated fiction. Operational quality is reality.
Tier rating + SLA paper + ISO certificate = entry filter. The real quality of operations is verified through quarterly audits, conversations with the night technician, and incident reports from the last 24 months.
A client who picks a data center only by Tier rating gets what the paper guarantees. A client who goes deeper gets what they actually need.
---
*These perspectives come from operational experience, from post-mortem analyses after client incidents, and from audits we've done for third parties. If you're choosing a data center or designing your own, we'll walk through the same criteria for your specific use case.*