Is your system healthy? Are SLOs being met? What are the top performance constraints? What are the high-priority implementation concerns? Is the architecture the right fit? Are the teams leveraging the capabilities of the platform? What are the pain points with platform services? In distributed systems, it can be challenging to find the root cause among a problem's symptoms. Just like people, microservices need regular health checks.
In this talk, we'll present a systems-based approach to running an app health check across 10 dimensions: monitoring and metrics, failure mode analysis, technical debt, emergency response, performance optimization, change management, microservices rationalization, platform as a product, balanced team, and path to production. We'll explain how to address issues uncovered during a health check and offer recommendations for building a sustainable Day 2 app-ops reliability engineering practice.