The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2020 - Virtual to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
Please note: This schedule is automatically displayed in Eastern Standard Time (UTC–05:00). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change.
How to define reliability of a Kubernetes cluster? What are the SLOs? How many 9s is enough to ensure end-users are happy for a Kubernetes cluster with thousands of nodes? Service-level-objective (SLO) is the key to run large-scale production cluster reliably. Defining SLOs for classic web services is simple, since web requests are served synchronously with distinct status code. On the contrast, defining SLOs for Kubernetes services is obscured due to its intent-oriented design and declarative APIs. This talk first briefs the philosophy behind the SLO-driven approach for reliability engineering, followed by a deep dive of how SREs define SLOs for one of the world largest Kubernetes cluster in Ant Financial. Finally this talk shares concrete cases and lessons learned of building SLOs framework from several perspectives, including monitoring, alerting and tracing.
Qian works at Ant Group as a staff engineer focusing on site reliability engineering. He is the SRE tech lead of adopting Kubernetes in Ant Financial's production environment. He is passionate about adopting and promoting SRE's philosophy for managing large-scale production systems... Read More →
Cong Chen is a senior site reliability engineer at Ant Financial. Currently he is in charge of the stability of large-scale Kubernetes clusters of Ant Financial. Previously he worked at DIDI and RedHat as an architect and virtualization engineer. He has been focusing on the topic... Read More →