The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2020 - Virtual to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
Please note: This schedule is automatically displayed in Eastern Standard Time (UTC–05:00). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change.
Large-scale language models, such as BERT and GPT-2, have brought exciting leaps in state-of-the-art accuracy for many NLP tasks. BERT requires significant compute during inference, which poses challenges for real-time application performance. KFServing provides a simple model serving interface across common model servers with a standardized REST/gRPC inference protocol to serve single or co-located multiple models on CPU or GPU. KFServing enables hardware acceleration and autoscaling of Bloomberg's own BERT models trained on a corpora of specialized, financial news data. In this talk, we will discuss how we use KFServing in a production application to address scalability, latency, and throughput with Knative’s Autoscaler and Activator. We will also discuss some performance debugging tips and show the GPU benchmark results with TensorFlow/PyTorch BERT models deployed to KFServing.
Dan Sun is a team lead of the Data Science Serverless Runtime team at Bloomberg. Focused on building mission-critical production ML inference managed solutions, he strives to understand and tackle data scientists' complex problems. He also has many years of experience at Bloomberg... Read More →
David Goodwin is a principal software engineer in the Machine Learning group at NVIDIA where he is currently working on tools and usability for deep learning inference. He possesses in-depth knowledge of a wide range of hardware and software components, and software engineering processes... Read More →