Virtual Event
November 17–November 20, 2020

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2020 - Virtual to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is displayed in Eastern Standard Time (UTC–05:00). The schedule is subject to change.
Thursday, November 19 • 3:45pm - 4:20pm
Accelerate and Autoscale Deep Learning Inference on GPUs with KFServing - Dan Sun, Bloomberg & David Goodwin, NVIDIA



Large-scale language models, such as BERT and GPT-2, have brought exciting leaps in state-of-the-art accuracy for many NLP tasks. BERT requires significant compute during inference, which poses challenges for real-time application performance. KFServing provides a simple model serving interface across common model servers, with a standardized REST/gRPC inference protocol for serving a single model or multiple co-located models on CPU or GPU. KFServing enables hardware acceleration and autoscaling of Bloomberg's own BERT models, trained on a corpus of specialized financial news data. In this talk, we will discuss how we use KFServing in a production application to address scalability, latency, and throughput with Knative's Autoscaler and Activator. We will also discuss some performance debugging tips and show the GPU benchmark results with TensorFlow/PyTorch BERT models deployed to KFServing.

Speakers

Dan Sun

Senior Software Engineer, Bloomberg
Dan Sun is a Senior Software Engineer on the Data Science Infrastructure team at Bloomberg, focusing on designing and building mission-critical, managed production ML inference solutions. He strives to understand and tackle data scientists' complex problems. He also has many years of...

David Goodwin

Principal Software Engineer, NVIDIA
David Goodwin is a principal software engineer in the Machine Learning group at NVIDIA where he is currently working on tools and usability for deep learning inference. He possesses in-depth knowledge of a wide range of hardware and software components, and software engineering processes...



Thursday November 19, 2020 3:45pm - 4:20pm EST
Intrado Virtual Event Platform