Name: Accelerate and Autoscale Deep Learning Inference on GPUs with KFServing - Dan Sun, Bloomberg & David Goodwin, NVIDIA
Start: 2020-11-19T15:45:00-0500
End: 2020-11-19T16:20:00-0500

Virtual Event
November 17–November 20, 2020

Large-scale language models such as BERT and GPT-2 have brought exciting leaps in state-of-the-art accuracy for many NLP tasks. However, BERT requires significant compute during inference, which makes it hard to meet the performance demands of real-time applications. KFServing provides a simple model-serving interface across common model servers, with a standardized REST/gRPC inference protocol for serving a single model or multiple co-located models on CPU or GPU. KFServing enables hardware acceleration and autoscaling of Bloomberg's own BERT models, which are trained on a corpus of specialized financial news data. In this talk, we will discuss how we use KFServing in a production application to address scalability, latency, and throughput with Knative's Autoscaler and Activator. We will also share some performance-debugging tips and present GPU benchmark results for TensorFlow and PyTorch BERT models deployed to KFServing.

Speakers

Dan Sun

Software Engineer Team Lead, Bloomberg

Dan Sun is a team lead of the Data Science Serverless Runtime team at Bloomberg. Focused on building mission-critical, managed production ML inference solutions, he strives to understand and tackle data scientists' complex problems. He also has many years of experience at Bloomberg...

David Goodwin

Principal Software Engineer, NVIDIA

David Goodwin is a principal software engineer in the Machine Learning group at NVIDIA, where he currently works on tools and usability for deep learning inference. He possesses in-depth knowledge of a wide range of hardware and software components, and software engineering processes...

Slides: Accelerate and Standardize Deep Learning Inference with KFServing (PDF)

Text transcript: ANHA 6784 (TXT)

Thursday November 19, 2020 3:45pm - 4:20pm EST
Intrado Virtual Event Platform

Machine Learning + Data

Content Experience Level: Intermediate (Mid-level experience)

KubeCon + CloudNativeCon North America 2020 Virtual