11-04, 10:45–11:15 (Asia/Jerusalem), Red Track
This talk traces the path from Recurrent Neural Networks (RNNs) to Generative Pretrained Transformers (GPTs), with a primary focus on understanding attention mechanisms. We start with the building blocks, the Perceptron and RNN cells, and after identifying the issues that arise with RNNs, we turn to attention mechanisms, examining their role in translation tasks and leading into a detailed dissection of self-attention. The talk culminates in a review of Transformer models and the GPT series, their performance, and their capabilities in zero-shot, one-shot, and few-shot learning with prompts.
This presentation begins by laying out the foundational components of sequence modeling: Perceptrons and RNN cells. We then discuss the inherent issues with RNNs, focusing on the difficulty of handling long sequences and on vanishing or exploding gradients.
Attention mechanisms form the core of this lecture. We present a practical case of attention in translation tasks, followed by an in-depth examination of self-attention, a variant in which a sequence attends to its own elements rather than to an external context. We unpack its motivation and explain how it is implemented. We then explore the Transformer model and the way it leverages self-attention.
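As a rough illustration of the kind of computation the self-attention part covers, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is not taken from the talk materials: the function names (`softmax`, `matmul`, `self_attention`), the single-head, unmasked setup, and the toy identity projection matrices are all assumptions chosen for brevity.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention, no masking (illustrative)."""
    # Queries, keys, and values are all projections of the same input sequence.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    # Similarity of every token to every other token, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    # Each row of weights is a probability distribution over the sequence.
    weights = [softmax(row) for row in scores]
    # Each output token is a weighted average of the value vectors.
    return matmul(weights, v), weights

# Toy input: 3 tokens with embedding size 2; identity projections (hypothetical).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
```

In this toy setup every row of `weights` sums to 1, and the first token attends equally to itself and to the third token, since both have the same dot product with its query. A real Transformer would use learned projection matrices and multiple heads, and a GPT-style decoder would add causal masking.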
The final section is dedicated to the Generative Pretrained Transformer (GPT) series. We break down the GPT architecture and assess its distinguishing features. We then turn to zero-shot, one-shot, and few-shot learning, discussing how these models respond to prompts that contain no examples, a single example, or only a handful of examples.
Overall, this lecture aims to provide a thorough understanding of attention mechanisms and their practical applications in sequence modeling and NLP, equipping attendees to effectively apply these concepts in their work.
Alon Oring is the Head of Research at Dynamic Infrastructure, a predictive maintenance startup focused on using computer vision to identify defects and risks in critical infrastructure before they evolve into large-scale failures. Since joining Dynamic Infrastructure in 2019, Alon has led the development of several core technologies that obtained state-of-the-art performance and are currently serving multiple customers worldwide. Additionally, Alon is an active lecturer on deep learning, machine learning, and data science at Reichman University (IDC Herzliya) and at international coding boot camps, as well as an active mentor for up-and-coming data scientists.