Workshop Details
Brochure

Hands-On Generative AI: From Foundations to Advanced LLMs and Their Applications

Dec 29th – 31st, 2024

Department of Computer Science, College of Science

Sultan Qaboos University

Motive:

The aim of this workshop is to equip participants with advanced skills in Generative AI and prepare them to work with Multimodal Large Language Models (LLMs). With the increasing need for AI systems that can respond intelligently across text, images, and structured databases, this workshop will provide the tools and knowledge to build such systems using cutting-edge AI models and technologies, with a specific focus on developing multimodal question-answering systems. This aligns with the broader institutional goals of advancing AI literacy and fostering innovative research and development in AI-driven applications.

Highlights:

ML Foundations & GenAI Basics: Learn machine learning essentials and foundational concepts of Generative AI, including LLM architecture and applications.
Building Question-Answering Systems: Explore Generative AI applications across diverse data types (text, images, and databases) to create sophisticated Q&A systems.
RAG & Data-Augmented Questioning: Deep dive into Retrieval-Augmented Generation (RAG) to harness LLMs for delivering context-rich, data-driven answers.
Interactive App Development: Develop GenAI-powered applications using tools like Streamlit and Plotly Dash, focusing on usability and interactivity.
Visual Question Answering (VQA): Discover how CNNs enable VQA with practical healthcare use cases involving medical images.
Hands-On Practice: Apply your learning in real-time sessions, gaining practical experience in building cutting-edge Generative AI solutions.

The following fees will be charged to workshop participants:

Registration Type*	Amount
Industry	100
Academia	50
Student	20

The workshop outcomes:

The workshop will provide the ML foundations behind GenAI concepts. Topics covered include the basics of machine learning and the LLM architecture needed for the hands-on GenAI sessions.
The workshop will provide an in-depth exploration of Generative AI and how it can be applied to build sophisticated question-answering systems across multiple data modalities (text, images, and databases).
Participants will gain a strong understanding of Large Language Models (LLMs), including how to integrate APIs to generate questions tailored to different data types.
A deep dive into Retrieval-Augmented Generation (RAG) will demonstrate how LLMs can be harnessed to build Data-Augmented Questioning (DAQ) systems that deliver contextually rich answers by retrieving and generating information from relevant sources.
The workshop will also cover LLM-based app development using Streamlit or Plotly Dash, enabling participants to build interactive applications based on GenAI.
Visual Question Answering (VQA), powered by Convolutional Neural Networks (CNNs), will be demonstrated, where participants will learn to query images using natural language. Medical images will be used as case studies to showcase the practical applications of VQA in healthcare.
Each session will feature hands-on practice, where participants will directly apply the concepts learned, building solutions in real-time.
By the end of the workshop, participants will have gained hands-on experience in developing Generative AI based models and in building a full app from a prompt using prompt2model or playlab.

Day 1: Foundational ML for Generative AI
Session	Description	Remarks
1	Fundamentals of Machine Learning for LLMs (Dr. Hamza Zidoum, Dr. Noushath Shaffi). Introduction to Machine Learning Concepts: supervised and unsupervised learning, data splits (training/testing), and model evaluation (accuracy, precision, recall). Neural Network Basics: overview of neural networks, including perceptrons, activation functions, and layers. Hands-On Exercise: model building for classification/regression.	Outcome: Participants will understand the basics of machine learning. Pre-requisite: Python programming.
2	Fundamentals of Deep Learning for LLMs (Dr. Abdelhamid, Dr. Abdur Rahman). CNNs. Seq2Seq models. Hands-On Exercise: image or text classification using Keras or PyTorch.	Outcome: Participants will understand LLM essentials such as tokenization and how models are trained to represent natural language.
3	Looking Inside Large Language Models (Dr. Fatma Al Raisi, Dr. Abdur Rahman). An overview of Transformer models. Concepts of self-attention and multi-head attention. Brief intro to tools like Hugging Face for accessing and deploying models. Hands-On Exercise: generating text with a pre-trained transformer model.	Outcome: Participants will understand the concept of Attention and what is fueling the
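
As a taste of the self-attention concept covered in Session 3, the following is a minimal sketch of single-head scaled dot-product attention in plain Python. It is not workshop material: the token vectors are made-up numbers, and the tokens are used directly as queries, keys, and values, whereas a real Transformer derives these from learned projection matrices.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    Each argument is a list of float vectors, one vector per token.
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w_j * v[i] for w_j, v in zip(w, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Toy 3-token sequence with 2-dimensional embeddings (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Assumption for brevity: Q = K = V = tokens (no learned projections).
outputs, weights = self_attention(tokens, tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row sums to 1 and shows how one token distributes its attention over the sequence. Multi-head attention runs several such heads in parallel on different projections and concatenates their outputs.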

Day 2: Retrieval-Augmented Generation (RAG) with Large Language Models for Document Question Answering
Session	Description	Remarks
1	Understand the basics of LLMs and how they can be used for document-based question answering. Key concepts: tokenization, embeddings, attention mechanism. Overview of how LLMs understand and process documents. Introduction to document question answering (QA) using LLMs. Demo: Basic question answering with a pre-trained LLM. Hands-on Activity: Setting up a basic LLM using Hugging Face or the OpenAI API to answer simple questions from a document. Explore LLM outputs for different document inputs.	Outcome: Participants will understand the basics of LLMs and how they can be used for document question answering. Pre-requisite: Python programming, basics of machine learning. Duration: 2 hours
2	Retrieval-Augmented Generation (RAG) for Document QA. Why RAG? Bridging retrieval with generation. RAG architecture: How retrieval and LLM generation work together. Chunking documents into retrievable pieces. Storing and retrieving document chunks using vector embeddings. Demo: Setting up a basic RAG pipeline using FAISS or PostgreSQL for embeddings. Hands-on Activity: Building a simple RAG system for document QA. Experiment with document chunking and embeddings for better results.	Outcome: Participants will understand the basics of RAG-based LLM systems and how they can be used for document question answering, how RAG differs from plain LLM-based Q&A, and what its advantages are. Pre-requisite: Python programming, basics of machine learning. Duration: 2 hours
3	Develop a web-based interface for users to upload documents and ask questions. Introduction to Streamlit for rapid prototyping of web apps. Designing an interface for document upload and question input. Integrating the RAG model with Streamlit for real-time QA. Displaying results and real-time feedback as questions are answered. Demo: Uploading documents and querying them via a web interface. Hands-on Activity: Build a simple Streamlit application for document QA. Test the app by uploading Word/PDF documents and asking questions.	Outcome: Participants will learn how to build a quick prototype using Streamlit for DAQ. Pre-requisite: Python programming. Duration: 2 hours
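
To illustrate the chunk, embed, retrieve, and prompt flow described in Session 2, here is a rough standard-library sketch. It rests on several assumptions that are not part of the workshop plan: bag-of-words counts stand in for a learned embedding model, an in-memory list stands in for a FAISS or PostgreSQL vector store, and the helper names (`chunk_text`, `embed`, `retrieve`, `build_prompt`) and sample text are hypothetical. The final call to an LLM is left out.

```python
import math
import re
from collections import Counter

def chunk_text(text, chunk_size=12, overlap=4):
    """Split text into overlapping windows of words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

def embed(text):
    """Toy embedding: word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_chunks):
    """Assemble the augmented prompt that would be sent to an LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

# Illustrative sample document (invented for this sketch).
document = (
    "The workshop runs for three days. "
    "Day one covers machine learning foundations and transformer models. "
    "Day two covers retrieval-augmented generation for document question answering. "
    "Day three covers visual question answering on medical images."
)

chunks = chunk_text(document)
question = "What does day two cover?"
top_chunks = retrieve(question, chunks, top_k=2)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

The overlap between windows is a design choice: it reduces the chance that a sentence relevant to the question is split across two chunks and missed by retrieval. In a fuller pipeline, `embed` would call an embedding model and `retrieve` would query a vector index.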

Day 3: Visual Question Answering (VQA) in the medical domain
Session	Description	Remarks
1	Understand the fundamentals of VQA and how it applies to medical imaging. Overview of VQA: What it is and how it works. Medical imaging modalities (X-rays, MRIs, CT scans) and their importance in healthcare. Use cases of VQA in the medical field: diagnosis assistance, radiology reports, etc. Challenges in medical VQA: complexity, sensitivity of data, and interpretability. Introduction to relevant datasets (e.g., VQA-RAD, SLAKE). Demo: A simple VQA example using a non-medical dataset to familiarize participants with the process. Hands-on Activity: Explore a VQA pipeline using a pre-trained VQA model on general images (using libraries like Hugging Face or PyTorch). Participants experiment with asking basic questions on general images to understand the VQA mechanism.	Outcome: Participants will understand the basics of visual question answering, how VQA can be used on medical images, and which datasets can be used for building medical VQA systems. Pre-requisite: Python programming, basics of image data. Duration: 2 hours
2	Dive deeper into techniques specific to medical VQA, including fine-tuning and model integration with medical images. Detailed explanation of how deep learning models (e.g., ResNet, VGG) extract features from medical images. Combining image features with text-based questions using attention mechanisms. Fine-tuning pre-trained models (such as ResNet) for medical imaging datasets. Architectures commonly used in medical VQA: CNNs for image processing, RNNs/Transformers for questions. Demo: Fine-tuning a pre-trained model on a medical dataset like VQA-RAD. Hands-on Activity: Participants will load a medical VQA dataset and preprocess medical images and text-based questions, fine-tune a model for medical image question answering, and experiment with asking domain-specific questions on medical images.	Outcome: Participants will understand how convolutional architectures such as ResNet and VGG extract features from images for use as representations. They will also understand how to build a VQA model. Pre-requisite: Python programming, basics of deep learning. Duration: 2 hours
3	Developing a Medical VQA System. Building an end-to-end VQA system tailored for medical applications. Integration of the VQA model with a front-end system (e.g., using Streamlit) for practical use. Evaluating VQA models in the medical domain: accuracy, precision, recall, and more specific metrics such as those used for report generation. Handling domain-specific challenges like imbalanced datasets, uncertainty in medical diagnosis, and ethical considerations. Demo: Building a simple medical VQA prototype where users upload a medical image and ask relevant questions. Hands-on Activity: Participants will implement a complete VQA system that takes medical images and answers questions. Test and evaluate the system using standard VQA evaluation metrics and explore improvements.	Outcome: Participants will learn how to build a quick prototype using Streamlit for VQA and how to handle the challenges of developing VQA systems. Pre-requisite: Python programming. Duration: 2 hours
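
The evaluation metrics named in Session 3 (accuracy, precision, recall) can be sketched for closed-ended yes/no questions as follows. This is an illustrative example only: the reference and predicted answers are invented, and `binary_metrics` is a hypothetical helper rather than part of any VQA library.

```python
def binary_metrics(y_true, y_pred, positive="yes"):
    """Accuracy, precision, and recall for yes/no answers."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Of the answers predicted positive, how many were right?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the truly positive cases, how many were found?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Invented answers to a question such as "Is there a fracture?" on 8 images.
# The classes are imbalanced: only 2 of the 8 reference answers are "yes".
reference = ["yes", "no", "no", "no", "yes", "no", "no", "no"]
predicted = ["yes", "no", "no", "yes", "no", "no", "no", "no"]

metrics = binary_metrics(reference, predicted)
print(metrics)  # {'accuracy': 0.75, 'precision': 0.5, 'recall': 0.5}
```

On imbalanced data like this, accuracy looks reasonable while recall shows that half of the positive cases were missed, which is why the session pairs accuracy with precision and recall.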