SMBD Tutorial

Date: Dec 17 2024 | Time: 2PM onwards | Duration: 2 hours

Abstract

This tutorial introduces a neuro-symbolic AI framework for analyzing big data from social media platforms. Integrating human-curated knowledge through symbolic AI with the pattern recognition capabilities of neural networks enhances the adaptability and efficiency of traditional neural network approaches. Knowledge-guided zero-shot learning techniques enable swift adaptation to new linguistic contexts and emerging events [6]. Participants will explore how to design, develop, and use these models in specific domains, such as public health surveillance, that require dynamic adaptation to new terminology. The tutorial aims to equip attendees with practical skills and a deep understanding of how to apply neuro-symbolic AI to manage and analyze large-scale social media datasets effectively.

Goals of the Tutorial

Provide a comprehensive understanding of neuro-symbolic AI as applied to complex social media data.
Offer hands-on experience with real-world data and knowledge integration, using neuro-symbolic techniques.
Demonstrate the practical benefits of integrating a knowledge graph with AI systems, enhancing adaptability and performance in dynamic environments.

Focus of the Tutorial

Foundational Principles:

Overview of Neuro-Symbolic AI: Discussion of the integration of symbolic reasoning with neural learning for analyzing unstructured social media data.
Characteristics of Social Media Data: Examination of common issues like noise, variability in data formats, and the use of slang or informal language. Strategies for preprocessing and normalizing data for analysis.
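
As a rough illustration of the preprocessing and normalization step mentioned above, the following Python sketch lowercases a post, removes URLs and user mentions, shortens character elongations, and expands slang through a small lexicon. The SLANG dictionary, the regular expressions, and the function name normalize_post are assumptions made for this example; they are not taken from the tutorial materials.

```python
import re

# Hypothetical slang lexicon; a real one would come from a curated domain resource.
SLANG = {"u": "you", "gr8": "great", "idk": "i do not know", "rona": "covid-19"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character repeated three or more times


def normalize_post(text: str) -> str:
    """Lowercase, strip URLs and mentions, squeeze elongations, expand slang."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1\1", text)  # "sooooo" becomes "soo"
    # Keep word-like tokens; an optional leading '#' preserves hashtags.
    tokens = re.findall(r"#?[a-z0-9][a-z0-9'\-]*", text)
    return " ".join(SLANG.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    print(normalize_post("@bob IDK if the rona is gr8 news!!! https://t.co/x sooooo tired"))
```

A dictionary lookup like this only handles slang that is already in the lexicon; handling unseen terms is where the knowledge-guided techniques covered later in the tutorial come in.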

Lab Exercise Using Real-World Data and Knowledge:

Data Sources and Knowledge Utilization: Participants will use APIs to fetch social media data and integrate static knowledge from existing knowledge graphs. Examples leverage structures like DBpedia and domain-specific lexicons for health or consumer sentiment analysis.
Continued Access to Tutorial Resources: Most data sources and the knowledge graph will remain accessible to participants, facilitating continued learning.

Knowledge Graph Integration:

Types of Knowledge Graphs: Exploration of general-purpose and domain-specific knowledge graphs, their structures, and their roles in enhancing AI models.
Incorporation Techniques: How to embed knowledge from graphs into neural networks, including entity resolution, relation extraction, and semantic reasoning.
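
To make the incorporation techniques above more concrete, here is a minimal Python sketch of a knowledge graph stored as subject-predicate-object triples, with alias-based entity resolution and a simple transitive is_a traversal as a stand-in for semantic reasoning. Relation extraction is not covered. The triples, the alias table, and the names KnowledgeGraph and resolve_entities are all illustrative assumptions, not content from DBpedia or from the tutorial's own graph.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object); every entry is illustrative.
TRIPLES = [
    ("covid-19", "is_a", "viral_infection"),
    ("viral_infection", "is_a", "infectious_disease"),
    ("covid-19", "has_symptom", "fever"),
    ("covid-19", "has_symptom", "loss_of_smell"),
    ("anxiety", "is_a", "mental_health_condition"),
]

# Surface forms seen in posts, mapped to canonical entities (hypothetical).
ALIASES = {
    "covid": "covid-19",
    "rona": "covid-19",
    "coronavirus": "covid-19",
    "anxious": "anxiety",
    "anxiety": "anxiety",
}


class KnowledgeGraph:
    def __init__(self, triples):
        self.edges = defaultdict(set)
        for subj, pred, obj in triples:
            self.edges[(subj, pred)].add(obj)

    def objects(self, subject, predicate):
        """Return all objects linked from subject via predicate."""
        return set(self.edges.get((subject, predicate), set()))

    def ancestors(self, entity):
        """Follow is_a edges transitively to collect all broader concepts."""
        seen, stack = set(), [entity]
        while stack:
            for parent in self.objects(stack.pop(), "is_a"):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


def resolve_entities(text, aliases=ALIASES):
    """Map tokens in a post to canonical entities via alias lookup."""
    found = []
    for tok in text.lower().split():
        entity = aliases.get(tok.strip(".,!?#"))
        if entity and entity not in found:
            found.append(entity)
    return found


if __name__ == "__main__":
    kg = KnowledgeGraph(TRIPLES)
    for entity in resolve_entities("Feeling anxious since the rona hit!"):
        print(entity, sorted(kg.ancestors(entity)))
```

Once a post is resolved to graph entities, the broader concepts returned by ancestors can be passed to a neural model as additional features; how that is done in the tutorial's models is not specified here.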

Relevance to Attendees

This tutorial addresses the pressing challenges of vast, dynamic datasets, such as those generated by social media platforms. It highlights how neuro-symbolic AI supports rapid adaptation while keeping computational demand manageable, both of which are crucial for contemporary applications.

Targeted Audience

Data Scientists and AI Researchers: Those exploring advanced data analysis techniques within AI.

Public Health Officials and Social Media Analysts: Professionals seeking to enhance real-time monitoring and analysis.

Advanced Students and Educators: Individuals seeking knowledge in cutting-edge AI technologies.

Content Level

Beginner (30%): Introduction to neuro-symbolic AI's basic principles and theories.
Intermediate (50%): Hands-on coding demonstrations and problem-solving sessions with real-world data.
Advanced (20%): In-depth exploration of advanced applications and optimization techniques.

Length of the Tutorial

We plan a 2-hour tutorial organized in four parts. Each part builds on the previous one, combining theoretical concepts with practical applications of neuro-symbolic AI. The tutorial includes the following segments:

Session 1: Theoretical Foundations (30 minutes) – An overview of neuro-symbolic AI, focusing on its application for analyzing dynamic social media data.
Session 2: Coding Demonstrations (40 minutes) – Hands-on coding session to build neuro-symbolic AI models with large language models and dynamic knowledge graphs.
Session 3: Knowledge Graphs and LLMs in Health Applications (30 minutes) – Exploring neuro-symbolic AI for health-related social media data analysis.
Session 4: Interactive Case Studies and Q&A (20 minutes) – Real-world applications of neuro-symbolic AI, followed by a Q&A session.

(Total Duration: 2 hours, including interactive segments and breaks.)

General Description of Tutorial Content

Session 1: Theoretical Foundations (30 minutes)

Overview: The evolution of neuro-symbolic AI and its critical role in analyzing dynamic social media data.

Content:

Evolution of AI Techniques: From foundational symbolic AI to advanced neural networks, culminating in their integration into neuro-symbolic systems.
Advanced Techniques: Zero-Shot SEDO for dynamic context adaptation and alternatives such as Graph Neural Networks (GNNs) for incorporating structured knowledge directly into learning processes.
Contrasting Traditional and Knowledge-Enhanced Approaches: How the integration of dynamic knowledge bases improves adaptability and accuracy in social media analysis, especially for detecting mental health signals during the COVID-19 pandemic.

Learning Outcomes:

Understanding the superiority of neuro-symbolic systems in handling real-time, dynamic data scenarios.
Recognizing specific improvements in accuracy and responsiveness through integrating structured and unstructured knowledge.

Session 2: Coding Demonstrations (40 minutes)

Overview: Live coding demonstrations to build neuro-symbolic models that integrate large language models and dynamic knowledge graphs for analysis of social media data during health-related events such as COVID-19.

Content: Participants will implement a neuro-symbolic AI model using Python:

Utilizing the Zero-Shot SEDO (Semantic Encoding and Decoding Optimization) framework to dynamically adapt to new terminologies in social media posts without extensive retraining.
Integrating enriched lexicons that include domain-specific language and slang.
Employing semantic similarity techniques to link social media text to relevant concepts in a knowledge graph.
Applying metadata extraction techniques to extract relevant key phrases, hashtags, and geolocation data.
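
The linking and metadata steps listed above can be sketched in a few lines of Python. This example is not the Zero-Shot SEDO framework: it swaps in plain bag-of-words cosine similarity as a simple stand-in for learned semantic embeddings, and it matches a post against short concept descriptions without any training. The CONCEPTS dictionary, the threshold value, and the function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical concept descriptions standing in for knowledge graph nodes.
CONCEPTS = {
    "depression": "persistent sadness hopeless low mood loss of interest",
    "anxiety": "worry nervous panic fear restless",
    "insomnia": "cannot sleep awake at night sleepless tired",
}


def bag_of_words(text):
    """Count lowercase word tokens (digits and punctuation are dropped)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_to_concept(post, concepts=CONCEPTS, threshold=0.1):
    """Return (concept, score) for the best match, or (None, score) below threshold."""
    vec = bag_of_words(post)
    scores = {name: cosine(vec, bag_of_words(desc)) for name, desc in concepts.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]


def extract_hashtags(post):
    """Pull hashtags out of a post as lowercase strings without the '#'."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", post)]


if __name__ == "__main__":
    post = "I cannot sleep, awake all night again #insomnia #COVID19"
    print(link_to_concept(post))
    print(extract_hashtags(post))
```

Word overlap fails as soon as a post uses vocabulary that the concept description lacks, which is the gap that embedding-based similarity and enriched lexicons are meant to close.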

Learning Outcomes:

Integrating and applying advanced neuro-symbolic techniques for real-time data analysis, focusing on adaptability and efficiency.
Handling noisy social media data, such as slang and inconsistent formatting, using domain-specific knowledge graphs.
Enhancing AI model accuracy and performance by leveraging metadata and incorporating new terminologies as they evolve in social media discourse.

Session 3: Knowledge Graphs and LLMs in Health Applications (30 minutes)

Overview: This session will explore the application of knowledge graphs and large language models in health contexts, focusing on the unique challenges of social media data.

Content: How neuro-symbolic AI models, enhanced with a knowledge graph, can be tailored to interpret health-related information in unstructured social media data. We address sentiment analysis, trend detection, and the identification of health misinformation.

Learning Outcomes: Participants will understand the critical role of neuro-symbolic AI in enhancing the reliability of health data analysis on social media platforms.

Session 4: Interactive Case Studies and Q&A (20 minutes)

Overview: An interactive wrap-up session allowing participants to engage with real-world applications of neuro-symbolic AI across various data scenarios.

Content: Presentation of diverse case studies where neuro-symbolic AI has been applied, ranging from public health surveillance to analyzing shifts in public sentiment on social media. A live Q&A will address practical challenges, gather feedback, and discuss potential future developments in the field.

Learning Outcomes: Participants will synthesize knowledge from the tutorial, enhancing their ability to apply neuro-symbolic techniques in their work, and leave with a comprehensive understanding of these approaches' practical challenges and advantages in real-time data analysis.

Expected Background and Prerequisites of Audience

This tutorial will combine lecture-style content with hands-on practice in Python. Participants are expected to have a foundational understanding of artificial intelligence and machine learning principles, along with some proficiency in Python programming. Basic knowledge of neuro-symbolic AI concepts and experience with social media data (e.g., from platforms like Twitter or Reddit) will be helpful but is not required.

The tutorial will guide attendees through practical exercises, including using APIs to fetch and analyze real-world data, integrating a knowledge graph, and building neuro-symbolic AI models. Attendees should bring their laptops; details on required tools and background materials will be provided prior to the tutorial. By the end of the session, participants will understand how to leverage neuro-symbolic AI for real-time data analysis, enhancing adaptability, utility, interpretability, and performance in social media applications.

Tutorial Presentation

Note: The website and slide previews shown are for illustrative purposes only.

Presenters' Biographies

Vedant Khandelwal

Contact| @khvedant

Artificial Intelligence Institute, University of South Carolina

He is a Doctoral Student at the University of South Carolina, specializing in generative AI, knowledge graphs, and reinforcement learning. His research is focused on developing foundational models for pathfinding problems and safe conversational agents for mental health applications. He has worked extensively on integrating Large Language Models and Knowledge Graphs to build robust AI systems. Vedant's notable projects include analyzing the psychological impacts of pandemics using social media data and developing AI conversational agents to support mental health. He has published his research in prominent journals and conferences such as the AAAI Conference on Artificial Intelligence and IEEE Internet Computing.

Manas Gaur

Contact| @manasgaur

Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Maryland

He is an Assistant Professor in the Department of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County. His research focuses on the intersection of neurosymbolic AI, knowledge graphs, and machine learning, with a strong emphasis on health informatics and cybersecurity. He has contributed to the field through various roles, including Senior Research Scientist at Samsung Research America and Visiting Researcher at the Alan Turing Institute in London. Notable publications include works on trustworthy neurosymbolic AI systems and the application of AI in cybersecurity and privacy. He has co-organized high-profile tutorials and workshops on neurosymbolic AI at major conferences.

Ugur Kursuncu

Contact| @ugurkursuncu

Institute for Insight, Georgia State University, Atlanta, Georgia

He is an Assistant Professor at the Robinson College of Business, Georgia State University, and a former Postdoctoral Fellow at the AI Institute, University of South Carolina. His research spans human-centered social computing, AI for social good, and knowledge-infused learning, specifically focusing on natural language processing and machine/deep learning. He has a robust teaching portfolio, including graduate courses on data programming and research methods for analytics. Dr. Kursuncu has significantly contributed to his field through publications in high-impact journals and conferences and has organized several workshops and tutorials related to cybersecurity and AI.

Valerie L. Shalin

Contact| @valerie-shalin-a4619b71

Department of Psychology, Wright State University, Dayton, Ohio

She is a Full Professor in the Department of Psychology at Wright State University and an Adjunct (Affiliated) Faculty member in the Department of Computer Science & Engineering and the AI Institute at the University of South Carolina. Her research spans cognitive psychology, human-centered computing, and knowledge-infused learning, contributing significantly to the understanding of human-machine interaction. Dr. Shalin's work includes developing tools for NASA's Mars Exploration Rover and collaborating on numerous grants from NSF, NASA, and other agencies. She has co-authored impactful publications in top-tier journals and conferences, addressing complex problems in natural language understanding and the cognitive aspects of AI. Her recent research focuses on integrating deep learning and knowledge graphs for enhanced cognitive analytics in healthcare and social media contexts.

Amit Sheth

Contact| @amitsheth

Artificial Intelligence Institute, University of South Carolina

He is the founding director of the AI Institute, University of South Carolina (AIISC). His current core research includes knowledge-infused learning and explainability. He is a fellow of IEEE, AAAI, AAAS, and ACM. He has co-organized more than 100 international events and tutorials. He has founded three companies by licensing his university research outcomes, including the first Semantic Web company in 1999 that pioneered technology similar to what is found today in Google Semantic Search and KG.

References

[1] Sheth, A., Roy, K., & Gaur, M. (2023). Neurosymbolic artificial intelligence (why, what, and how). IEEE Intelligent Systems, 38(3), 56-62.
[2] Sheth, A., Gaur, M., Kursuncu, U., & Wickramarachchi, R. (2019). Shades of knowledge-infused learning for enhancing deep learning. IEEE Internet Computing, 23(6), 54-63.
[3] Gaur, M., & Sheth, A. (2024). Building trustworthy NeuroSymbolic AI systems: Consistency, reliability, explainability, and safety. AI Magazine, 45(1), 139-155.
[4] Sheth, A., Gaur, M., Roy, K., Venkataraman, R., & Khandelwal, V. (2022). Process knowledge-infused AI: Toward user-level explainability, interpretability, and safety. IEEE Internet Computing, 26(5), 76-84.
[5] Roy, K., Chakraborty, M., Zi, Y., Gaur, M., & Sheth, A. (2024). Neurosymbolic customized and compact CoPilots.
[6] Kodirov, E., Xiang, T., & Gong, S. (2017). Semantic autoencoder for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3174-3183).
[7] Gaur, M., Kursuncu, U., Alambo, A., Sheth, A., Daniulaityte, R., Thirunarayan, K., & Pathak, J. (2018, October). "Let me tell you about your mental health!" Contextualized classification of Reddit posts to DSM-5 for web-based intervention. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp. 753-762).
[8] Lokala, U., Lamy, F., Daniulaityte, R., Gaur, M., Gyrard, A., Thirunarayan, K., ... & Sheth, A. (2022). Drug abuse ontology to harness web-based data for substance use epidemiology research: Ontology development study. JMIR Public Health and Surveillance, 8(12), e24938.