Workshop Topic

Recent advances in artificial intelligence driven by the incorporation of domain knowledge have established a new paradigm for the AI and data mining communities. For example, human feedback-based language generation in ChatGPT (a large language model (LLM)), the use of the Protein Data Bank in DeepMind's AlphaFold, and the 23 safety rules in DeepMind's Sparrow have demonstrated the success of teaming human knowledge with AI. This collaboration is increasingly realized as the coupling of symbolic computing over knowledge structures (e.g., knowledge graphs, knowledge bases) with the statistical capabilities of deep neural networks. In addition, knowledge retrieval-guided language and image generation methods have strengthened the association between knowledge and AI.

However, translating research methods and resources into practice presents a new challenge for the machine learning and data/knowledge mining communities. For example, DARPA's Explainable AI (XAI) program identifies explainable contextual adaptation as the third wave of AI, facilitating the interplay between data and knowledge for explainability, safety, and, eventually, trust. Yet policymakers and practitioners raise serious usability and privacy concerns that constrain adoption, notably in high-consequence domains such as cybersecurity, healthcare, and other social-good domains. In addition, limitations in output quality, measurement, and interactive ability, including both the provision of explanations and the acceptance of user guidance, result in adoption rates as low as 33% in such domains.

Themes

This workshop aims to accelerate our pace towards building responsible, intelligent systems by integrating knowledge into contemporary AI and data science methods in two prominent applications: Computer Vision (CV; learning to see) and Natural Language Processing (NLP; learning to read). Alongside these two applications, the workshop will focus on four areas:



NeuroSymbolic AI

NeuroSymbolic AI techniques offer a promising approach to developing trustworthy, explainable, and scalable AI systems. By leveraging both statistical and symbolic AI, these techniques aim to address the limitations of each approach while combining their strengths. The result is a system that can integrate expert-created knowledge with learned components, providing a more comprehensive and effective solution. Additionally, symbolic components expressed in user-specific vocabulary open up many new avenues of explanation generation that are unavailable to purely black-box AI methods. The sketch below illustrates this coupling.
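As a minimal, illustrative sketch (the scorer, labels, and knowledge base below are invented for this example and are not part of the workshop text), a learned component proposes label scores and a symbolic component filters them against a tiny knowledge base, yielding knowledge-consistent predictions:

```python
# Hypothetical symbolic knowledge: admissible (subject, relation, object) triples.
KNOWLEDGE_BASE = {
    ("penguin", "is_a", "bird"),
    ("penguin", "capable_of", "swim"),
    ("sparrow", "capable_of", "fly"),
}

def neural_scores(entity: str) -> dict:
    """Stand-in for a learned model: returns label scores for an entity."""
    # In a real system these scores would come from a trained network.
    return {"fly": 0.7, "swim": 0.3} if entity == "penguin" else {"fly": 0.9, "swim": 0.1}

def symbolic_filter(entity: str, scores: dict) -> dict:
    """Zero out labels that the knowledge base does not support for the entity."""
    return {
        label: score if (entity, "capable_of", label) in KNOWLEDGE_BASE else 0.0
        for label, score in scores.items()
    }

if __name__ == "__main__":
    entity = "penguin"
    raw = neural_scores(entity)                  # statistical component
    constrained = symbolic_filter(entity, raw)   # symbolic component
    print(raw)          # {'fly': 0.7, 'swim': 0.3}  -- unconstrained guess
    print(constrained)  # {'fly': 0.0, 'swim': 0.3}  -- knowledge-consistent
```

Because the constraint is expressed over the knowledge base's vocabulary, the filtered output can also be explained in that vocabulary (e.g., "fly was rejected because no capable_of relation supports it").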

Deeper Forms of Knowledge Infusion in AI

The use of multiple forms of knowledge is beneficial for introducing human-like learning and reasoning capabilities into black-box AI models. There are four significant forms of knowledge: general purpose (e.g., Wikipedia, Wikidata), commonsense (e.g., ConceptNet), linguistic (e.g., WordNet), and domain specific (e.g., Cyber Threat Intelligence, Unified Medical Language System). These forms of knowledge can manifest either by improving the representations of black-box AI models or by serving as constraints that steer outcomes toward desired behavior (see the sketch below). In addition, deep knowledge infusion can benefit approaches such as Reinforcement Learning and Active Learning. Under this theme, the workshop looks at research incorporating multiple forms of knowledge into AI.
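One common way to use knowledge as a constraint is to add a penalty term to the training loss when predictions violate a knowledge-derived rule. The sketch below is purely illustrative: the implication rule, label names, and weighting are assumptions made for this example, not a prescribed method.

```python
def task_loss(pred: dict, gold: dict) -> float:
    """Simple squared-error loss over label probabilities."""
    return sum((pred[label] - gold[label]) ** 2 for label in gold)

def knowledge_penalty(pred: dict) -> float:
    """Penalize violations of a (hypothetical) lexicon-derived implication:
    if 'depression' is predicted, 'needs_support' should score at least as high."""
    return max(0.0, pred["depression"] - pred["needs_support"])

def infused_loss(pred: dict, gold: dict, lam: float = 0.5) -> float:
    # lam trades off fitting the data against staying consistent with the knowledge.
    return task_loss(pred, gold) + lam * knowledge_penalty(pred)

if __name__ == "__main__":
    pred = {"depression": 0.8, "needs_support": 0.2}
    gold = {"depression": 1, "needs_support": 1}
    print(infused_loss(pred, gold))  # data term plus a nonzero constraint penalty
```

The same pattern generalizes to richer knowledge sources: the penalty function is simply replaced by a check against the relevant ontology or rule set.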

Benchmarking Datasets and Resources

CV datasets such as CLEVR and CLEVRER and NLP datasets such as CAMS, PRIMATE, and DiaSafety have motivated the need for AI models to be explainable or safe. Many research problems require specialized datasets to assess the trustworthiness of AI behaviors. Unfortunately, there is a lack of benchmarking datasets and machine-understandable resources (e.g., lexicons, ontologies) that assess the NeuroSymbolic behavior of models with knowledge infusion. Such resources would also enable the community to identify the areas where benchmarking is most urgently needed.

Evaluation/Measurement

Current evaluation methods, such as accuracy, ROC curves, mean squared error, and ROUGE scores, are not adequate for assessing NeuroSymbolic methods and systems. We therefore need understandable and applicable metrics for evaluating NeuroSymbolic methods and techniques, such as average squared rank error, perceived risk measures, NUBIA, and others.
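To illustrate why such metrics matter, the sketch below contrasts plain accuracy with one plausible reading of an average squared rank error over ordinal labels (the exact definition, label scale, and data are assumptions for this example): two predictions with identical accuracy can differ sharply in how far their mistakes land from the true severity.

```python
def accuracy(preds, golds):
    """Fraction of exact matches; blind to how far off the misses are."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def avg_squared_rank_error(preds, golds):
    """Mean squared distance between predicted and true ordinal positions."""
    return sum((p - g) ** 2 for p, g in zip(preds, golds)) / len(golds)

if __name__ == "__main__":
    # Hypothetical ordinal severity labels: 0 = none ... 3 = severe.
    golds = [0, 1, 2, 3]
    preds_a = [0, 1, 2, 0]   # one error, far from the true severity
    preds_b = [0, 1, 2, 2]   # one error, close to the true severity
    print(accuracy(preds_a, golds), accuracy(preds_b, golds))                                # 0.75 0.75
    print(avg_squared_rank_error(preds_a, golds), avg_squared_rank_error(preds_b, golds))    # 2.25 0.25
```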