Abstract
The unparalleled volume of data being generated has heightened the need for approaches that can manage it and translate it into actionable insights. While contemporary data-driven and generative systems are popular for handling large volumes of changing and diverse data, they are not silver bullets because they inherently lack knowledge grounding. Knowledge-driven processes have emerged as compelling approaches that leverage external knowledge and structured representations to address the shortcomings of data-driven systems. Such processes exploit data while also drawing on extensive knowledge in the form of Knowledge Graphs (KGs). In this tutorial, we will introduce knowledge-driven processes for big data management and applications and provide interactive, hands-on, lab-oriented sessions using real-world datasets spanning structured, semi-structured, and unstructured formats. Specifically, we will use the EMPWR [1] platform for creating and maintaining large KGs and demonstrate recent innovations in three concrete real-world use cases: (i) development of a pharmaceutical KG with over 6M triples, 1.5M nodes, and 3000 relation types; (ii) development of a suite of large-scale KGs with 10M+ triples, 2M+ entities, and 19 relations from real-world driving scenes, and their use in machine perception tasks; and (iii) a recommender system for AI pipelines with a KG consisting of 78M triples, 8M nodes, and 25M relations.
Why Attend?
The sheer volume, variety, and velocity of data generated in
the current digital era have heightened the need for approaches
that can effectively manage and make sense of this big
data. While modern data-driven systems, including Large
Language Models (LLMs), can learn complex patterns and
relationships, their internal understanding of the world (i.e.,
the world model) can be brittle and is influenced by the
data that they are trained on. As we enter what DARPA
describes as the third phase of AI (i.e., Neurosymbolic AI
[8]), knowledge-driven systems such as Knowledge Graphs
(KGs) are increasingly used to provide notational efficacy and
declarative capabilities to make implicit data explicit, enabling
high-quality linguistic and situational knowledge for more
explainable output. Recognizing the merits of both data-driven and
knowledge-driven processes, we advocate an approach that
combines techniques from both ends of the spectrum for
managing big data with KGs as the infrastructure. Considering
three application domains, we will guide the audience through
demonstrations and hands-on sessions on this infrastructure
using real-world tasks.