LLM-Integrated Knowledge Graph Generation

Knowledge Graphs (KGs) are crucial for enhancing semantic web technologies, improving information retrieval, and bolstering data-driven AI systems. Despite their importance, constructing KGs from text corpora remains a significant challenge. Traditional methods, which rely on manually crafted rules and machine learning techniques, often struggle with domain-specific texts and cross-domain transferability. Recent advances in generative AI, particularly Large Language Models (LLMs) such as GPT-4o, Llama 3.2, and the newer Qwen models, promise to revolutionize traditional text mining paradigms, including KG construction [1, 2, 3], thanks to their advanced capabilities in understanding and generating human-like text. Integrating LLMs into the KG construction pipeline can enable richer, more accurate extraction and inference of knowledge from unstructured text sources and provide solutions that transfer readily across domains. Unlike other related workshops, our proposed workshop stands out for the breadth and uniqueness of its focus on generating KGs from diverse text domains, including research papers, legal documents, newswires, and social media. It emphasizes discussions on scalable strategies, maintaining consistency, and mitigating errors. Applications range from news analytics to scientific research.
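To make the pipeline idea concrete, below is a minimal sketch of one common LLM-based extraction pattern: prompt the model to emit triples as JSON, then parse the reply into a graph. The `fake_llm` function is a hypothetical stub standing in for a real LLM API call; the prompt wording and output schema are illustrative assumptions, not part of any specific system discussed here.

```python
import json

def extract_triples(text, llm_call):
    """Prompt an LLM for (subject, predicate, object) triples and parse its JSON reply."""
    prompt = (
        "Extract knowledge-graph triples from the text below. "
        "Respond only with a JSON list of [subject, predicate, object] lists.\n\n"
        + text
    )
    # Parse the model's JSON reply into a set of triples (deduplicates repeats).
    return {tuple(t) for t in json.loads(llm_call(prompt))}

# Stub standing in for a real LLM API call (returns a canned response for the demo).
def fake_llm(prompt):
    return '[["Marie Curie", "born in", "Warsaw"], ["Marie Curie", "won", "Nobel Prize"]]'

kg = extract_triples("Marie Curie, born in Warsaw, won the Nobel Prize.", fake_llm)
```

In practice, the same parsing step would sit behind a real model call, with added validation to handle malformed JSON and hallucinated entities, which are two of the failure modes this workshop aims to discuss.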

Evolving from its previous three iterations, the TEXT2KG initiative, proposed as LLM-TEXT2KG in its fourth iteration, aims to explore the novel intersection of LLMs and KG generation, focusing on innovative approaches, best practices, and challenges. It will serve as a platform to discuss how LLMs can be utilized for improved knowledge extraction, context-aware entity disambiguation, named entity recognition, relation extraction, knowledge representation, and seamless ontology alignment. The workshop solicits a broad range of papers, including full research papers, negative results, position papers, and system demos, examining the wide range of issues and processes related to knowledge graph generation from text corpora. Papers on resources (methods, tools, benchmarks, libraries, datasets) are also welcome.

The proceedings of the previous three editions of the workshop are available at:
(2022) https://ceur-ws.org/Vol-3184, (2023) https://ceur-ws.org/Vol-3447, and (2024) https://ceur-ws.org/Vol-3747.

Why attend the LLM-Text2KG Workshop?

This workshop aims to bring together researchers from multiple areas such as Natural Language Processing (NLP), Entity Linking (EL), Relation Extraction (RE), Knowledge Representation & Reasoning (KRR), Deep Learning (DL), Knowledge Base Construction (KBC), Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), Knowledge Sharing between Agents, Semantic Web, Linked Data, and other related fields to foster discussion and advance the state of the art in knowledge graph generation from text.
Participants will find opportunities to present and hear about emerging research and applications, to exchange ideas and experiences, and to identify new opportunities for collaboration across disciplines. We plan to involve the many prominent research groups in the Semantic Web community that in recent years have focused on generating knowledge graphs from textual sources in different fields, such as research data (ORKG, AI-KG, Nanopublications), question answering (ParaQA, NSQA), common sense (CSKG), automotive (CoSI, ASKG), biomedical (Hetionet), and many others.


Themes & Topics

We are interested in themes and topics (including but not limited to the following) that study the generation of Knowledge Graphs from text with LLMs,
based on quantitative, qualitative, and mixed research methods.

Themes

  • LLM-based Entity Recognition and Relation Extraction from Complex, Unstructured Text
  • LLM-driven Inference of Implicit Relationships and Knowledge Discovery
  • Addressing and Mitigating Hallucinations and Biases in LLM outputs
  • Advances in Fine-tuning and Customizing LLMs for KG Generation Tasks
  • Industrial Applications Involving KG Generation from Text

Topics

  • Open Information Extraction
  • Deep Learning and Generative approaches
  • Human-in-the-loop methods
  • Large Language Models and Knowledge Graphs
  • RAG-Driven Knowledge Extraction
  • Intelligent Agents for Text to Knowledge Graphs
  • LLM-KG Integration
  • Benchmarks for KG Generation from Text
  • Evaluation Methods for KGs Generated from Text

Important Dates


Paper submissions due: March 7th, 2025
Final decision notification: April 4th, 2025
Camera-ready submissions due: April 18th, 2025
Workshop: June 1 - June 5, 2025

Submission Instructions


We invite full research papers, negative results, position papers, dataset and system demo papers.
The page limit for full research papers, negative results, and dataset papers is 16 pages excluding references; for position papers and system demos it is 7 pages excluding references.
Submissions must be original and must not have been published previously or be under consideration for publication elsewhere while under review for this workshop. Submissions will be evaluated by the program committee based on the quality of the work and its fit to the workshop themes. Reviewing is double-blind, and a high-resolution PDF of the paper should be uploaded to the EasyChair submission site before the paper submission deadline.
Accepted papers will be presented at the Text2KG workshop, integrated with the conference, and published as CEUR proceedings.
All submissions must be formatted in the CEUR proceedings style.
For details, see CEUR's Author Instructions.
An Overleaf template is also available.

Workshop Schedule

TBD

Co-located Event: Second International Biochemical Knowledge Extraction Challenge (BiKE)

Most of the structured biochemical information available on the Web today is manually curated, and it is practically impossible to keep pace with the research constantly published in scientific articles. With this challenge, we want to accelerate and promote research on automatic biochemical knowledge extraction, with the aim of increasing the information available on natural products, promoting the development of environmentally friendly products, and raising awareness of the value of biodiversity.

Challenge Link: https://aksw.github.io/bike/

Organizer: Edgard Marx

Co-located Event: First International TEXT2SPARQL Challenge

The Text2SPARQL Challenge is a competitive and collaborative benchmark designed to push the boundaries of natural language processing (NLP) and Semantic Web technologies, with a particular emphasis on Neurosymbolic AI. The challenge focuses on translating natural language questions into SPARQL, the structured query language for data in Resource Description Framework (RDF) formats, which are widely used in knowledge graphs. Neurosymbolic AI, a hybrid approach combining neural network-based models with symbolic reasoning techniques, plays a crucial role here: it pairs the interpretative power of neural networks with the logical precision of symbolic methods. By incorporating Neurosymbolic AI, participants can develop models that better capture both complex linguistic structures and the logical syntax required for accurate SPARQL query generation. This integration not only enhances model performance but also opens up new possibilities for human-computer interaction, enabling more intuitive and accessible interfaces for querying large-scale knowledge graphs. Success in the Text2SPARQL Challenge has significant implications, from improving knowledge graph accessibility to advancing intelligent systems capable of nuanced reasoning, marking an important step forward in AI-driven data interaction.
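As a concrete (hypothetical) illustration of the task, a Text2SPARQL system receives a natural-language question and must emit an equivalent SPARQL query. The vocabulary below (the `ex:` prefix, `ex:River` class, and `ex:flowsThrough` property) is an invented placeholder, not a real ontology or the challenge's actual dataset.

```python
# A hypothetical input/output pair for the Text2SPARQL task.
# The ex: vocabulary is an invented placeholder, not a real ontology.
question = "Which rivers flow through Berlin?"

expected_sparql = """\
PREFIX ex: <http://example.org/>
SELECT ?river WHERE {
  ?river a ex:River ;
         ex:flowsThrough ex:Berlin .
}"""
```

Challenge systems are evaluated on how well the generated query matches the intended one (or returns the same answers when executed against the reference knowledge graph).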

Challenge Link: https://text2sparql.aksw.org/

Organizer: Edgard Marx

Organizing Committee

Sanju Tiwari

Sharda University, Delhi-NCR, India & TIB Hannover, Germany

tiwarisanju18@ieee.org

Nandana Mihindukulasooriya

IBM Research, Dublin, Ireland

nandana.m@ibm.com

Jennifer D’Souza

TIB, Germany

jennifer.dsouza@tib.eu

Francesco Osborne

KMi, The Open University

francesco.osborne@open.ac.uk



Steering Committee & Publicity Chair

Amit Sheth

AIISC, University of South Carolina

amit@sc.edu

Joey Yip

AIISC, University of South Carolina

hyip@email.sc.edu




Program Committee

  • Angelo Salatino, Birmingham City University, UK
  • Amna Dirdi, Birmingham City University, UK
  • Davide Buscaldi, Université Paris 13, France
  • Hamed Babaei Giglou, TIB Hannover, Germany
  • Hossein Ghomeshi, Birmingham City University, UK
  • Maosheng Guo, Diffbot, USA
  • Serge Sonfack Sounchio, École Nationale d'Ingénieurs de Tarbes, France
  • Tek Raj Chhetri, University of Innsbruck, Austria
  • Tommaso Soru, Serendipity AI, UK
  • Patience Usoro, University of Uyo, Nigeria
  • Disha Purohit, TIB, Germany
  • Ogerta Elezaj, Birmingham City University, UK
  • Marlene Goncalves, Universidad Simón Bolívar, Venezuela

References

  • [1] Khorashadizadeh, H., Amara, F.Z., Ezzabady, M., Ieng, F., Tiwari, S., Mihindukulasooriya, N., Groppe, J., Sahri, S., Benamara, F., Groppe, S.: Research trends for the interplay between large language models and knowledge graphs. arXiv preprint arXiv:2406.08223 (2024)
  • [2] Liang, X., Wang, Z., Li, M., Yan, Z.: A survey of LLM-augmented knowledge graph construction and application in complex product design. Procedia CIRP 128, 870–875 (2024)
  • [3] Zhu, Y., Wang, X., Chen, J., Qiao, S., Ou, Y., Yao, Y., Deng, S., Chen, H., Zhang, N.: LLMs for knowledge graph construction and reasoning: recent capabilities and future opportunities. World Wide Web 27(5), 58 (2024)