Salmon Run: KGC/HCLS 2024 Trip Report

I was at KGC (Knowledge Graph Conference) 2024, which is happening May 6-10 at Cornell Tech. I was presenting (virtually) at their Health Care and Life Sciences (HCLS) workshop, so my speaker's pass was valid only for today, for the HCLS portion of KGC. My trip report covers a few talks that I attended here. Attending virtually was a bit chaotic as sessions sometimes ran over, so you might leave a session to attend another, only to find that it hadn't started yet. This is hard to foresee; we faced this issue ourselves the first time we moved an internal conference from in-person to hybrid.

KGs in RAG (Tom Smoker, WhatWhyHow.AI)

I've been working with Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) for almost a year now, and I went to this talk hoping for insights on how to use graphs as input to RAG systems. Understandably, the speaker spent some time covering the basics, which I personally didn't find very fruitful. However, there were some nuggets of knowledge I got out of the talk. First, RAG pipelines can lower the risk of hallucinations by using LLMs for planning and reasoning, but without delegating to LLMs for factual information. And second, an agent architecture can more efficiently use smaller sub-graphs, which can often be generated dynamically in Closed World models.

A side discussion on chat also yielded a paper reference: Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc (Lenat and Marcus, 2023). The paper looks really interesting on an initial skim and I plan to read it in more detail later.

Knowledge Graphs for Precision Oncology (Krishna Bulusu, AstraZeneca)

A nice overview of applications of Knowledge Graphs (KGs) to Drug Discovery (DD). DD attempts to apply KGs to solve 3 main problems: (1) find the gene causing a disease, (2) match a drug with a disease, and (3) treat (drug, gene, disease) as a fundamental relationship in DD. The speaker pointed out that the big advantage of KGs is Explainability. He also mentioned the use of graph clustering for node stratification.

Combining graph and vector representation for efficient information retrieval (Peio Popov, Ontotext)

This was a presentation from Ontotext where they demonstrated new features built into their GraphDB database. This was of interest to me personally since our KG is also built using GraphDB. Specifically, they have integrated LLM and vector search support into their products so they can be invoked from a SPARQL query. This gives GraphDB users the power to combine these techniques in the same call rather than build multi-stage pipelines.

I also learned the distinction between Semantic, Full-text and Vector Search: they are based on KGs, Lucene (or Lucene-like) indexes, and vector search platforms respectively. Previously I would conflate the first and third.
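To make the three-way distinction concrete for myself, here is a minimal sketch of each style over the same toy data. Everything in it is an assumption made up for illustration: the triples, the documents, and the hand-written 3-dimensional vectors standing in for real embeddings. It does not use GraphDB, Lucene, or any actual vector store.

```python
import math

# Hypothetical toy data, invented for illustration only.
TRIPLES = [
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "headache"),
    ("aspirin", "is_a", "nsaid"),
]
DOCS = {
    "d1": "aspirin relieves headache pain",
    "d2": "ibuprofen reduces fever",
}
# Hand-made vectors standing in for embeddings from a real model.
DOC_VECS = {"d1": [0.9, 0.1, 0.0], "d2": [0.2, 0.8, 0.1]}


def semantic_search(predicate, obj):
    """KG-style lookup: match on explicit relations, not on text."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


def full_text_search(term):
    """Lucene-style lookup: inverted index from token to document ids."""
    index = {}
    for doc_id, text in DOCS.items():
        for token in text.lower().split():
            index.setdefault(token, set()).add(doc_id)
    return sorted(index.get(term.lower(), set()))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def vector_search(query_vec, k=1):
    """Vector-style lookup: rank documents by cosine similarity."""
    ranked = sorted(
        DOC_VECS, key=lambda d: cosine(query_vec, DOC_VECS[d]), reverse=True
    )
    return ranked[:k]
```

The first function answers from explicit relations, the second from exact tokens, and the third from geometric closeness, which is why the first and third can return similar-looking results for very different reasons.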

Knowledge Engineering in Clinical Decision Support: When a Graph Representational Model is not enough (Maulik Kamdar, Optum)

This was a presentation from my ex-colleague Maulik Kamdar. He talks about challenges in Clinical Decision Support (CDS) where a KG alone is insufficient. Specifically, the case he is considering is one where multiple third-party ontologies need to be aligned into one KG. In this scenario, similar concepts are combined into ValueSets, which are then composed with bare concepts or with each other to form Clinical Rules. Clinical Rules are further combined to form Clinical Calculators or Questionnaires, which are then combined to form Decision Trees and Flowcharts, which are then combined into Clinical Guidelines. I am probably biased given our common history, but I found this talk to be the most educational for me.
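The layering described above (concepts into ValueSets, ValueSets and bare concepts into rules, rules into calculators) can be sketched roughly as follows. This is only my own toy reading of the idea, not the model presented in the talk; the class and function names, the ontology prefixes, and the codes are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

# All codes and names below are hypothetical placeholders, not real
# ontology identifiers or anything shown in the talk.


@dataclass(frozen=True)
class ValueSet:
    """Similar concepts from several ontologies grouped under one name."""
    name: str
    concepts: FrozenSet[str]

    def matches(self, patient_codes):
        return bool(self.concepts & set(patient_codes))


# A rule is just a predicate over the set of codes on a patient record.
Rule = Callable[[set], bool]


def has_any(value_set: ValueSet) -> Rule:
    """Rule built from a ValueSet."""
    return lambda codes: value_set.matches(codes)


def has_concept(code: str) -> Rule:
    """Rule built from a single bare concept."""
    return lambda codes: code in codes


def all_of(*rules: Rule) -> Rule:
    """Rule composed from other rules."""
    return lambda codes: all(r(codes) for r in rules)


def score(rules: List[Rule], codes: set) -> int:
    """A toy 'calculator': count how many rules fire."""
    return sum(1 for r in rules if r(codes))


diabetes = ValueSet("diabetes", frozenset({"ONT_A:123", "ONT_B:dm2"}))
hypertension = ValueSet("hypertension", frozenset({"ONT_A:456", "ONT_C:htn"}))

high_risk = all_of(has_any(diabetes), has_any(hypertension))
on_insulin = has_concept("DRUG:insulin")

patient = {"ONT_B:dm2", "ONT_C:htn"}
```

Here `high_risk(patient)` fires because each ValueSet matches a code from a different source ontology, which is the alignment problem the ValueSet layer is meant to absorb.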

Knowledge Graphs, Theorem Provers and Language Models (Vijay Saraswat and Nikolaos Vasiloglou)

The speakers discussed the role of self-discovery, In-Context Learning (ICL), symbiotic integration of KG with search, and Graph RAG in reasoning engines powered by KG and LLM. They characterize an Agent as an LLM-based black box that is provided with pairs of input-output instances to learn some unknown function (similar to ML models). They describe ICL as learning through few-shot and many-shot examples. They also talk about using the output of a KG to fact-check / enhance LLMs and using LLMs to generate assertions that can be used to create a KG. Their demo shows how an LLM is able to learn to generate a Datalog-like graph query language from text prompts using few-shot examples.
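As a rough illustration of the few-shot setup, the sketch below assembles input-output pairs into a prompt that ends with a new question. It is only a sketch under my own assumptions: the example questions, the Datalog-like syntax, and the function name are made up here and are not what was shown in the demo, and no LLM is actually called.

```python
# Hypothetical few-shot examples; the Datalog-like syntax is invented for
# illustration and is not the query language from the demo.
EXAMPLES = [
    ("Which drugs treat headache?", "answer(D) :- treats(D, headache)."),
    ("Which genes are linked to asthma?", "answer(G) :- linked_to(G, asthma)."),
]


def build_few_shot_prompt(question, examples=EXAMPLES):
    """Assemble a prompt of input-output pairs followed by the new input."""
    lines = ["Translate each question into a Datalog-like query.", ""]
    for text, query in examples:
        lines.append(f"Question: {text}")
        lines.append(f"Query: {query}")
        lines.append("")
    # The prompt ends on an open "Query:" slot for the model to complete.
    lines.append(f"Question: {question}")
    lines.append("Query:")
    return "\n".join(lines)


prompt = build_few_shot_prompt("Which drugs treat fever?")
```

The resulting string would be sent to whatever LLM is in use; moving from few-shot to many-shot is then just a matter of lengthening the examples list.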

The speaker made reference to the following three papers in support of the techniques he was describing, which I have duly added to my reading list.

A Scalable and Robust Named Entity Recognition and Linking System for a Clinical Healthcare Knowledge Graph (Sujit Pal, Elsevier Health)

This was my talk. I had originally intended to attend in person but it seemed wasteful to fly across the country to deliver a 5-minute presentation. It did take a bit of planning to present remotely but I learned two useful life lessons.

  1. You can generate a presentation video from MS PowerPoint. Simply create your slides and record a slideshow in which you narrate your presentation. Once done, export it as an MP4 and upload it to YouTube or another video service.
  2. You can print posters online and have them delivered to someone else.

Huge thanks to my colleague Tom Woodcock who attended in person, and who was kind enough to carry and hang my poster at the conference for me, and who also agreed to present my slideshow for me (although I think that in the end he didn't have to). Many thanks also to my ex-colleague Helena Deus (part of the HCLS organizing team), who helped walk me through to a workable solution and was instrumental in my talk being delivered successfully. Thanks also to Leah Walton from the HCLS organizing team, for supporting me in my attempt to present remotely.

Here is the YouTube video for my 5-minute presentation in case you are interested. It is a bit high-level since I had only 5 minutes to cover everything, but there is a bit more information in the poster below.

Graphs for good – Hypothesis generation for Rare Disease Treatment (Brian Martin, AbbVie)

This presentation revolves around a graph that connects diseases to drugs via disease variants, gene, pathway, gene and compound entities. This was used to find a cure for a rare disease using existing medicines. It was later extended to find candidate cures for a group of the 20 most neglected diseases worldwide. The speakers verified that results for Dengue fever correlate well with previously known information, thus supporting the veracity of the approach. The paper describing this work is Leveraging a Billion-Edge Knowledge Graph for Drug Re-purposing and Target Prioritization using Genomically-Informed Subgraphs (Martin et al, 2022).
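The disease-to-drug chain described above is essentially a path query over a typed graph. Here is a minimal sketch using plain breadth-first search; the graph, the node names, and the type prefixes are hypothetical placeholders, and this is certainly not the method from the paper, which works over a billion-edge KG.

```python
from collections import deque

# Hypothetical toy graph; node names are placeholders, not real entities.
EDGES = {
    "disease:X": ["variant:v1"],
    "variant:v1": ["gene:G1"],
    "gene:G1": ["pathway:P1"],
    "pathway:P1": ["gene:G2"],
    "gene:G2": ["compound:C1"],
    "compound:C1": [],
}


def find_path(graph, start, is_target):
    """Breadth-first search returning the shortest path to a target node."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if is_target(node):
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no reachable target


path = find_path(EDGES, "disease:X", lambda n: n.startswith("compound:"))
```

The returned path doubles as the explanation for why a compound was proposed, since every hop between disease and compound is visible.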

Generating and Querying Graphs with LLM (Brian Martin, Subha Madhavan, Berenice Wulbrecht)

Panel discussion where various techniques for generating and querying graphs using LLMs were discussed. There were entertaining (and somewhat predictable) comparisons of Property Graphs vs RDF graphs to Ford and Ferrari cars, and of how LLMs transform them into Teslas (with their self-driving technology). They also talked about extracting assertions from a corpus of documents to create a KG customized for the corpus, and then using the KG to fact-check the output of the LLM for RAG queries against that corpus.
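The fact-checking step can be pictured as a set-membership test over assertions. The sketch below assumes both the KG and the LLM answer have already been reduced to (subject, predicate, object) triples, which is the hard part and is not shown; all entity names and the function name are hypothetical.

```python
# Hypothetical KG of (subject, predicate, object) assertions. In the setup
# described above these would be extracted from the corpus; here they are
# hard-coded placeholders.
KG = {
    ("drug_a", "treats", "disease_x"),
    ("drug_b", "treats", "disease_y"),
}


def fact_check(claims, kg):
    """Partition claimed triples into KG-supported and unsupported ones."""
    supported = [c for c in claims if c in kg]
    unsupported = [c for c in claims if c not in kg]
    return supported, unsupported


# Pretend these triples were parsed out of an LLM answer (parsing not shown).
llm_claims = [
    ("drug_a", "treats", "disease_x"),
    ("drug_a", "treats", "disease_y"),
]
supported, unsupported = fact_check(llm_claims, KG)
```

Note that an unsupported claim is not necessarily false; it only means the corpus-derived KG has no assertion backing it, so it is a candidate for flagging rather than automatic rejection.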

Overall, I think it was a great conference. I learned a lot and would love to go back and present here in the future, hopefully this time in person.
