A use case for meeting transcripts

To assess the quality of the MISeD data, we evaluate against a dataset collected using the standard Wizard-of-Oz (WOZ) method. A "human" annotator was given the overall context for a meeting and asked questions about it, while an "agent" annotator used the full transcript to provide answers and supporting attribution. This WOZ test set contains 70 dialogs (700 query-response pairs). It serves as an independent test set, revealing model performance on fully human-generated data. We found that WOZ annotation took 1.5 times longer than MISeD annotation.

We compared the performance of the following three model types: an encoder-decoder (LongT5 XL) fine-tuned on MISeD for long contexts (16k tokens); LLMs (Gemini Pro/Ultra) using prompts with transcripts and queries (28k tokens); and an LLM (Gemini Pro) fine-tuned on MISeD, using the same prompt and context length as above.
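As a rough illustration of the prompted setup, the sketch below shows one way a transcript and query could be packed into a single fixed-budget prompt. The prompt wording, the whitespace-based token estimate, and the `build_agent_prompt` helper are illustrative assumptions, not the exact configuration used in our experiments.

```python
# Minimal sketch: assemble a transcript-grounded query prompt under a fixed
# context budget. All wording and the token-counting heuristic are assumptions.

def build_agent_prompt(transcript_turns, query, max_tokens=28_000):
    """Concatenate transcript turns and a user query into one prompt,
    truncating the transcript to fit the context budget."""
    header = ("You are an assistant answering questions about the meeting "
              "transcript below.\n\n")
    footer = f"\n\nQuestion: {query}\nAnswer (cite supporting transcript spans):"

    # Crude token estimate: ~1 token per whitespace-separated word (assumption).
    budget = max_tokens - len((header + footer).split())
    kept, used = [], 0
    for turn in transcript_turns:
        cost = len(turn.split())
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost

    return header + "\n".join(kept) + footer


# Example usage with a toy two-turn transcript.
prompt = build_agent_prompt(
    ["[00:01] Alice: Let's review the launch timeline.",
     "[00:02] Bob: The beta slips to March."],
    "When does the beta start?",
)
print(prompt)
```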

We trained the fine-tuned agent models on the MISeD training set (2,922 training examples). Automatic evaluation was computed on the full test sets (628 MISeD queries, 700 WOZ queries), while manual evaluation was run on a random subset of 100 queries from each test set.

We evaluate the agent models along two dimensions: the quality of the generated responses and the accuracy of the provided attributions, using both automatic and human evaluations. Our evaluation methodologies are described in our paper, and the results are presented below:
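For concreteness, the sketch below shows one plausible way to score a single query-response pair automatically, assuming token-overlap F1 as a proxy for response quality and set-level F1 over cited transcript span IDs for attribution accuracy. Both metrics are illustrative assumptions, not the exact measures defined in the paper.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1 between a generated and a reference response
    (a common automatic proxy for response quality; assumption)."""
    pred, ref = prediction.split(), reference.split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def attribution_f1(predicted_spans, gold_spans):
    """F1 over cited transcript span IDs, treating attribution as set retrieval."""
    pred, gold = set(predicted_spans), set(gold_spans)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall) if tp else 0.0

# Example usage on a single query-response pair.
print(token_f1("the beta slips to March", "the beta was moved to March"))
print(attribution_f1({"turn_2"}, {"turn_2", "turn_3"}))
```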