Assessing ASR performance with meaning preservation

Meaning preservation as an alternative metric

Our research leveraged the Project Euphonia corpus, a repository of disordered speech encompassing over 1.2 million utterances from roughly 2,000 individuals with diverse speech impairments. To expand data collection to Spanish speakers, Project Euphonia partnered with the International Alliance of ALS/MND Associations, which facilitated the contribution of speech samples from people living with ALS in Mexico, Colombia, and Peru. Similarly, Project Euphonia expanded to French speakers through a partnership with Romain Gombert from the Paris Brain Institute to collect data from people with atypical speech in France.

For our experiments, we generated a dataset of 4,731 examples, each consisting of a ground truth phrase paired with an erroneous transcription, along with a human label indicating whether the pair is meaning preserving or not (see details in our paper). We split the dataset into training, test, and validation sets (80% / 10% / 10%, respectively), ensuring that the three sets did not overlap at the ground truth phrase level.
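One way to obtain a split with no overlap at the ground truth phrase level is to assign whole phrases, rather than individual examples, to each set. The sketch below is illustrative only and is not the procedure described in the paper: the function name `split_by_phrase` and the field names `ground_truth`, `transcription`, and `label` are assumptions made for this example. Because phrases are assigned as groups, the 80% / 10% / 10% ratio holds over phrases and only approximately over examples.

```python
import random
from collections import defaultdict


def split_by_phrase(examples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split examples into train/test/validation sets so that no
    ground truth phrase appears in more than one set.

    `examples` is a list of dicts with a "ground_truth" key
    (a hypothetical field name chosen for this sketch).
    """
    # Group every example under its ground truth phrase.
    groups = defaultdict(list)
    for example in examples:
        groups[example["ground_truth"]].append(example)

    # Shuffle the unique phrases deterministically.
    phrases = sorted(groups)
    random.Random(seed).shuffle(phrases)

    # Cut the shuffled phrase list at the requested fractions;
    # the validation set takes whatever remains.
    train_end = int(fractions[0] * len(phrases))
    test_end = train_end + int(fractions[1] * len(phrases))
    assignment = {
        "train": phrases[:train_end],
        "test": phrases[train_end:test_end],
        "validation": phrases[test_end:],
    }

    # Expand each phrase back into its examples.
    return {
        name: [ex for phrase in chosen for ex in groups[phrase]]
        for name, chosen in assignment.items()
    }
```

Calling `split_by_phrase(examples)` returns a dict with the keys `train`, `test`, and `validation`. Every erroneous transcription of a given phrase lands in the same set, so the classifier is never evaluated on a phrase it saw during training.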

With this data, we trained a classifier for meaning preservation on top of a base LLM. Using prompt-tuning, a parameter-efficient way to adapt LLMs, we conditioned our base LLM on our training set to predict the labels “yes” or “no”, indicating whether the meaning has been preserved.

We use the following format to represent the data to the LLM: