Jeff Brandt, Editor of PinHawk's Law Technology Digest newsletter, discusses Thomson Reuters' rebuttal of the Stanford University Human-Centered Artificial Intelligence blog post "AI on Trial: Legal Models Hallucinate in 1 out of 6 Queries."
This is an interesting post, made even more interesting by the fact that on Thursday of last week (the same day this Stanford post was published) I got an email from Thomson Reuters refuting it. Now, in full disclosure, I have not had a chance to even skim the 30-page report this morning. Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher Manning, and Daniel Ho set out to put the AI claims of LexisNexis and Thomson Reuters to the test against more general-purpose tools like GPT-4. They write, "We show that their tools do reduce errors compared to general-purpose AI models like GPT-4," and, "But even these bespoke legal AI tools still hallucinate an alarming amount of the time: these systems produced incorrect information more than 17% of the time - one in every six queries."

In the email from TR, they said, "Thomson Reuters believes that any study which includes its solutions should be completed using the product that is designed for the intended purpose of the study." Ok, that sounds fair. Evaluating a screwdriver's ability to hammer nails would indeed produce substandard results. They went on to write, "We also believe that any benchmarks and definitions that are established should be done in partnership with those working in the industry. We are committed to research and fostering relationships with industry partners that furthers the development of safe and trusted AI." Now, no one from LexisNexis reached out to me, but I imagine their thoughts are similar to TR's.

I think I'm still in favor of the latter part of the post's subtitle, "the need for benchmarking and public evaluations of AI tools in law." With a hat tip to Stephen Abram, I present the Stanford University Human-Centered Artificial Intelligence blog post: AI on Trial: Legal Models Hallucinate in 1 out of 6 Queries.
Published May 31, 2024.