Lab Notes

Busy week around here, but as you can see I’ve got a few things in the mix that I’ll be wrapping up after the holiday. One of the few papers out there discussing the technical details of a legal LLM is already showing its age as additional prompting strategies emerge. I’ve done my best to recreate the prompt described in the study, and I’ll be extending this to some newer, more experimental prompting strategies that are showing real promise.

For now though, long weekend in the country with no keyboard.

-jb


Document Review Benchmark Rankings for May 20, 2023

GPT-4o continues to impress in our latest round of benchmarking tests for document review. The model consistently delivers comprehensive and nuanced analyses, demonstrating a deep understanding of the legal issues at hand and providing practical insights for building a strong case strategy.

In this iteration, we focused on a more targeted evaluation of the top-performing models from previous tests, aiming to identify the subtle differences that set them apart. By refining our testing methodology and placing a greater emphasis on the depth, clarity, and strategic value of the generated reviews, we were able to better differentiate between the S-tier and A-tier models.

These rankings are specific to the document set and review protocol used in this benchmark. Going forward, we plan to test additional review strategies and analyze a larger email dataset to check that the results hold up and to give a more comprehensive assessment of each model's capabilities.

While some models, such as Gemini 1.5 Flash, may have fallen short in this particular use case, they could still prove valuable in other legal applications or with further fine-tuning. As always, our rankings are based on a specific set of criteria and should be considered in the context of each law firm's unique needs and priorities.

Ranking:

  1. GPT-4o
  2. Claude 3 Opus
  3. GPT-4 Turbo
  4. Claude 3 Sonnet
  5. GPT-4
  6. Gemini 1.5 Pro