From First Principles to Real-World Applications: Advancing LLM Reasoning from Logic to Law

Date of Award

Fall 1-1-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Radev, Dragomir

Abstract

Large language models (LLMs) have demonstrated remarkable progress across natural language understanding and generation, with especially visible gains in areas such as mathematical and commonsense reasoning. Yet the reliability of their reasoning processes remains an open question, particularly in logical and legal reasoning. Current evaluations often emphasize final answers, but these can obscure whether models have engaged in rigorous inference or relied on fragile shortcuts. As a result, measuring not only what answer is produced but also how it is derived is crucial. This dissertation advances the study of reasoning in LLMs through a progression from first principles to real-world applications. We begin by introducing a novel dataset, FOLIO, which captures natural language problems annotated with first-order logic, enabling precise evaluation of deductive validity with both natural language and symbolic reasoning. Having established FOLIO, we then focus on step-by-step reasoning evaluation, where models are assessed on how they reach an answer rather than solely on the final answer. To do so, we introduce P-FOLIO, a process-based logical reasoning benchmark built on human-written proofs. P-FOLIO supports fine-grained assessment of inference steps and provides structured supervision to train models toward more rigorous reasoning. Having examined the strengths and weaknesses of both natural language and symbolic reasoning, we identify adaptability as the next critical challenge: enabling models to integrate and transition between these modes as the problem demands. HYBRIDMIND offers an adaptive framework that chooses dynamically between natural language and symbolic approaches such as code or formal logic, improving performance on both logical and mathematical problems. At the real-world frontier, LLMs are increasingly used in the legal domain for tasks like summarizing case law or offering basic legal guidance. However, their ability to generate full judicial analyses, such as complete reasoning sections in U.S. court opinions, remains largely unexplored. CourtReasoner introduces the first benchmark for full-length judicial-style reasoning in U.S. court opinions, evaluating whether models can construct coherent, precedent-grounded arguments under adversarial conditions. Taken together, these contributions advance LLMs in logical and legal reasoning. By combining benchmarks that span formal logic and legal argumentation with methods that integrate symbolic and natural language reasoning, this dissertation provides both diagnostic tools and algorithmic strategies, moving from first principles to high-stakes, real-world applications.

This document is currently not available here.
