LegalBERT

Enhancing DistilBERT to predict legislative bill subjects from titles and metadata using factual legal knowledge.

Duration: 2 weeks
Team Size: 4

Project Overview

The goal is to classify U.S. legislative bills into subject categories using only metadata (primarily bill titles) by fine-tuning DistilBERT and injecting factual legal knowledge during training. This supports analysts and researchers with fast, consistent tagging across large bill volumes.

Problem

  • Bill titles are short and domain-specific, making subject inference difficult without legal background.
  • Subtle distinctions between policy areas lead to frequent misclassification with naïve text models.
  • Manual labeling doesn’t scale and is error-prone across 93rd–119th Congress data.

Architecture & Solution

Data & Labels

Titles and subject categories were collected from congress.gov (93rd–118th Congresses for training/validation; the 119th held out for testing). Label mapping is managed via label_mapping.json; data cleaning lives in process_data.py.
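
A minimal sketch of this step, assuming the raw export has title and subject columns (the actual schema in process_data.py may differ):

```python
import json
import pandas as pd

def load_and_clean(csv_path: str, mapping_path: str) -> pd.DataFrame:
    """Load raw bill metadata, normalize titles, and map subjects to label ids."""
    df = pd.read_csv(csv_path)

    # Drop rows missing a title or subject; normalize whitespace.
    df = df.dropna(subset=["title", "subject"])
    df["title"] = df["title"].str.strip().str.replace(r"\s+", " ", regex=True)

    # label_mapping.json maps subject strings to integer class ids.
    with open(mapping_path) as f:
        label_map = json.load(f)

    # Keep only subjects covered by the mapping, then attach label ids.
    df = df[df["subject"].isin(label_map.keys())]
    df["label"] = df["subject"].map(label_map)
    return df[["title", "label"]]
```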

Model

The base model is distilbert-base-uncased with a custom classification head, fine-tuned end-to-end with cross-entropy loss.
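
A sketch of the architecture, assuming a single linear head over the [CLS] representation (the project's actual head may differ):

```python
import torch.nn as nn
from transformers import AutoModel

class BillClassifier(nn.Module):
    """distilbert-base-uncased encoder with a small classification head."""

    def __init__(self, num_labels: int, dropout: float = 0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("distilbert-base-uncased")
        self.head = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(self.encoder.config.dim, num_labels),
        )

    def forward(self, input_ids, attention_mask, labels=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        logits = self.head(cls)
        loss = None
        if labels is not None:
            loss = nn.functional.cross_entropy(logits, labels)
        return {"loss": loss, "logits": logits}
```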

Knowledge Focus

The focal property is factual legal knowledge: grounding short titles in legal facts promotes better separation of near-neighbor subjects and improves generalization.
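
The exact injection mechanism isn't detailed here; one plausible scheme, shown purely for illustration, pairs each title with a short factual snippet about the candidate policy area as a second input segment:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def encode_with_knowledge(title: str, knowledge: str, max_length: int = 128):
    """Pair the bill title with a factual legal snippet as a second segment.

    Illustrative only: the project's actual injection method may differ.
    """
    return tokenizer(
        title,
        knowledge,  # e.g. a statutory definition tied to the policy area
        truncation=True,
        max_length=max_length,
        padding="max_length",
        return_tensors="pt",
    )
```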

Evaluation Pipeline

Accuracy, macro Precision/Recall/F1, confusion matrices; auxiliary ROUGE, BLEU, and BERTScore for semantic similarity.
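
A minimal sketch of the core classification metrics with scikit-learn (the semantic scores come from their own libraries):

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

def classification_report_dict(y_true, y_pred):
    """Compute accuracy plus macro-averaged precision/recall/F1."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_precision": precision,
        "macro_recall": recall,
        "macro_f1": f1,
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```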

Training & Tracking

Training via Colab (A100) orchestrated in run_legal_bert.ipynb; metrics logged with Weights & Biases.
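
A minimal sketch of the W&B wiring with the Hugging Face Trainer; model, train_ds, and val_ds stand in for the objects built earlier, and the hyperparameters (and project name) are illustrative, not the project's exact settings:

```python
import wandb
from transformers import Trainer, TrainingArguments

wandb.init(project="legal-bert")  # project name is an assumption

args = TrainingArguments(
    output_dir="checkpoints",
    num_train_epochs=3,            # illustrative hyperparameters
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    logging_steps=50,
    report_to="wandb",             # stream loss/metric curves to W&B
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds)
trainer.train()
```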

Technology Stack

Core

  • Python · PyTorch · Hugging Face Transformers

Data & Eval

  • Pandas · scikit-learn · Matplotlib

Ops

  • W&B for experiment tracking · Colab GPU (A100)

Key Features

  • Clean preprocessing and consistent label mapping for robust fine-tuning.
  • Independent test set from the 119th Congress for honest generalization checks.
  • Automated artifacts: plots, confusion matrices, and an HTML evaluation report.

Evaluation

We report Accuracy and macro-averaged Precision/Recall/F1 on the validation split and on an independent 119th-Congress test set. To interpret errors, we include per-class confusion matrices and additional semantics-focused scores (ROUGE, BLEU, BERTScore). Our results show that injecting factual legal knowledge significantly improves classification performance, especially on challenging near-neighbor subjects.
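
For the per-class confusion matrices, a sketch along these lines (scikit-learn plus Matplotlib; label names, normalization, and output path are assumptions) produces the saved figure:

```python
from sklearn.metrics import ConfusionMatrixDisplay

def plot_confusion(y_true, y_pred, class_names, out_path="confusion_matrix.png"):
    """Save a row-normalized confusion matrix (diagonal = per-class recall)."""
    disp = ConfusionMatrixDisplay.from_predictions(
        y_true,
        y_pred,
        display_labels=class_names,
        normalize="true",          # row-normalize so rows sum to 1
        xticks_rotation=45,
        cmap="Blues",
    )
    disp.figure_.tight_layout()
    disp.figure_.savefig(out_path, dpi=150)
```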