A medical chatbot trained on Huberman Lab podcast transcripts — comparing a naïve DeepSeek-7B, an attention LSTM, and a fine-tuned DeepSeek-7B.
MediSeek answers health and wellness questions grounded in Huberman Lab episodes. It provides a choice of three model pathways: a naïve DeepSeek-7B (no tuning), an attention-based LSTM trained on podcast-derived QA pairs, and a fine-tuned DeepSeek-7B optimized via PEFT/LoRA. The goal is to evaluate quality/cost trade-offs and let users pick the best fit.
- Episodes scraped; transcripts pulled via the Selenium/Podscribe API; raw text stored under `data/raw_transcripts` (scraping sketch below).
- Metadata, timestamps, and ads removed; text normalized with `clean_podcasts.py` into `data/clean_transcripts` (cleaning sketch below).
- GPT-4o generated ~30 QA pairs per episode across General, Specific, and Technical levels; stored in `data/qa_pairs` (generation sketch below).
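As a rough sketch of the scraping step, the snippet below pulls transcript text with Selenium. The transcript URL and the `.transcript-text` selector are hypothetical placeholders; the real Podscribe page structure may differ.

```python
# Scraping sketch (assumed URL and CSS selector; the real Podscribe pages
# may expose transcripts differently).
from pathlib import Path

from selenium import webdriver
from selenium.webdriver.common.by import By

# Hypothetical transcript pages for Huberman Lab episodes.
EPISODE_URLS = ["https://app.podscribe.ai/episode/0000001"]

out_dir = Path("data/raw_transcripts")
out_dir.mkdir(parents=True, exist_ok=True)

driver = webdriver.Chrome()  # needs a local Chrome install
try:
    for i, url in enumerate(EPISODE_URLS):
        driver.get(url)
        # Join every transcript block found on the page.
        blocks = driver.find_elements(By.CSS_SELECTOR, ".transcript-text")
        text = "\n".join(b.text for b in blocks)
        (out_dir / f"episode_{i:03d}.txt").write_text(text, encoding="utf-8")
finally:
    driver.quit()
```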
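The cleaning pass might look like the following. This is an assumption about what `clean_podcasts.py` does; the timestamp and sponsor-read patterns are illustrative, not the script's actual regexes.

```python
# Plausible cleaning pass (an assumption about clean_podcasts.py's behavior):
# strip timestamps and sponsor reads, then normalize whitespace before
# writing to data/clean_transcripts.
import re
from pathlib import Path

TIMESTAMP = re.compile(r"\(?\[?\d{1,2}:\d{2}(?::\d{2})?\]?\)?")
AD_MARKER = re.compile(r"(?i)(this episode is brought to you by|sponsors?:).*")

def clean_transcript(text: str) -> str:
    lines = []
    for line in text.splitlines():
        line = TIMESTAMP.sub("", line)
        line = AD_MARKER.sub("", line)
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)

src, dst = Path("data/raw_transcripts"), Path("data/clean_transcripts")
dst.mkdir(parents=True, exist_ok=True)
for path in src.glob("*.txt"):
    cleaned = clean_transcript(path.read_text(encoding="utf-8"))
    (dst / path.name).write_text(cleaned, encoding="utf-8")
```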
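QA generation plausibly follows this shape. The prompt wording, the 10-per-level split, and the output schema are assumptions; the `gpt-4o` chat-completions call itself follows the OpenAI Python SDK.

```python
# QA-generation sketch (prompt wording and schema are assumptions).
import json
from pathlib import Path

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment
LEVELS = ["General", "Specific", "Technical"]

def generate_qa(transcript: str, level: str, n: int = 10) -> list[dict]:
    prompt = (
        f"From the podcast transcript below, write {n} {level}-level "
        "question-answer pairs. Return JSON shaped as "
        '{"pairs": [{"question": "...", "answer": "..."}]}.'
        f"\n\nTranscript:\n{transcript[:12000]}"  # truncate to stay in context
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content).get("pairs", [])

out_dir = Path("data/qa_pairs")
out_dir.mkdir(parents=True, exist_ok=True)
for path in Path("data/clean_transcripts").glob("*.txt"):
    transcript = path.read_text(encoding="utf-8")
    # 3 levels x 10 questions each gives the ~30 pairs per episode.
    pairs = [qa for lvl in LEVELS for qa in generate_qa(transcript, lvl)]
    (out_dir / f"{path.stem}.json").write_text(json.dumps(pairs, indent=2))
```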
Three model pathways: a naïve DeepSeek-7B, an attention LSTM encoder-decoder, and a fine-tuned DeepSeek-7B via PEFT/LoRA with instruction-tuned prompts.
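For the fine-tuned pathway, a minimal PEFT/LoRA setup could look like this sketch; the base checkpoint name, rank, and target modules are assumed values rather than the repo's exact configuration.

```python
# LoRA fine-tuning sketch (checkpoint name, rank, and target modules are
# assumed values, not the repo's exact configuration).
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "deepseek-ai/deepseek-llm-7b-chat"  # assumed Hugging Face checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

lora = LoraConfig(
    r=16,                     # adapter rank (assumed)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention proj.
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Training then runs as ordinary causal-LM fine-tuning over the instruction-formatted QA pairs, updating only the adapter weights.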
Held-out test set: the latest five episodes, unseen during training. We compare the naïve DeepSeek-7B, attention LSTM, and fine-tuned DeepSeek-7B using ROUGE-1/2/L, BLEU, METEOR, and BERTScore, with per-topic visualizations and model-to-model comparisons. View the full interactive summary here: Evaluation Report (HTML).
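The metric computation can be reproduced with Hugging Face's `evaluate` package; the toy predictions and references below are stand-ins for the model outputs and gold answers on the held-out episodes.

```python
# Metric sketch using Hugging Face's `evaluate` package; preds/refs are toy
# stand-ins for model outputs and gold answers on the held-out episodes.
import evaluate

preds = [
    "Morning sunlight helps anchor the circadian rhythm.",
    "Sleep supports memory consolidation.",
]
refs = [
    "Viewing morning sunlight anchors the circadian rhythm.",
    "Sleep consolidates memories.",
]

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")
meteor = evaluate.load("meteor")
bertscore = evaluate.load("bertscore")

scores = dict(rouge.compute(predictions=preds, references=refs))  # rouge1/2/L
scores["bleu"] = bleu.compute(
    predictions=preds, references=[[r] for r in refs]
)["bleu"]
scores["meteor"] = meteor.compute(predictions=preds, references=refs)["meteor"]
bs = bertscore.compute(predictions=preds, references=refs, lang="en")
scores["bertscore_f1"] = sum(bs["f1"]) / len(bs["f1"])
print(scores)
```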