Projects

Multi-modal Scientific Question Answering

COMS W4705: Natural Language Processing

Columbia University
Fall 2025

Designed a multi-modal Scientific Question Answering (SQA) system built on the ScienceQA benchmark, integrating images, question text, context paragraphs, and multiple-choice answer options.

Key Achievements

  • Implemented and evaluated Vision-Language Transformer architectures (BLIP-2, ViLT) with chain-of-thought prompting to enable step-by-step scientific reasoning across visual and textual modalities
  • Proposed a hybrid reasoning pipeline combining frozen vision encoders with instruction-tuned LLM backbones, improving interpretability while reducing training cost
  • Achieved up to +6.3% accuracy improvement over unimodal baselines by incorporating image-conditioned textual representations and rationale-aware decoding
  • Conducted extensive ablation studies on modality fusion strategies (early vs late fusion), prompt design, and rationale supervision, demonstrating the critical role of structured reasoning for complex science questions
  • Analyzed failure cases involving visual ambiguity and long-context scientific explanations, proposing future extensions using graph-based reasoning and external knowledge integration
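The chain-of-thought prompting above hinges on how the question, context, and answer choices are serialized for the model. The exact template is not part of this summary; the helper below is a hypothetical sketch of such a prompt builder, assuming a standard letter-labeled multiple-choice format with a step-by-step reasoning cue:

```python
def build_cot_prompt(question, context, choices, hint=None):
    """Assemble a chain-of-thought prompt for a multiple-choice science
    question (hypothetical template, for illustration only)."""
    letters = "ABCDEFGH"
    lines = []
    if context:
        lines.append(f"Context: {context}")
    if hint:
        lines.append(f"Hint: {hint}")
    lines.append(f"Question: {question}")
    lines.append("Choices:")
    for letter, choice in zip(letters, choices):
        lines.append(f"({letter}) {choice}")
    # The closing cue elicits a rationale before the final answer letter.
    lines.append("Let's think step by step, then answer with a single letter.")
    return "\n".join(lines)

prompt = build_cot_prompt(
    question="Which property do these objects have in common?",
    context="The objects are a rubber band and a balloon.",
    choices=["hard", "soft", "stretchy"],
)
```

The resulting string would be paired with the image features from the frozen vision encoder before being passed to the language backbone.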

Technologies

BLIP-2, ViLT, PyTorch, Transformers, Python, ScienceQA

Platforms

GCP, Columbia Insomnia Clusters

Parameter-Efficient Fine-Tuning of Transformer Models

COMS W6998: High Performance Machine Learning

Columbia University
Fall 2025

Conducted a comprehensive empirical study of parameter-efficient fine-tuning (PEFT) methods, comparing LoRA, QLoRA, and full fine-tuning on transformer-based NLP models under strict GPU memory constraints.

Key Achievements

  • Benchmarked fine-tuning strategies on standard NLP tasks, analyzing trade-offs across accuracy, GPU memory usage, training throughput, and convergence behavior
  • Demonstrated that LoRA achieves competitive accuracy while reducing trainable parameters by over 95% and lowering peak GPU memory consumption compared to full fine-tuning
  • Implemented QLoRA-based low-bit quantization pipelines, enabling fine-tuning of large models on limited hardware with minimal performance degradation
  • Performed system-level profiling using PyTorch and GPU monitoring tools to identify memory bottlenecks, kernel inefficiencies, and I/O overhead during training
  • Produced detailed scalability and efficiency analyses across LoRA rank configurations, highlighting accuracy–efficiency trade-offs critical for deployment in resource-constrained environments
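The greater-than-95% reduction in trainable parameters can be sanity-checked with LoRA's parameter arithmetic: a frozen d_out × d_in weight matrix gains two trainable low-rank factors of shapes d_out × r and r × d_in. A minimal sketch (the layer dimensions here are illustrative, not the models from the study):

```python
def lora_param_counts(d_in, d_out, r):
    """Trainable parameters for full fine-tuning vs. a rank-r LoRA
    adapter on a single d_out x d_in linear layer."""
    full = d_in * d_out        # full fine-tuning: every weight trains
    lora = r * (d_in + d_out)  # LoRA: A is r x d_in, B is d_out x r
    return full, lora

# Example: a 4096 x 4096 projection with rank r = 8 (illustrative sizes).
full, lora = lora_param_counts(d_in=4096, d_out=4096, r=8)
reduction = 1 - lora / full  # fraction of parameters left frozen
```

Because the adapter cost grows linearly in r while the full matrix is quadratic in the hidden size, even modest ranks keep the trainable fraction well under 5%, which is what drives the memory savings measured in the study.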

Technologies

LoRA, QLoRA, PyTorch, Transformers, CUDA, Python

Platforms

GCP, Columbia Insomnia Clusters