Projects

Multi-modal Scientific Question Answering

COMS W4705: Natural Language Processing

Columbia University
Fall 2025

Designed a multi-modal Scientific Question Answering (SQA) system built on the ScienceQA benchmark, integrating images, question text, context paragraphs, and multiple-choice answer options.

Key Achievements

  • Implemented and evaluated Vision-Language Transformer architectures (BLIP-2, ViLT) with chain-of-thought prompting to enable step-by-step scientific reasoning across visual and textual modalities
  • Proposed a hybrid reasoning pipeline combining frozen vision encoders with instruction-tuned LLM backbones, improving interpretability while reducing training cost
  • Achieved up to +6.3% accuracy improvement over unimodal baselines by incorporating image-conditioned textual representations and rationale-aware decoding
  • Conducted extensive ablation studies on modality fusion strategies (early vs late fusion), prompt design, and rationale supervision, demonstrating the critical role of structured reasoning for complex science questions
  • Analyzed failure cases involving visual ambiguity and long-context scientific explanations, proposing future extensions using graph-based reasoning and external knowledge integration
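The chain-of-thought prompting above hinges on how the question, context, and answer choices are serialized for the model. The exact template is not part of this summary; the helper below is a hypothetical sketch of such a prompt builder, assuming a standard letter-labeled multiple-choice format with a step-by-step reasoning cue:

```python
def build_cot_prompt(question, context, choices, hint=None):
    """Assemble a chain-of-thought prompt for a multiple-choice science
    question (hypothetical template, for illustration only)."""
    letters = "ABCDEFGH"
    lines = []
    if context:
        lines.append(f"Context: {context}")
    if hint:
        lines.append(f"Hint: {hint}")
    lines.append(f"Question: {question}")
    lines.append("Choices:")
    for letter, choice in zip(letters, choices):
        lines.append(f"({letter}) {choice}")
    # The closing cue elicits a rationale before the final answer letter.
    lines.append("Let's think step by step, then answer with a single letter.")
    return "\n".join(lines)

prompt = build_cot_prompt(
    question="Which property do these objects have in common?",
    context="The objects are a rubber band and a balloon.",
    choices=["hard", "soft", "stretchy"],
)
```

The resulting string would be paired with the image features from the frozen vision encoder before being passed to the language backbone.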

Technologies

BLIP-2, ViLT, PyTorch, Transformers, Python, ScienceQA

Platforms

GCP, Columbia Insomnia Clusters

Parameter-Efficient Fine-Tuning of Transformer Models

COMS W6998: High Performance Machine Learning

Columbia University
Fall 2025

Conducted a comprehensive empirical study of parameter-efficient fine-tuning (PEFT) methods, comparing LoRA, QLoRA, and full fine-tuning on transformer-based NLP models under strict GPU memory constraints.

Key Achievements

  • Benchmarked fine-tuning strategies on standard NLP tasks, analyzing trade-offs across accuracy, GPU memory usage, training throughput, and convergence behavior
  • Demonstrated that LoRA achieves competitive accuracy while reducing trainable parameters by over 95% and lowering peak GPU memory consumption compared to full fine-tuning
  • Implemented QLoRA-based low-bit quantization pipelines, enabling fine-tuning of large models on limited hardware with minimal performance degradation
  • Performed system-level profiling using PyTorch and GPU monitoring tools to identify memory bottlenecks, kernel inefficiencies, and I/O overhead during training
  • Produced detailed scalability and efficiency analyses across LoRA rank configurations, highlighting accuracy–efficiency trade-offs critical for deployment in resource-constrained environments
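The greater-than-95% reduction in trainable parameters can be sanity-checked with LoRA's parameter arithmetic: a frozen d_out × d_in weight matrix gains two trainable low-rank factors of shapes d_out × r and r × d_in. A minimal sketch (the layer dimensions here are illustrative, not the models from the study):

```python
def lora_param_counts(d_in, d_out, r):
    """Trainable parameters for full fine-tuning vs. a rank-r LoRA
    adapter on a single d_out x d_in linear layer."""
    full = d_in * d_out        # full fine-tuning: every weight trains
    lora = r * (d_in + d_out)  # LoRA: A is r x d_in, B is d_out x r
    return full, lora

# Example: a 4096 x 4096 projection with rank r = 8 (illustrative sizes).
full, lora = lora_param_counts(d_in=4096, d_out=4096, r=8)
reduction = 1 - lora / full  # fraction of parameters left frozen
```

Because the adapter cost grows linearly in r while the full matrix is quadratic in the hidden size, even modest ranks keep the trainable fraction well under 5%, which is what drives the memory savings measured in the study.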

Technologies

LoRA, QLoRA, PyTorch, Transformers, CUDA, Python

Platforms

GCP, Columbia Insomnia Clusters