Projects

Projects/Streams

Here are my GitHub profile and a YouTube playlist describing some of my projects.

Lean Agents (demo/repo)

Implements a small multi-agent system in which each agent works on sub-lemmas of a theorem in Lean 4. Inspired by Michael Polanyi's notion of a spontaneous, decentralized Republic of Science, the agents collaborate implicitly: whenever one proves a lemma, it publishes the result so that the others can build on it.
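The publish-and-reuse idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `LemmaBoard` and `can_attempt`, the lemma names, and the rule that an agent waits for all of its dependencies are hypothetical and not taken from the repo.

```python
import threading


class LemmaBoard:
    """Shared store where agents publish proved lemmas (hypothetical name)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._proofs = {}  # lemma name -> Lean proof text

    def publish(self, name, proof):
        """Record a proof; the first proof published under a name is kept."""
        with self._lock:
            self._proofs.setdefault(name, proof)

    def available(self):
        """Return a snapshot of the lemma names proved so far."""
        with self._lock:
            return set(self._proofs)


def can_attempt(dependencies, board):
    """Assumed rule: a lemma is attemptable once all its dependencies are published."""
    return set(dependencies) <= board.available()


board = LemmaBoard()
board.publish("my_add_zero", "theorem my_add_zero (n : Nat) : n + 0 = n := rfl")
ready = can_attempt(["my_add_zero"], board)                   # dependency is published
blocked = can_attempt(["my_add_zero", "my_zero_add"], board)  # one dependency missing
```

No agent talks to another directly here; coordination happens only through what has been published on the board.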

Eval framework

Built LLM testing infrastructure for sea.dev. The evals test the ability of LLMs to extract information from multi-turn dialogue, and they also assess tone and user-friendliness.

Golden Hour Prediction (work in progress)

Uses a vision-language model (VLM) to predict sunrise and sunset sky colors by prompting the VLM with weather data and a skyline image, with a focus on accurate prediction of atmospheric hue.

Sherlock Holmes Eval (repo, video)

Inspired by an episode of the Dwarkesh Podcast, this eval tests LLMs' ability to determine the culprit in Sherlock Holmes murder mysteries.

Training algorithm for regularized models on arbitrarily large data sets (video/arXiv/repo)

Eliminates the need to store O(n) quantities in memory, and allows the training data to be kept at distinct sites to address privacy concerns.

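To give a feel for the setting, here is a minimal sketch that fits a ridge-penalized linear model by plain gradient descent, where each site only returns a gradient of length p (the number of features) and its row count. This is not the algorithm from the paper; the function names, the ridge objective, and the toy data are all assumptions chosen for illustration.

```python
def local_gradient(rows, weights):
    """Sum squared-error gradients over one site's rows; the result has length p."""
    grad = [0.0] * len(weights)
    for features, target in rows:
        residual = sum(w * x for w, x in zip(weights, features)) - target
        for j, x in enumerate(features):
            grad[j] += residual * x
    return grad, len(rows)


def fit_ridge(sites, n_features, penalty=0.1, step=0.1, n_iters=500):
    """Gradient descent on (1/2n) * sum (x.w - y)^2 + (penalty/2) * |w|^2."""
    weights = [0.0] * n_features
    for _ in range(n_iters):
        total = [0.0] * n_features
        n_rows = 0
        for rows in sites:  # each site could live on a separate machine
            grad, count = local_gradient(rows, weights)
            total = [t + g for t, g in zip(total, grad)]
            n_rows += count
        # The coordinator only ever holds vectors of length p.
        weights = [
            w - step * (t / n_rows + penalty * w)
            for w, t in zip(weights, total)
        ]
    return weights


# Two hypothetical sites holding rows generated from y = 2 * x.
site_a = [([1.0], 2.0), ([2.0], 4.0)]
site_b = [([3.0], 6.0)]
weights = fit_ridge([site_a, site_b], n_features=1, penalty=0.0)
```

The raw rows never leave their site in this sketch; only the summed gradient and the row count are shared with the coordinator.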
Debiased high-dimensional logistic regression (video/arXiv/repo)

A conjecture for debiasing the model in the setting where both the number of features and the number of observations grow asymptotically.

AI-powered flashcard generator and spaced repetition study tool (repo)

A web app built with a Django backend, a React frontend, and the OpenAI API. Users upload documents, which are turned into flashcards that can be studied with a spaced repetition system.
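For readers unfamiliar with spaced repetition, the sketch below shows a simple Leitner-style schedule: a correct answer moves a card to a box with a longer review interval, and a wrong answer sends it back to the first box. The app's actual scheduler may differ; the `review` function and the interval values are assumptions for illustration.

```python
from datetime import date, timedelta

# Review interval in days for each Leitner box (assumed values).
INTERVALS = [1, 2, 4, 8, 16]


def review(box, correct, today):
    """Return (new_box, next_due) after one review under a Leitner schedule."""
    if correct:
        new_box = min(box + 1, len(INTERVALS) - 1)  # promote, capped at the last box
    else:
        new_box = 0  # a miss sends the card back to the first box
    return new_box, today + timedelta(days=INTERVALS[new_box])


# A card in the first box answered correctly on 2024-01-01.
box, due = review(0, True, date(2024, 1, 1))
```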

Europe Citadel Datathon 2021 (repo)

This was a team event hosted by Citadel. The task was to find new insights into how COVID-19 spreads. We placed second; here is the final report.