Zeyneb N. Kaya

Hi! I am Zeyneb, a student at Stanford University. I work on understanding and pushing the limits of AI, exploring robustness, learning from data, and the reasoning capabilities of algorithms beyond shallow pattern memorization.

I’m always eager to discuss interesting ideas and opportunities—please reach out!

zeynebnk [at] stanford [dot] edu

github / x / linkedin / writing

Research.

My work aims to advance our understanding of language models and their capabilities, and to use that understanding to improve them and push their limits on fundamental challenges. I'm interested in robustness, data efficiency, and generalization under distribution shift, working across machine learning, algorithms, and statistics.

Listed below are selected relevant publications.

The Unified Cognitive Consciousness Theory for Language Models: Semantic Anchoring, Threshold Activation, and Emergent Intelligence

Edward Y. Chang, Zeyneb N. Kaya

Under Review

Vector Space Distance as a Measurement of Word Embedding Variability in Low-Resource Linguistic Environments

Annie K. Lamar, Zeyneb N. Kaya, Nichole M. Nomura

Under Review

Measuring the Impact of Data Augmentation Methods for Extremely Low-Resource NMT

Zeyneb N. Kaya, Annie K. Lamar

Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT) @ EACL, 2023

MADLIBS: A Novel Multilingual Data Augmentation Algorithm for Low-Resource Neural Machine Translation

Zeyneb N. Kaya

Regeneron Science Talent Search, 2024 & National Junior Science and Humanities Symposium, 2023

Decoding Large-Language Models: A Systematic Overview of Socio-Technical Impacts, Constraints, and Emerging Questions

Zeyneb N. Kaya, Souvick Ghosh

arXiv preprint

Full Scope Word Embedding Variability for Low-Resource Languages

Zeyneb N. Kaya, Annie K. Lamar

IEEE MIT Undergraduate Research and Technology Conference, 2023

The Pervasiveness of Language Contact: Evidence from Negative Existentials in Romeyka/Turkish Code-Switching

Zeyneb N. Kaya

Proceedings of the Linguistic Society of America (PLSA), 2023

Women in the Workplace: Analyzing Gender Biases in Corporate Email Communications

Zeyneb N. Kaya

International Conference on Computational Social Science (IC2S2), 2023

What You Say Is What You Think: An Analysis Of Intellectual Humility In Online Discussion Forums

Zeyneb N. Kaya, Manya Sriram

University of California, Santa Barbara, 2022

An Artificial Intelligence Model on Rheumatology: Interpretation of the Sacroiliac Joint Graphy in Ankylosing Spondylitis

Ahmet C. Genc, Zeyneb N. Kaya, et al.

Annals of the Rheumatic Diseases, 2021

Awards & Recognition.

Etched x Mercor x Cognition Hackathon – 1st Place/$40K Winner, 2025

Regeneron Science Talent Search – 5th Place/$90K Winner, 2024

Coca-Cola Scholar – 2024

Pear VC x Anthropic Hackathon – 1st Place/Most Technical Winner, 2025

TreeHacks Scrapybara Prize – 1st Place/$16K-valued Winner, 2025

National Junior Science and Humanities Symposium (NJSHS) – National Honorable Mention, Regional 2nd Math/CS, 2023

Congressional App Challenge – 1st Place Winner, 2021

Olympiad in Linguistics (Online) – 10th Place / 1st in USA, 2023

North American Computational Linguistics Olympiad (NACLO) – Finalist / Invitational Round Qualifier, 2023

International Olympiad in Artificial Intelligence (IOAI) – Team USA invited representative (did not attend due to conflicts)

NCWIT Aspirations in Computing – National Award Winner + Regional Affiliate, 2023

Synopsys Science Fair – 1st Award + CSEF Qualifier (did not attend due to conflicts), 2023

Stanford Women in Data Science (WiDS) Datathon – HS Winner, 2023

RISE Challenge – Finalist, 2023

Technovation Global – Semifinalist, 2021

US Presidential Scholars – Semifinalist, 2024

National Merit Scholarship – Finalist, 2024

Scholastic Art and Writing Competition – Honorable Mention, 2020

USACO – Silver, 2020

Bausch and Lomb Honorary Science Award, University of Rochester – 2023

Yale YES Scholar + Hahn Scholar – 2024

Columbia Egleston Scholar – 2024

Cornell Hunter R. Rawlings III Presidential Research Scholar – 2024

UC Berkeley EECS Regents' and Chancellor's Scholar – 2024

Education.

Stanford University

Computer Science (AI) / Electrical Engineering and Mathematics

ASES (Affiliated Stanford Entrepreneurial Students) Bootcamp Scholar + 2nd Place Winner

Relevant Coursework: Deep NLP; Deep RL; Probability & Stochastic Differential Equations; AI & Language; AI for Reasoning; Statistical Mechanics for Computation and Learning

Saratoga High School

AI Club Co-President
Linguistics Club Founder + President
Chinese Club Events Coordinator

West Valley Community College

Dual Enrollment: Differential Equations, Linear Algebra, Multivariable Calculus, Cultural Anthropology

Experience.

TipTop

Co-founder & CTO 2025

Efficient physics-informed AI for accelerated topology optimization in engineering design workflows. Backed by Y Combinator.

Dria

Researcher 2025

Decentralized LLM infra. Synthetic data. Mid/pre-training data optimization. Guided generation. Privacy-preserving inference.

Stanford Artificial Intelligence Laboratory

Researcher 2025

Reinforcement learning for reasoning with diffusion large language models. Scaling and efficiency.

Stanford University / Low-Resource Languages Lab

Researcher 2021 – 2024

Low-data natural language processing. Quantifying the advantage of synthetic data. Benchmarking embedding spaces' semantic validity and variability.

Intelligent Conversational Agents & Neural Networks Lab, SJSU

Researcher 2021 – 2024

Multimodal architectures for dynamic conversational agents.

Projects.

MADLIBS

Designed MADLIBS (Multilingual Augmentation of Data with Alignment-Based Substitution), an efficient multilingual synthetic data generation algorithm achieving SOTA performance with less data.

@ Regeneron Science Talent Search 2024

LLaDA-R1

Created LLaDA-R1, a diffusion LLM optimized for reasoning and efficiency at inference time with SFT+RL for dynamic diffusion step adaptation and remasking refinement.

@ Mercor x Etched x Cognition Inference-Time Compute Hackathon 2025

SHIELD.

Built SHIELD., a multi-agent RL + tool use framework for automatic identification and remediation of system vulnerabilities.

@ Pear VC x Anthropic Hackathon 2025

In-Context Learning of Transformers: A Statistical Mechanics Lens

Investigated statistical physics models that explain in-context learning, applying spin glasses, random matrix theory, and phase transitions toward transformer interpretability.

@ APPPHYS 229 2025

Linguistic Reasoning: Dissociating Language & Logic

Developed parallel symbol tuning, an approach to improve in-context linguistic reasoning capabilities of LLMs for few-shot language learning.

@ CS 224N 2025

Language Models (can be) Few-Shot Fakers

Investigated CoT faithfulness and the role of memorization; implemented a corrupted-CoT RL approach.

@ Anthropic Alignment Research Hackathon 2025

NeuroPilot

Built a brain-computer interface and agentic AI system that enables brain-powered natural-language commands for hands-free computer control.

@ TreeHacks 2025