Publications
Please also check our Google Scholar pages and arXiv for our latest work.
2024
- LESS: Selecting Influential Data for Targeted Instruction Tuning ICML 2024
- Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving NeurIPS 2024
- Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates NeurIPS 2024
- AI-Assisted Generation of Difficult Math Questions MATH-AI Workshop at NeurIPS 2024
- ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty NeurIPS 2024 Datasets Track
- Can Models Learn Skill Composition from Examples? NeurIPS 2024
- Progressive Distillation Induces an Implicit Curriculum M3L Workshop at NeurIPS 2024, Theoretical Foundations of Foundation Models Workshop at ICML 2024, MI Workshop at ICML 2024
2023
- Fine-Tuning Language Models with Just Forward Passes NeurIPS 2023
- A Theory for Emergence of Complex Skills in Language Models Preprint
- Task-Specific Skill Localization in Fine-tuned Language Models ICML 2023
- A Kernel-Based View of Language Model Fine-Tuning ICML 2023
- Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models ICLR 2024
- The Marginal Value of Momentum for Small Learning Rate SGD ICLR 2024
- Why (and When) does Local SGD Generalize Better than SGD? ICLR 2023
- Trainable Transformer in Transformer ICML 2024
- Unlearning via Sparse Representations Preprint
2022
- Understanding Contrastive Learning Requires Incorporating Inductive Biases ICML 2022
- Understanding Gradient Descent on the Edge of Stability in Deep Learning ICML 2022
- Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction NeurIPS 2022
- On the SDEs and Scaling Rules for Adaptive Gradient Algorithms NeurIPS 2022
- Understanding Influence Functions and Datamodels via Harmonic Analysis ICLR 2023
- Adaptive Gradient Methods with Local Guarantees Preprint
- New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound NeurIPS 2022
2021
- Evaluating Gradient Inversion Attacks and Defenses in Federated Learning NeurIPS 2021
- What Happens after SGD Reaches Zero Loss?--A Mathematical Framework NeurIPS 2021
- Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias NeurIPS 2021
- On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs) NeurIPS 2021
- On Predicting Generalization using GANs ICLR 2022 (spotlight)
- Rip van Winkle's Razor: A Simple Estimate of Overfit to Test Data Preprint
- Opening the Black Box of Deep Learning: Some Lessons and Take-aways SIGMETRICS '21
- Technical perspective: Why don't today's deep nets overfit to their training data? Communications of the ACM
2020
- InstaHide: Instance-hiding Schemes for Private Distributed Learning ICML 2020
- A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks ICLR 2021
- Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate NeurIPS 2020
- Provable Representation Learning for Imitation Learning via Bi-level Optimization ICML 2020
- Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets? ICLR 2021
- TextHide: Tackling Data Privacy in Language Understanding Tasks EMNLP 2020 (Findings)
- Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality NeurIPS 2020
- A Sample Complexity Separation between Non-Convex and Convex Meta-Learning ICML 2020
- Privacy-preserving Learning via Deep Net Pruning Preprint
- Theory of deep learning Manuscript
- The Quest for Mathematical Understanding of Deep Learning 40th IARCS Annual Conference on Foundations of Software Technology
2019
- Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks ICML 2019
- On Exact Computation with an Infinitely Wide Neural Net NeurIPS 2019
- Implicit Regularization in Deep Matrix Factorization NeurIPS 2019
- A Theoretical Analysis of Contrastive Unsupervised Representation Learning ICML 2019
- An Exponential Learning Rate Schedule for Deep Learning ICLR 2020
- Enhanced Convolutional Neural Tangent Kernels Preprint
- Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets NeurIPS 2019
- A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks ICLR 2019
- A Simple Saliency Method That Passes the Sanity Checks Preprint
- Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks ICLR 2020 (spotlight)