🧵 Table of Contents
- 🧵 Table of Contents
- 🚀 Leaderboard
- 💡 Evaluation Toolkit:
- 📚 Paper
- 🙌 Contributors
- Cite as
- Acknowledgement
- Star History
🚀 Leaderboard
Central Leaderboard (sorted by HumanEval Pass@1; the pass@k metric is sketched below the table)
Model | Params | HumanEval | MBPP | HF | Source |
---|---|---|---|---|---|
GPT-4 + Reflexion | ? | 91.0 | 77.1 | | paper |
GPT-4 (latest) | ? | 84.1 | 80.0 | | github |
DeepSeek-Coder-Instruct | 33B | 79.3 | 70.0 | ckpt | github |
DeepSeek-Coder-Instruct | 7B | 78.6 | 65.4 | ckpt | github |
GPT-3.5-Turbo (latest) | ? | 76.2 | 70.8 | | github |
Code-Llama | 34B | 62.2 | 61.2 | | paper |
Pangu-Coder2 | 15B | 61.6 | | | paper |
WizardCoder-15B | 15B | 57.3 | 51.8 | ckpt | paper |
Code-Davinci-002 | ? | 47.0 | | | paper |
StarCoder-15B (Prompted) | 15B | 40.8 | 49.5 | ckpt | paper |
PaLM 2-S | ? | 37.6 | 50.0 | | paper |
PaLM-Coder-540B | 540B | 36.0 | 47.0 | | paper |
InstructCodeT5+ | 16B | 35.0 | | | paper |
StarCoder-15B | 15B | 33.6 | 52.7 | ckpt | paper |
Code-Cushman-001 | ? | 33.5 | 45.9 | | paper |
CodeT5+ | 16B | 30.9 | | | paper |
LLaMA2-70B | 70B | 29.9 | | ckpt | paper |
CodeGen-16B-Mono | 16B | 29.3 | 35.3 | | paper |
PaLM-540B | 540B | 26.2 | 36.8 | | paper |
LLaMA-65B | 65B | 23.7 | 37.7 | | paper |
CodeGeeX | 13B | 22.9 | 24.4 | | paper |
LLaMA-33B | 33B | 21.7 | 30.2 | | paper |
CodeGen-16B-Multi | 16B | 18.3 | 20.9 | | paper |
AlphaCode | 1.1B | 17.1 | | | paper |
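The HumanEval and MBPP columns report pass@1 as defined in the HumanEval paper (Chen et al., 2021): generate n samples per problem, count the c samples that pass every unit test, and compute the unbiased estimator pass@k = 1 - C(n - c, k) / C(n, k). A minimal sketch of that estimator follows; the function name and example numbers are illustrative and not taken from any listed toolkit.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n samples per problem, c of them pass all unit tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    # 1 - C(n - c, k) / C(n, k), expanded as a product for numerical stability
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Example: 200 samples per problem, 37 of them pass the tests
print(pass_at_k(200, 37, 1))   # ~0.185 (for k = 1 this is simply c / n)
print(pass_at_k(200, 37, 10))  # ~0.88
```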
Leaderboard | Access |
---|---|
Big Code Models Leaderboard | [Source] |
BIRD | [Source] |
CanAiCode Leaderboard | [Source] |
Coding LLMs Leaderboard | [Source] |
CRUXEval Leaderboard | [Source] |
EvalPlus | [Source] |
HumanEval.jl | [Source] |
InfiCoder-Eval | [Source] |
InterCode | [Source] |
Program Synthesis Models Leaderboard | [Source] |
Spider | [Source] |
💡 Evaluation Toolkit:
- bigcode-evaluation-harness: A framework for the evaluation of autoregressive code generation language models.
- code-eval: A framework for the evaluation of autoregressive code generation language models on HumanEval.
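Both toolkits implement the same core step: concatenate the problem prompt with a sampled completion, execute the result against the benchmark's unit tests in an isolated subprocess with a timeout, and count the sample as passing only if the process exits cleanly. Below is a minimal, illustrative sketch of that step; the function and argument names are ours, and the real harnesses add proper sandboxing, resource limits, and batched generation.

```python
import subprocess
import sys
import tempfile

def run_candidate(prompt: str, completion: str, test_code: str, timeout: float = 10.0) -> bool:
    """Execute prompt + completion against HumanEval-style test code in a subprocess.

    test_code is expected to raise (e.g. via assert) on any failure, so a zero
    exit code means the candidate passed all tests. Illustrative only: never
    run untrusted model output without real sandboxing.
    """
    program = prompt + completion + "\n\n" + test_code
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
```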
📚 Paper
▶️ Pre-Training
-
Evaluating Large Language Models Trained on Code
Preprint
[Paper] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, et al. 2021.07
-
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
ICLR23
[Paper] Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong. 2022.03
-
ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
ACL23 (Findings)
[Paper][Repo] Yekun Chai, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, and Hua Wu. 2022.12
-
SantaCoder: don't reach for the stars!
Preprint
[Paper] Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, et al. 2023.01
-
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X
Preprint
[Paper] Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, Jie Tang. 2023.03
-
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
ICLR23
[Paper] Erik Nijkamp, Hiroaki Hayashi, Caiming Xiong, Silvio Savarese, Yingbo Zhou. 2023.05
-
StarCoder: may the source be with you!
Preprint
[Paper] Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, et al. 2023.05
-
CodeT5+: Open Code Large Language Models for Code Understanding and Generation
Preprint
[Paper] Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D.Q. Bui, Junnan Li, Steven C.H. Hoi. 2023.05
-
Textbooks Are All You Need
Preprint
[Paper] Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, et al. 2023.06
-
Code Llama: Open Foundation Models for Code
Preprint
[Paper] Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, et al. 2023.08
-
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Preprint
[Paper] Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen et al. 2024.01
-
StarCoder 2 and The Stack v2: The Next Generation
Preprint
[Paper] Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang et al. 2024.02
▶️ Instruction Tuning
-
Code Alpaca: An Instruction-following LLaMA Model trained on code generation instructions
[Repo] Sahil Chaudhary. 2023
-
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Preprint
[Paper] Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang. 2023.07
-
OctoPack: Instruction Tuning Code Large Language Models
Preprint
[Paper][Repo] Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre. 2023.08
-
Magicoder: Source Code Is All You Need
Preprint
[Paper][Repo] Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, Lingming Zhang. 2023.12
▶️ Alignment with Feedback
-
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
NeurIPS22
[Paper] Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven C.H. Hoi. 2022.07
-
Execution-based Code Generation using Deep Reinforcement Learning
TMLR23
[Paper] Parshin Shojaee, Aneesh Jain, Sindhu Tipirneni, Chandan K. Reddy. 2023.01
-
RLTF: Reinforcement Learning from Unit Test Feedback
Preprint
[Paper] Jiate Liu, Yiqin Zhu, Kaiwen Xiao, Qiang Fu, Xiao Han, Wei Yang, Deheng Ye. 2023.07
-
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
Preprint
[Paper] Bo Shen, Jiaxin Zhang, Taihong Chen, Daoguang Zan, Bing Geng, An Fu, Muhan Zeng, Ailun Yu, Jichuan Ji, Jingyang Zhao, Yuenan Guo, Qianxiang Wang. 2023.07
▶️ Prompting
-
CodeT: Code Generation with Generated Tests
ICLR23
[Paper] Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, Weizhu Chen. 2022.07
-
Coder Reviewer Reranking for Code Generation
ICML23
[Paper] Tianyi Zhang, Tao Yu, Tatsunori B Hashimoto, Mike Lewis, Wen-tau Yih, Daniel Fried, Sida I Wang. 2022.11
-
LEVER: Learning to Verify Language-to-Code Generation with Execution
ICML23
[Paper] Ansong Ni, Srini Iyer, Dragomir Radev, Ves Stoyanov, Wen-tau Yih, Sida I. Wang, Xi Victoria Lin. 2023.02
-
Teaching Large Language Models to Self-Debug
Preprint
[Paper] Xinyun Chen, Maxwell Lin, Nathanael Schärli, Denny Zhou. 2023.06
-
Demystifying GPT Self-Repair for Code Generation
Preprint
[Paper] Theo X. Olausson, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao, Armando Solar-Lezama. 2023.06
-
SelfEvolve: A Code Evolution Framework via Large Language Models
Preprint
[Paper] Shuyang Jiang, Yuhao Wang, Yu Wang. 2023.06
▶️ Evaluation & Benchmark
-
Measuring Coding Challenge Competence With APPS
NeurIPS21
Named APPS
[Paper][Repo] Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt. 2021.05
-
Program Synthesis with Large Language Models
Preprint
Named MBPP
[Paper] Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, Charles Sutton. 2021.08
-
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
ICML23
[Paper] Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu. 2022.11
-
RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
Preprint
[Paper] Tianyang Liu, Canwen Xu, Julian McAuley. 2023.06
-
Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation
Preprint
[Paper] Li Zhong, Zilong Wang. 2023.08
-
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
EMNLP23
[Paper] Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen. 2023.10
-
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
NeurIPS23
[Paper] Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, et al. 2023.11
-
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
ICLR24
[Paper] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan. 2023.10
-
DevBench: A Comprehensive Benchmark for Software Development
Preprint
[Paper][Repo] Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, Lingming Zhang. 2024.03
-
LongCoder: A Long-Range Pre-trained Language Model for Code Completion
ICML23
[Paper] Daya Guo, Canwen Xu, Nan Duan, Jian Yin, Julian McAuley. 2023.10
-
Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing
Preprint
[Paper] Jiayi Wei, Greg Durrett, Isil Dillig. 2023.05
-
Automating Code Review Activities by Large-Scale Pre-training
Preprint
[Paper] Zhiyu Li, Shuai Lu, Daya Guo, Nan Duan, Shailesh Jannu, Grant Jenks, Deep Majumder, Jared Green, Alexey Svyatkovskiy, Shengyu Fu, Neel Sundaresan. 2022.10
▶️ Using LLMs while coding
-
Awesome-DevAI: A list of resources about using LLMs while building software
Awesome
[Repo] Ty Dunn, Nate Sesti. 2023.10