Top NLP Papers – pt 2

In part 1, I shared my readings of some of the top papers on large language models in the field of NLP. I examined their purposes, methods, and findings in detail, and identified the key originality and contributions of each work. That post focused mostly on summarizing individual papers; here, I look at the trends as a whole and offer some of my own insights and realizations.


Research Avenues

Across the papers, I identified several main objectives that the authors aimed to address. These generally fell into one of four categories: generalizability, data/compute efficiency, bias/risks, or understanding. Works targeting generalizability aimed to improve quality across different domains and tasks. Those focused on data/compute efficiency examined the scaling of language models and the limits of their abilities at small and large resource levels. Work addressing bias/risks involved mitigating the societal, ethical, and privacy harms of language models. Papers investigating our understanding of language models analyzed their abilities and worked on interpretability. These four high-level categories seemed to capture most of the paths of research that make up the NLP landscape at the moment.


Across the various stages of developing language models, researchers pursued these objectives through different approaches. These included: curating data, where datasets were developed to target specific goals; manipulating model inputs/outputs, such as with new prompting, preprocessing, decoding, or input representation methods; training methods, where different architectures and learning procedures were presented; and analysis methods.
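To make the "manipulating model inputs/outputs" category concrete, here is a minimal sketch (not taken from any specific paper on the list) of how the prompt and the decoding method can be varied for the same pretrained model, using the Hugging Face transformers pipeline; the model name and prompt below are placeholders I chose for illustration.

```python
# Minimal sketch: varying the prompt and the decoding strategy for one pretrained model.
# Assumes the Hugging Face `transformers` library; "gpt2" and the prompt are placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Translate English to French: cheese ->"  # a zero-shot style prompt

# Greedy decoding: always pick the highest-probability next token.
greedy = generator(prompt, max_new_tokens=20, do_sample=False)

# Nucleus (top-p) sampling: a different decoding choice over the same model and prompt.
sampled = generator(prompt, max_new_tokens=20, do_sample=True, top_p=0.9)

print(greedy[0]["generated_text"])
print(sampled[0]["generated_text"])
```

The point of the sketch is simply that the model weights stay fixed; only the inputs (the prompt) and the outputs (the decoding procedure) change, which is what distinguishes this line of work from the data curation and training-method approaches above.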


I realized that the originality of a paper often came from its unique combination of aims and methods.


One consideration that appears across the pieces is an increasing need to acknowledge the limitations of research, including technical and scope limitations, societal and ethical implications, and performance gaps. Moving forward, best practices in research call for in-depth documentation, including examination and measurement of potentially harmful risks as well as considerations regarding open-source release and replicability.


Venues & Contributors

Across the papers, the top publication venues represented were ACL and its affiliated venues (such as EMNLP, EACL, and TACL), NeurIPS, and ICLR. A notable portion of the papers, despite considerable citation counts and influence in the field, were never published at a conference or in a journal and were instead often released as industry reports.


With the exception of the dramatically influential and transformative (no pun intended) Attention Is All You Need from Vaswani et al. (2017) and BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding from Devlin et al. (2018), each with upwards of 60-70K citations, most papers had fewer than 10K; excluding those two, the mean and median citation counts were both near 1K.


The main affiliations of the researchers producing these top NLP papers, ordered by the number of papers on the list (based on the email-domain affiliations given in each paper), are Google + DeepMind, Meta, OpenAI, and Microsoft in industry, and the computer science departments of the University of Washington, Carnegie Mellon University, and Stanford University in academia. Some notable names that appear repeatedly across these most-cited papers include Alec Radford, Colin Raffel, and Noam Shazeer, each of whom authored five papers on the list.



Examining some of the biggest papers in the field of NLP gave me an interesting perspective on its largest challenges and on the developments I see as having the most potential going forward.

