2026-05-30

ACL Paper Writing Tips

Table of Contents

Introduction
1. What makes a good paper?
2. How to write a well-motivated paper?
3. How to write a novel paper?
4. How to write a well-executed paper?
5. How to write a well-written paper?
6. How to write a reproducible paper?
7. Frequently asked questions (FAQs)
Conclusion
Useful resources

Introduction

This post provides practical tips for writing papers for *ACL conferences, based on my experience as both a reviewer and an author in the NLP/ML communities.¹ I hope these insights are helpful for those who are new to writing for *ACL or looking to sharpen their paper-writing skills (though I am always learning myself, so take my advice as a peer perspective!).

1. What makes a good paper?

There must be many different opinions on this topic. In my opinion, a good paper should have the following characteristics:

Well-motivated: The paper should clearly explain the motivation behind the research. It should answer the question of why the research is important and what problem it is trying to solve.
Novel: The paper should present novel ideas or approaches that have not been explored before. It should contribute something new to the field of NLP/ML.
Well-executed: The experiments should be well-designed, and the results should be clearly presented.
Well-written: The paper should be easy to read and understand. The writing should be clear and concise, and the paper should be well-structured.
Reproducible: The paper should provide enough details for others to reproduce the results. This includes providing code, data, and any other necessary resources.

Of course, these are just my opinions, and there may be other characteristics that others may consider important for a good paper. However, I believe that these are generally accepted as important for a good paper in the NLP/ML community.

Let’s take a closer look at each of these characteristics in the following sections.

2. How to write a well-motivated paper?

A well-motivated paper should clearly explain the motivation behind the research. It should answer the question of why the research is important and what problem it is trying to solve.

I always try to include the following blocks in the Abstract and Introduction sections of my papers to make them well-motivated:

General background: This block provides a general background on the topic of the paper. It should explain the broader context of the research and why it is important.
Key challenge: This block should clearly state the key (broader) challenge that the paper is trying to address. It should explain what the problem is and why it is difficult to solve.
Current solution and its limitations: This block should describe the current solution to the problem and its limitations. It should explain why the current solution is not sufficient and what gaps exist in the current research.
Proposed solution (what we do to overcome the limitations): This block should describe the proposed solution to the problem. It should explain how the proposed solution overcomes the limitations of the current solution and what contributions it makes to the field.
How we verify the effectiveness of our solution: This block should describe how the effectiveness of the proposed solution is verified. It should explain the experiments that are conducted and the results that are obtained to demonstrate the effectiveness of the proposed solution.
Summary of contributions: This block should summarize the main contributions of the paper. It should clearly state what the paper contributes to the field and why it is important.

Here are example sentences for each block, taken from the Introduction section of one of my ACL papers: Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates.

General background

Large language models (LLMs) demonstrate remarkable generalization across numerous applications (OpenAI, 2025; Guo et al., 2025; Yang et al., 2025; Gemma Team et al., 2025). However, they notoriously underperform in languages absent or underrepresented in their training data, creating a barrier to equitable access for speakers worldwide (Huang et al., 2023). The standard approach to resolve this issue is continual pre-training (CPT) or fine-tuning on target language data (Cui et al., 2024; Ji et al., 2025).

In this block, we first introduce the general topic of the paper, which is the underperformance of LLMs in underrepresented languages. We then explain the standard approach to address this issue, which is CPT or fine-tuning on target language data. This sets the stage for the key challenge that we will discuss in the next block.

Key challenge

Yet, adapting instruct models to these languages is uniquely challenging. Such models require specialized instruction-tuning data (Wei et al., 2022; Rafailov et al., 2023), which is often unavailable or prohibitively costly to create for underrepresented languages (Huang et al., 2024c). Furthermore, machine-translated data as a low-cost alternative is not consistently effective (Tao et al., 2024).

What is important here is that we restrict our focus to a specific type of model (instruct models). We then explain the key, broader challenge, which is the lack of specialized instruction-tuning data for underrepresented languages. We also mention that machine-translated data is not consistently effective, which further highlights the challenge. This block clearly states the problem that we are trying to solve in the paper, and excludes other related problems (e.g., base model adaptation) that are not the focus of the paper. This helps to make the paper more focused and clear.

Current solution and its limitations

Consequently, using unlabeled target language text is often the only viable option for adaptation. While this approach can improve target language proficiency, it often triggers catastrophic forgetting (Kirkpatrick et al., 2017; Tejaswi et al., 2024; Mundra et al., 2024; Yamaguchi et al., 2025), where new training erases prior knowledge. This issue is acute for instruct models, as it cripples the general-purpose functionality of the model, which is primarily derived from core abilities like chat and instruction-following. In response, previous work has attempted post-hoc mitigation. For example Yamaguchi et al. (2025) merge weights of the original and adapted models, while Huang et al. (2024c) use a task vector and apply parameter changes from CPT on the base model to the instruct model. Nonetheless, these methods largely fail to mitigate catastrophic forgetting, substantially degrading these core functionalities.

The shortcomings of post-hoc methods suggest that mitigation should occur during adaptation. We therefore focus on the CPT stage. Specifically, we leverage selective parameter updates, a method of restricting which weights are modified during training. This approach is proven more effective at mitigating catastrophic forgetting than alternatives like parameter-efficient fine-tuning, regularization, or model merging (Zhang et al., 2024a; Hui et al., 2025). However, existing selective parameter tuning paradigms for adapting LLMs are ill-suited for adapting instruct models with unlabeled target language text. They rely either on random selection, offering no principled way to preserve knowledge, or on signals from the new data to guide update (target-focused) (§2). Target-focused signals are particularly vulnerable because raw text lacks chat templates required to elicit instruction-following behavior. Optimizing for this incompatible format risks corrupting the very foundational capabilities we aim to preserve due to the structural differences between raw text and chat templates.

Here, we first explain the current solution to the problem, which is using unlabeled target language text for adaptation. We then explain its main limitation: it often triggers catastrophic forgetting. We also mention previous work (i.e., other current solutions) that has attempted post-hoc mitigation, and explain that these methods largely fail to mitigate catastrophic forgetting.

Given the shortcomings of post-hoc methods, we argue that mitigation should occur during adaptation. We then focus on the CPT stage and explain that selective parameter updates are a promising approach to mitigate catastrophic forgetting. However, we also explain that existing selective parameter tuning paradigms are ill-suited for adapting instruct models with unlabeled target language text, as they rely on random selection or target-focused signals, which are not effective in this context. This sets the stage for our proposed solution, which we will discuss in the next block.

Note: This block can oftentimes be a single paragraph. In the example above, I split it into two paragraphs to make it easier to read and understand. The first paragraph focuses on the limitations of the current solution, while the second paragraph focuses on the limitations of existing selective parameter tuning paradigms. This helps to make the paper more organized and easier to read.

Proposed solution (what we do to overcome the limitations)

We therefore introduce Source-Shielded Updates (SSU), a novel source-focused approach that proactively shields source knowledge before adaptation begins (Figure 1). First, SSU identifies parameters critical to source abilities using a small set of source data and a parameter importance scoring method, such as those used in model pruning, e.g., Wanda (Sun et al., 2024). Second, it uses these element-wise scores to construct a column-wise freezing mask. This structural design is crucial. Unlike naive element-wise freezing that corrupts feature transformations, our column-wise approach preserves them entirely. Finally, this mask is applied during CPT on unlabeled target language data, keeping the shielded structural units frozen. This process allows SSU to effectively preserve the general-purpose ability of the model while improving target language performance.

Here, we briefly describe our proposed solution. We need to clearly explain how our proposed solution overcomes the limitations discussed in the previous block, while also providing enough information on how our proposed solution works. Moreover, this paragraph often comes with a figure that visually explains the proposed solution, making it easier for reviewers and readers to understand the method.

How we verify the effectiveness of our solution

We verify our approach through extensive experiments with five typologically diverse languages and two different model scales (7B and 13B). We evaluate source language (English) performance across dimensions including chat, instruction-following, safety, and general generation and classification, alongside target language performance.

This block should briefly describe how we verify the effectiveness of our proposed solution. Some papers may also choose to include key findings in this block. However, I personally prefer to keep it brief and save the details for the Results section of the paper, as we mention some of the key findings in the next block and we want to avoid redundancy.

Summary of contributions

We summarize our contributions as follows:

A novel method for adapting instruct models to a target language without specialized target instruction-tuning data, addressing a key bottleneck to expand linguistic accessibility.

At two model scales, SSU consistently outperforms all baselines on all core instruction-following and safety tasks. It achieves leading target-language proficiency rivaling full fine-tuning while almost perfectly preserving general source-language performance.

Extensive analysis validates the efficacy of SSU, confirming the superiority of column-wise freezing and the importance of source data-driven parameter scoring. Qualitatively, we show that SSU avoids the linguistic code-mixing that state-of-the-art methods suffer from, explaining its superior abilities across source chat and instruction-following tasks.

This block should summarize the main contributions of the paper. Typically, the first contribution is the proposed method (resource), the second contribution is the main results, and the third contribution is the analysis. However, this is just a common pattern, and there are always variations. The key is to clearly state what the paper contributes to the field.

3. How to write a novel paper?

Novelty is a crucial aspect of a good paper. However, I think this is also quite a subjective aspect, and there may be different opinions on what constitutes novelty. In my opinion, a paper can be considered novel if it presents new ideas or approaches that have not been explored before. This could be a new method, a new application of an existing method, a new dataset, or a new analysis. In practice, this means you do not need to focus specifically on “novelty” when writing, as long as your paper ticks the other boxes (i.e., well-motivated, well-executed, well-written, and reproducible).

4. How to write a well-executed paper?

Do not use old models.
Using old models can make the paper look outdated, and reviewers may not receive it well. As NLP is a practical and fast-moving field, using the latest models demonstrates the practical relevance of the research. If you have a good reason to use an old model, make sure to clearly explain it. Personally, I do not like to see reasons such as “we use A (an old model) as previous work does” or “we use A (an old model) for fair comparison with previous work”. If you want to compare with previous work that uses an old model, you can still include the results of the old model in the paper, but you should also include the results of the latest models.
Use multiple model families and multiple model scales where applicable.
Unfortunately, different models often behave inconsistently. Even within the same family, varying the model scale can result in entirely different trends. Therefore, it is crucial to evaluate multiple model families and scales to demonstrate the robustness of any findings. As a reviewer, I frequently encounter the author argument, “we do not have enough compute,” which is a weak justification. If computational resources are limited, authors should at least evaluate their best-performing approach against the top baseline across multiple model families and scales. This targeted approach can still provide compelling evidence for the robustness of the findings.
Use strong/latest baselines.
It is important to use strong and/or the latest baselines to demonstrate the effectiveness of your proposed method. Using weak or outdated baselines can make the paper look unprofessional and may not be well-received by reviewers. If you do need to use a weak or outdated baseline for some reason, make sure to briefly explain why you are using it.
Follow the best practices for experimental design.
When designing experiments, follow best practices to ensure that the results are reliable and valid. This includes using appropriate evaluation benchmarks and metrics. For generation tasks, evaluate multiple times with different random seeds and report the average results.
Conduct comprehensive ablation studies.
If your proposed method has multiple components or hyperparameters, it is important to conduct comprehensive ablation studies to understand the contribution of each component or hyperparameter to the overall performance of the method.
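
As a minimal sketch of the multi-seed reporting mentioned in the list above, the snippet below averages per-seed scores. The `summarize_runs` helper and the score values are hypothetical and purely illustrative; they are not results from any paper. Reporting the standard deviation alongside the average is an addition here, not something the text above requires.

```python
from statistics import mean, stdev


def summarize_runs(scores_by_seed):
    """Return (mean, sample standard deviation) of per-seed scores.

    `scores_by_seed` maps a random seed to the score obtained with that seed.
    """
    scores = list(scores_by_seed.values())
    if len(scores) < 2:
        raise ValueError("Need at least two seeds to report a spread.")
    return mean(scores), stdev(scores)


# Hypothetical scores for one model on one task (illustrative numbers only).
runs = {0: 71.0, 1: 73.0, 2: 72.0}
avg, spread = summarize_runs(runs)
print(f"{avg:.1f} ± {spread:.1f} over {len(runs)} seeds")
```

The same helper can be called once per model family and scale, so that every cell in a results table is backed by the same number of seeds.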

5. How to write a well-written paper?

There are many tips for writing a well-written paper, but I will focus on the following: Follow the rules, Clarity, and Conciseness, which I think are particularly important for writing papers for *ACL conferences.

Follow the (unwritten) rules of writing in many ways

Question: Why do we need to follow the (unwritten) rules of writing?

Answer:

It makes the paper look professional.
A typical "format" exists because people think it makes a paper easier to understand. In other words, deviations from the "format" increase the cognitive load of reviewers (and readers in general), making the paper harder to read.
Reviewers can focus on the content rather than the writing style, which avoids distractions.

Follow the formatting guidelines of the conference.
This may seem obvious, but it is important to follow the formatting guidelines of the conference. This includes using the correct template, font size, line spacing, and so on. Not following the formatting guidelines can make the paper look unprofessional and may even lead to desk rejection. For ML people writing *ACL papers, do not place captions above tables. All captions (for tables and figures) should be placed below the tables and figures for *ACL papers. Also, do not use space tweaking commands (e.g., \vspace, \hspace) to adjust the spacing in the paper, as it can make the paper look unprofessional and may not be well-received by reviewers.²
Follow the structure of a typical research paper.
A typical research paper has the following structure: Abstract, Introduction, Related Work, Methodology, Experimental Setup, Results, Analysis, Conclusion, Limitations, and Ethical Considerations. Following this structure can help to make the paper more organized and easier to read. Of course, there may be some variations in the structure depending on the specific topic of the paper, but it is generally a good idea to follow this structure as much as possible.

Within each section, it is also important to follow the “typical” structure of that section. For example, the Introduction section should follow what I described in the “How to write a well-motivated paper?” section above.
Do exactly what you mention in the Abstract and Introduction sections in the rest of the paper.
Often, authors mention several things in the Abstract and Introduction sections that they do not actually do in the rest of the paper. This severely damages the credibility of the paper and can lead to a negative review.
A related work section should briefly and clearly explain how the paper is different from or related to the most relevant work.
Do not just list related work without explaining how the paper is different from or related to the most relevant work. This can make it difficult for reviewers and readers to understand the novelty and significance of the paper.
A methodology section should clearly explain the proposed method in a build-up manner.
Your methodology section should be organized in a build-up manner. It should start with an overview of the method, followed by detailed explanations of each component or step of the method. Moreover, if your proposed method has multiple variants, it is often a good idea to first explain the basic version of the method, and then explain the variants in a separate subsection or named paragraphs with clear, brief motivations for each variant. This can help to make the methodology section more organized and easier to understand.
Write a detailed Experimental Setup section.
Sometimes, I see papers that have a very vague or very short Experimental Setup section. In my opinion, this is a big mistake. The Experimental Setup section is very important in many ways.
- It defines the layout of the Results section. For instance, if you mention that you verify the effectiveness of your proposed solution using (i) Instruction-following and Chat performance, (ii) Safety performance, (iii) Source language performance, and (iv) Target language performance in the Experimental Setup section, then you should have the corresponding subsections or named paragraphs in the Results section of the paper. This makes the Results section more organized and easier to read.
- It helps you to justify the experiments that you include in the paper.
  The Experimental Setup section is there to not only introduce the experimental details but also to justify the experiments that you include in the paper. That is, you can explain why your experimental setup is appropriate for verifying the effectiveness of your proposed solution. For example, you can explain why you choose to evaluate on certain tasks, certain languages, certain model scales, etc. This can help to make the paper more convincing and can also help to address potential criticisms from reviewers.
Make a bullet-point list before writing the Results section.
Before writing the Results section, it can be helpful to make a bullet-point list of the key aspects that you want to highlight in this section. This could be “results by model, results by task, results by language, results by model scale, etc.”. If a paper has specific research questions, then the bullet-point list could be “results for RQ1, results for RQ2, etc.”.
Provide a brief motivation for each analysis.
Readers will probably not remember the details of your method or framework that you introduced in the Related Work and Methodology sections. Therefore, it is important to briefly state what each analysis investigates before going straight to “Table X shows that …” or “Figure Y shows that …”.

Clarity

Use simple and clear language.
Most *ACL reviewers and readers are not native English speakers, so it is important to use simple and clear language to make the paper easier to read and understand.
Avoid jargon and complex sentences.
Jargon and complex sentences can make the paper harder to read and understand. When a reviewer is not very familiar with the specific topic of the paper but is an expert in the broader NLP/ML field, they may have a hard time understanding the paper if it is filled with jargon and complex sentences. Therefore, it is important to avoid jargon and complex sentences as much as possible, especially in the Abstract, Introduction, and Conclusion sections of the paper.
Make sure not to use abbreviations without defining them first.
Personally, I also recommend not using too many abbreviations in the paper, as it can make the paper harder to read and understand. Human working memory is limited, and it can be difficult for readers to keep track of too many abbreviations.
Do not rely on Appendices to explain important details.
Appendices should only be used for additional experimental details, extended literature reviews, and supplementary results that are not crucial for understanding the main message of the paper. Most importantly, reviewers are not required to read the Appendices. Therefore, if you put important details in the Appendices, you should not expect reviewers to read them, and this can lead to a negative review.
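
The abbreviation tip above can be checked mechanically before submission. The following is a rough sketch, assuming that an abbreviation is a run of two or more capital letters and that it counts as "defined" only if its first occurrence is wrapped in parentheses, as in "continual pre-training (CPT)". The function name `undefined_abbreviations` and both assumptions are illustrative; the heuristic will miss other definition styles and will flag well-known names such as "NLP".

```python
import re

# Two or more capital letters, optionally followed by a plural "s" (e.g., "LLMs").
ABBREVIATION = re.compile(r"\b([A-Z]{2,})s?\b")


def undefined_abbreviations(text):
    """Return abbreviations whose first occurrence is not a definition.

    Heuristic: an abbreviation counts as defined only if its first
    occurrence is directly wrapped in parentheses.
    """
    seen = set()
    flagged = []
    for match in ABBREVIATION.finditer(text):
        abbr = match.group(1)
        if abbr in seen:
            continue
        seen.add(abbr)
        before = text[max(match.start() - 1, 0) : match.start()]
        after = text[match.end() : match.end() + 1]
        if not (before == "(" and after == ")"):
            flagged.append(abbr)
    return flagged


sample = "We use continual pre-training (CPT). CPT helps, but SSU is better than CPT."
print(undefined_abbreviations(sample))
```

In this sample, "CPT" is introduced in parentheses before it is used, while "SSU" appears without a definition and is therefore reported.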

Conciseness

Be concise and to the point.
Do not write a lengthy paragraph. People usually do not have the patience to read a long paragraph. If you were a reviewer, would you like to read a long paragraph? I personally would not. Therefore, it is important to be concise and to the point in your writing. If you have a lot to say, try to break it down into shorter paragraphs. Each paragraph should have a clear single main point and should not be too long.
Do not write a long sentence.
The same applies to sentences. If a sentence is long, break it down into shorter ones. If a sentence contains both conjunctions (e.g., “and”, “but”, “or”) and relative clauses (e.g., “which”, “that”), it is often a sign that the sentence is too long and should be split. Long sentences make the paper harder to read and understand.
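
The sentence-length heuristic above can be turned into a quick self-check. This is only a sketch: the word limits (`max_words`, `mixed_limit`), the word lists, and the function name `flag_long_sentences` are assumptions chosen for illustration, and the naive sentence splitter will break on strings such as "et al. (2025)".

```python
import re

CONJUNCTIONS = {"and", "but", "or"}
RELATIVE_WORDS = {"which", "that"}


def flag_long_sentences(text, max_words=40, mixed_limit=20):
    """Return sentences that look too long under two rough heuristics.

    A sentence is flagged if it has more than `max_words` words, or if it
    has more than `mixed_limit` words and contains both a conjunction and
    a relative word.
    """
    # Naive split: a sentence ends at ".", "!" or "?" followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    flagged = []
    for sentence in sentences:
        words = re.findall(r"[A-Za-z']+", sentence.lower())
        has_conjunction = any(w in CONJUNCTIONS for w in words)
        has_relative = any(w in RELATIVE_WORDS for w in words)
        mixed = has_conjunction and has_relative and len(words) > mixed_limit
        if len(words) > max_words or mixed:
            flagged.append(sentence)
    return flagged


draft = (
    "We propose a method. "
    "We train a model that is large and we evaluate it on tasks which are hard, "
    "but the results that we obtain are mixed or unclear in several ways."
)
for sentence in flag_long_sentences(draft):
    print("Consider splitting:", sentence)
```

A flagged sentence is a candidate for splitting, not a definite error; the final call remains a matter of judgment.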

6. How to write a reproducible paper?

Use a reproducible tool like Apptainer/Docker for experiments.
NVIDIA, AMD, vLLM, and PyTorch all provide pre-built Docker images for LLM experiments. Using these images helps make the experiments reproducible and easy for others to run. It also helps others avoid so-called “dependency hell” when trying to reproduce the results of the paper.
Include software and hardware versions in the paper.
Even if you use a reproducible tool like Apptainer/Docker, it is still important to include the software and hardware versions in the paper. This can help others to understand the environment in which the experiments were conducted and can also help to identify any potential issues that may arise when trying to reproduce the results. For instance, when you use PyTorch and cite it in the paper, you should cite it as \citep[vX.X.X]{pytorch}, where vX.X.X is the version of PyTorch that you used in your experiments.
Include tables of hyperparameters.
If you run experiments, always include tables of hyperparameters in the paper.
Write a detailed README file for the code repository.
A detailed README file is crucial for ensuring that others can reproduce the results of the paper. The README file should include instructions on how to set up the environment, how to run the code, and how to evaluate the results.
Include the code for all the experiments that are included in the paper.
I sometimes see papers that only include the code for the proposed method, but not the code for the baselines they compare against. This can make it difficult for others to reproduce the results of the paper, as they may not have access to the code for the baselines. If you do not have any concrete reason for not including the code for the baselines, you should include it in the repository and provide instructions on how to run it in the README file. This can help to make the paper more reproducible and can also help to increase the credibility of the paper.
Use relative paths or environment variables to specify paths in the code.
Do not hard-code any paths in the code, as it can make it difficult for others to reproduce the results.
Use git for version control and git submodules to manage dependencies for external code.
Using git for version control can help to keep track of changes in the code and can also help to collaborate with others. Using git submodules can help to manage dependencies for external code, such as the code for baselines compared in the paper. This can make it easier for others to reproduce the results of the paper, as they can easily clone the repository and its submodules.
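
Two of the tips above, recording versions and avoiding hard-coded paths, can be sketched with the Python standard library alone. The environment variable name `PROJECT_DATA_DIR`, the default directory `data`, and the helper names are hypothetical choices for this example. GPU and driver details are not collected here and would need to be recorded separately.

```python
import json
import os
import platform
from importlib import metadata
from pathlib import Path


def collect_environment():
    """Collect basic software and hardware details to report in the paper."""
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
    }


def package_versions(names):
    """Look up installed package versions; None means not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def resolve_data_dir(env_var="PROJECT_DATA_DIR", default="data"):
    """Read the data directory from an environment variable.

    Falls back to a relative path instead of a hard-coded absolute one.
    """
    return Path(os.environ.get(env_var, default))


report = {**collect_environment(), "packages": package_versions(["torch"])}
print(json.dumps(report, indent=2))
print("Data directory:", resolve_data_dir())
```

Saving the printed report next to each set of results makes it straightforward to fill in the version numbers when writing the paper.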

7. Frequently asked questions (FAQs)

When should I start writing the paper?
I recommend starting to write the paper as early as possible.
What should I do if I have many things to say but the page limit is short?
Often, most of these things are not essential to the main message of the paper. A paper should only address one main message/issue, and all the content in the paper should be relevant to that main message/issue.
Should I include all the experiments that I have done in the paper?
No, you should only include the experiments that are relevant to the main message of the paper.
Why does my paper keep getting rejected?
Do not get discouraged by rejections. It is a common experience for many researchers. See my CV of Failures for an example. However, it is important to learn from the feedback provided by reviewers and to keep improving the paper. It can also be helpful to get feedback from colleagues or mentors before submitting the paper to a conference.

Conclusion

Writing a good paper for *ACL conferences can be challenging, but it is also a rewarding experience. By following the tips and guidelines provided in this post, I hope you can improve your chances of writing a successful paper that is well-received by reviewers and readers.

Always remember that we write papers to communicate our research to others. Do not assume that reviewers and readers will understand your paper just because you understand it. Reviewers and readers cannot read your mind, so it is important to make sure everything is clearly and explicitly stated in the paper. Also, do not write in a way that you would not want to read as a reviewer or reader. If you do not want to read a long paragraph, do not write a long paragraph. Always try to put yourself in the shoes of the reviewers and readers when writing the paper.

Useful resources

¹ My first first-author paper for an *ACL conference was accepted in 2021, and I have been a reviewer for *ACL conferences since 2023.
² If you need more space, use GenAI to remove redundant words or sentences, or to rewrite some sentences in a more concise way.