About

I am a third-year PhD candidate in the Sheffield NLP Group, focusing on cross-lingual transfer and efficient language modeling. I have 7+ years of combined academic and industry experience in NLP and hold an MSc in Computer Science with Speech and Language Processing (Distinction) from the University of Sheffield (2020). From 2021 to 2023, I was a researcher in the R&D Group at Hitachi, Ltd. (Japan), working on information extraction and efficient language model development. My work includes first-author publications at EMNLP, ACL, EACL, and TMLR, as well as co-authored papers at NeurIPS and ICML. Beyond my research, I contribute to the community as a member of the ACL Rolling Review (ARR) support team.
My curriculum vitae (CV) is available here (updated September 15, 2025). A CV of failures is available here, documenting unsuccessful applications and rejections.
Research Interests
Cross-lingual Transfer, Language Modelling, Natural Language Understanding, Natural Language Processing, Machine Learning
Recent Work
- Towards efficient and fair natural language processing across languages
  The aim of this research is to extend the applicability of large language models (LLMs) to languages beyond English, making AI technologies more accessible to a global audience. My work focuses on efficient cross-lingual adaptation and vocabulary expansion techniques that improve LLM performance across diverse languages.
  - Atsuki Yamaguchi, Terufumi Morishita, Aline Villavicencio and Nikolaos Aletras, “Adapting Chat Language Models Using Only Target Unlabeled Language Data,” Transactions on Machine Learning Research (TMLR), September 2025.
  - Atsuki Yamaguchi, Aline Villavicencio and Nikolaos Aletras, “How Can We Effectively Expand the Vocabulary of LLMs with 0.01GB of Target Language Text?,” arXiv preprint: 2406.11477, June 2024. (Last updated in September 2024)
  - Atsuki Yamaguchi, Aline Villavicencio and Nikolaos Aletras, “An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Language Model Inference,” Findings of the Association for Computational Linguistics: EMNLP 2024, November 2024.
- Exploring simple pretraining alternatives for Transformer-based language representation models
  The aim of this research is to investigate simple yet effective pretraining objectives.
  - Atsuki Yamaguchi, Hiroaki Ozaki, Terufumi Morishita, Gaku Morio and Yasuhiro Sogawa, “How does the task complexity of masked pretraining objectives affect downstream performance?,” Findings of the Association for Computational Linguistics: ACL 2023, July 2023. (Short paper)
  - Atsuki Yamaguchi, George Chrysostomou, Katerina Margatina and Nikolaos Aletras, “Frustratingly Simple Pretraining Alternatives to Masked Language Modeling,” The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Online, November 2021. (Short paper)
Contact
You can send me messages via this contact form.