About

I am a third-year PhD candidate in the Sheffield NLP Group, focusing on cross-lingual transfer and efficient language modeling. I have 7+ years of combined academic and industry experience in NLP and hold an MSc in Computer Science with Speech and Language Processing (Distinction) from the University of Sheffield (2020). From 2021 to 2023, I was a researcher in the R&D Group at Hitachi, Ltd. (Japan), working on information extraction and efficient language model development. My work includes first-author publications at EMNLP, ACL, EACL, and TMLR, as well as co-authored papers at NeurIPS and ICML. Beyond my research, I contribute to the community as a member of the ACL Rolling Review (ARR) support team.
My curriculum vitae (CV) is available here (updated September 15, 2025). A CV of failures is available here, documenting unsuccessful applications and rejections.
Research Interests
Cross-lingual Transfer, Language Modelling, Natural Language Understanding, Natural Language Processing, Machine Learning
Recent Work
- Towards efficient and fair natural language processing across languages
  The aim of this research is to extend the applicability of large language models (LLMs) to languages beyond English, making AI technologies more accessible to a global audience. My work focuses on efficient cross-lingual adaptation and vocabulary expansion techniques that improve LLM performance across diverse languages.
  - Atsuki Yamaguchi, Terufumi Morishita, Aline Villavicencio and Nikolaos Aletras, “Adapting Chat Language Models Using Only Target Unlabeled Language Data,” Transactions on Machine Learning Research (TMLR), September 2025.
  - Atsuki Yamaguchi, Aline Villavicencio and Nikolaos Aletras, “How Can We Effectively Expand the Vocabulary of LLMs with 0.01GB of Target Language Text?,” arXiv preprint: 2406.11477, June 2024. (Last updated in September 2024)
  - Atsuki Yamaguchi, Aline Villavicencio and Nikolaos Aletras, “An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Language Model Inference,” Findings of the Association for Computational Linguistics: EMNLP 2024, November 2024.
- Exploring simple pretraining alternatives for Transformer-based language representation models
  The aim of this research is to investigate simple yet effective pretraining objectives.
  - Atsuki Yamaguchi, Hiroaki Ozaki, Terufumi Morishita, Gaku Morio and Yasuhiro Sogawa, “How does the task complexity of masked pretraining objectives affect downstream performance?,” Findings of the Association for Computational Linguistics: ACL 2023, July 2023. (Short paper)
  - Atsuki Yamaguchi, George Chrysostomou, Katerina Margatina and Nikolaos Aletras, “Frustratingly Simple Pretraining Alternatives to Masked Language Modeling,” The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Online, November 2021. (Short paper)
Contact
You can send me messages via this contact form.