2024-12-26 Extracting SentencePiece tokeniser training settings for vocabulary expansion: How to extract SentencePiece tokeniser training settings for vocabulary expansion.
2024-06-24 Vocabulary expansion for non-SentencePiece based BPE tokeniser: A note on how to do vocabulary expansion with LLaMA3, OLMo, etc.