Jun 30, 2024 · 4 min read
Dynamic BPE
Dynamic BPE: Adaptive tokenization for pre-training and fine-tuning. Balances flexibility and consistency.
Jun 28, 2024 · 11 min read
WordPiece Tokenization: A BPE Variant
WordPiece Tokenization: Subword segmentation for NLP. Builds a vocabulary from frequent subwords and handles rare words.
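To show how a WordPiece vocabulary handles rare words, here is a minimal sketch of greedy longest-match-first segmentation, the inference step used by BERT-style WordPiece tokenizers. It follows BERT's conventions of a "##" prefix on word-internal pieces and a "[UNK]" fallback. The function name `wordpiece_tokenize` and the toy vocabulary are hypothetical, and the vocabulary-building step is not shown.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", max_chars=100):
    """Greedy longest-match-first WordPiece segmentation of a single word."""
    if len(word) > max_chars:
        return [unk]
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # word-internal pieces carry the ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches here, so the whole word becomes unknown
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, for illustration only.
vocab = {"un", "##aff", "##able", "play", "##ing", "##s"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", vocab))    # ['play', '##ing']
print(wordpiece_tokenize("xyz", vocab))        # ['[UNK]']
```

A word never seen whole during training ("unaffable") still decomposes into known pieces. Only a word with no matching piece at some position falls back to "[UNK]".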
Jun 27, 2024 · 7 min read
Byte Pair Encoding: Cracking the Subword Code
Byte Pair Encoding: Efficient subword segmentation. Merges frequent character pairs, handles unseen words, and scales to sentences.
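The merge idea can be sketched as the classic BPE training loop. It counts adjacent symbol pairs weighted by word frequency, merges the most frequent pair, and repeats. The learned merges are then replayed in order on new words. This is a minimal sketch under stated assumptions. The toy corpus and the function names `learn_bpe`, `merge_word`, and `apply_bpe` are hypothetical. Ties are broken lexicographically, which is an assumption made for determinism. End-of-word markers and byte-level handling, which many real implementations use, are omitted.

```python
from collections import Counter


def merge_word(symbols, pair):
    """Replace every adjacent occurrence of pair in symbols with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges merges from a {word: count} dictionary."""
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties broken lexicographically (an assumption).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        corpus = {merge_word(s, best): f for s, f in corpus.items()}
    return merges


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_word(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus of word counts.
word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(word_freqs, num_merges=4)
print(merges)                       # [('e', 's'), ('es', 't'), ('l', 'o'), ('lo', 'w')]
print(apply_bpe("lowest", merges))  # ['low', 'est']
```

"lowest" never appears in the toy corpus, yet it segments into the learned subwords "low" and "est". This illustrates how BPE handles unseen words.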