Jun 30 | 4 min | Tokenization | Dynamic BPE | Dynamic BPE: Adaptive tokenization for pre-training and fine-tuning. Balances flexibility and consistency.
Jun 28 | 11 min | Tokenization | WordPiece Tokenization: A BPE Variant | WordPiece Tokenization: Subword segmentation for NLP. Builds a vocabulary from frequent subwords and handles rare words.
Jun 28 | 7 min | Tokenization | Byte Pair Encoding: Cracking the Subword Code | Byte Pair Tokenization: Efficient subword segmentation. Merges frequent character pairs, handles unseen words, scales to sentences.