Jun 30 | 6 min | Tokenization | Byte-Level BPE | Byte-Level BPE: Unicode-agnostic tokenization. Handles any character and out-of-vocabulary words. Balances efficiency and representation.
Jun 30 | 4 min | Tokenization | Subword Regularization with BPE | Stochastic tokenization improving robustness. Applicable in BPE pre-training and fine-tuning. Balances consistency and variability.
Jun 30 | 4 min | Tokenization | Dynamic BPE | Dynamic BPE: Adaptive tokenization for pre-training and fine-tuning. Balances flexibility and consistency.