Namespace Unity.InferenceEngine.Tokenization.Mappers
Classes
BpeMapper
Turns an input string into a sequence of Token instances using the Byte Pair Encoding (BPE) strategy.
UnigramMapper
Implements a unigram-based tokenization mapper that converts text into tokens using a vocabulary-based approach. This mapper supports byte-level fallback for handling out-of-vocabulary characters.
WordLevelMapper
A word-level tokenization mapper that converts between tokens and their corresponding IDs.
WordPieceMapper
Turns an input string into a sequence of token ids using the WordPiece strategy.
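As a rough illustration of the Byte Pair Encoding strategy that BpeMapper is named after, the sketch below applies a ranked list of merge pairs to a single word. It is written in Python as a language-neutral sketch and does not use the Unity API. The function name bpe_encode, the merges list, and the rule that an earlier pair has higher priority are assumptions made for the example, not details taken from this package.

```python
def bpe_encode(word, merges):
    """Split a word into characters, then repeatedly merge the
    highest-priority adjacent pair until no known pair remains.

    merges is a hypothetical ranked list of (left, right) pairs;
    a lower index means a higher priority.
    """
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Collect (rank, position) for every adjacent pair that can be merged.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in ranks
        ]
        if not candidates:
            break
        # Best rank first; ties are broken by the leftmost position.
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Hypothetical merge table for illustration only.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(bpe_encode("lower", merges))  # ['low', 'er']
```

A production mapper would typically also pre-split the input into words, cache results, and map the resulting strings to ids. Those steps are omitted here.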
Structs
BpeMapperOptions
Configuration settings for the Byte Pair Encoding (BPE) mapper used in tokenization.
MergePair
Represents a mergeable pair of token values used in Byte Pair Encoding (BPE) tokenization. Each pair consists of two consecutive token strings that can be merged into a single token during the BPE encoding process. See BpeMapper.
UnigramVocabEntry
Represents a vocabulary entry for unigram tokenization, containing a token string and its associated score. This structure is used to store token-score pairs in the unigram vocabulary for tokenization algorithms.
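To make the role of token-score pairs concrete, here is a minimal Python sketch of unigram segmentation using a Viterbi search, with a byte-level fallback for characters that no vocabulary token covers. It is not the Unity implementation. The function name unigram_encode, the vocab dictionary, the unk_penalty value, and the <0xNN> byte-piece format (borrowed from the SentencePiece convention) are all assumptions for illustration.

```python
import math


def unigram_encode(text, vocab, unk_penalty=-20.0):
    """Return the highest-scoring segmentation of text.

    vocab is a hypothetical mapping of token string -> score
    (for example, a log-probability). A single character that is not
    in vocab falls back to its UTF-8 bytes, written as <0xNN> pieces,
    and is charged unk_penalty.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best total score for text[:i]
    back = [None] * (n + 1)       # back[i]: (start, pieces) of the last step
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in vocab:
                score, pieces = vocab[piece], [piece]
            elif end - start == 1:
                # Byte-level fallback for an out-of-vocabulary character.
                pieces = [f"<0x{b:02X}>" for b in piece.encode("utf-8")]
                score = unk_penalty
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = (start, pieces)
    # Walk the back-pointers from the end to recover the pieces in order.
    out = []
    i = n
    while i > 0:
        start, pieces = back[i]
        out[:0] = pieces
        i = start
    return out


# Hypothetical vocabulary entries (token, score) for illustration only.
vocab = {"uni": -2.2, "gram": -2.5, "un": -2.0, "i": -3.0}
print(unigram_encode("unigram", vocab))  # ['uni', 'gram']
print(unigram_encode("unié", vocab))     # ['uni', '<0xC3>', '<0xA9>']
```

The search prefers "uni" + "gram" (total score -4.7) over "un" + "i" + "gram" (total score -7.5) because the scores are summed and the highest total wins.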
Interfaces
IMapper
Turns an input string into a sequence of token ids. This is the equivalent of the Models component in the Hugging Face tokenizers library.