Namespace Unity.InferenceEngine.Tokenization.Mappers

Classes

BpeMapper

Turns an input string into a sequence of Token instances using the Byte-Pair Encoding (BPE) strategy.
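As a rough illustration of the Byte-Pair Encoding strategy named above, the sketch below applies an ordered list of merge rules to a single word. It is written in Python for brevity and does not use the BpeMapper API; the function name `bpe_encode` and the sample merge list are assumptions made only for this example.

```python
def bpe_encode(word, merges):
    """Split a word into characters, then apply merge rules by priority."""
    # Earlier entries in `merges` have higher priority (lower rank).
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no adjacent pair is mergeable any more
        merged = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge rules, highest priority first.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(bpe_encode("lower", merges))  # ['low', 'er']
```

A real mapper would typically also look up each resulting string in a vocabulary to obtain its id; that step is omitted here.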

UnigramMapper

Implements a unigram-based tokenization mapper that converts text into tokens using a vocabulary-based approach. This mapper supports byte-level fallback for handling out-of-vocabulary characters.
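The unigram approach with byte-level fallback described above can be sketched as a best-scoring segmentation search (Viterbi) followed by a byte expansion of unknown characters. This is a Python illustration under stated assumptions, not the UnigramMapper implementation: the function name, the `unk_score` penalty, the sample scores, and the `<0xNN>` byte-token format (borrowed from the SentencePiece convention) are all hypothetical here.

```python
import math


def unigram_encode(text, vocab, unk_score=-100.0):
    """Return the highest-scoring segmentation of `text` under `vocab`."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best total score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in vocab:
                score = vocab[piece]
            elif end - start == 1:
                score = unk_score  # unknown single character, penalized
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Walk the back-pointers to recover the pieces in order.
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    pieces.reverse()
    # Byte-level fallback: expand out-of-vocabulary pieces into UTF-8 bytes.
    tokens = []
    for piece in pieces:
        if piece in vocab:
            tokens.append(piece)
        else:
            tokens.extend(f"<0x{b:02X}>" for b in piece.encode("utf-8"))
    return tokens


# Hypothetical token scores (log-probabilities).
vocab = {"un": -2.0, "i": -3.0, "gram": -2.5, "u": -4.0, "n": -4.0,
         "g": -4.0, "r": -4.0, "a": -4.0, "m": -4.0}
print(unigram_encode("unigram", vocab))  # ['un', 'i', 'gram']
print(unigram_encode("un\u00e9", vocab))  # ['un', '<0xC3>', '<0xA9>']
```

The second call shows the fallback: the character U+00E9 is not in the sample vocabulary, so it is emitted as its two UTF-8 bytes.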

WordLevelMapper

A word-level tokenization mapper that converts between tokens and their corresponding IDs.
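Conceptually, a word-level mapping is a pair of lookup tables between whole tokens and ids. The Python sketch below illustrates that idea only; the class name `WordLevelVocab`, the `[UNK]` token, and the sample vocabulary are assumptions and do not reflect the WordLevelMapper API.

```python
class WordLevelVocab:
    """Two-way lookup between whole-word tokens and integer ids."""

    def __init__(self, vocab, unk_token="[UNK]"):
        self.token_to_id = dict(vocab)
        self.id_to_token = {i: t for t, i in vocab.items()}
        self.unk_token = unk_token

    def encode(self, words):
        # Words missing from the vocabulary map to the unknown-token id.
        unk_id = self.token_to_id[self.unk_token]
        return [self.token_to_id.get(w, unk_id) for w in words]

    def decode(self, ids):
        return [self.id_to_token[i] for i in ids]


# Hypothetical vocabulary.
mapping = WordLevelVocab({"[UNK]": 0, "hello": 1, "world": 2})
print(mapping.encode(["hello", "there", "world"]))  # [1, 0, 2]
print(mapping.decode([1, 2]))  # ['hello', 'world']
```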

WordPieceMapper

Turns an input string into a sequence of token ids using the WordPiece strategy.
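The WordPiece strategy is commonly implemented as a greedy longest-match-first search over the vocabulary. The following Python sketch shows that search for a single word; it is not the WordPieceMapper API, and the function name, the `##` continuation prefix, the `[UNK]` token, and the sample vocabulary are assumptions for illustration.

```python
def wordpiece_encode(word, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first split of one word into token ids."""
    ids = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece  # mark pieces that continue a word
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown.
            return [vocab[unk_token]]
        ids.append(vocab[match])
        start = end
    return ids


# Hypothetical vocabulary.
vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3}
print(wordpiece_encode("unaffable", vocab))  # [1, 2, 3]
print(wordpiece_encode("xyz", vocab))        # [0]
```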

Structs

BpeMapperOptions

Configuration settings for the Byte-Pair Encoding (BPE) mapper used in tokenization.

Represents a mergeable pair of token values used in Byte Pair Encoding (BPE) tokenization. Each pair consists of two consecutive token strings that can be merged into a single token during the BPE encoding process. See BpeMapper.

UnigramVocabEntry

Represents a vocabulary entry for unigram tokenization, containing a token string and its associated score. The unigram vocabulary stores its token-score pairs in this structure.

Interfaces

IMapper

Turns an input string into a sequence of token ids. This is the equivalent of the Models component in Hugging Face Tokenizers.