Namespace Unity.InferenceEngine.Tokenization.Decoders
Classes
BpeDecoder
Decoder for Byte Pair Encoding (BPE) tokens that removes suffix markers and restores spaces between words.
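The suffix handling described above can be sketched in a few lines. This is an illustrative Python sketch, not this package's API: the function name `bpe_decode` and the `</w>` suffix marker are assumptions taken from common BPE conventions.

```python
def bpe_decode(tokens, suffix="</w>"):
    """Replace the end-of-word suffix with a space, except on the last token."""
    out = []
    last = len(tokens) - 1
    for i, tok in enumerate(tokens):
        # The suffix marks a word boundary; the final token gets no trailing space.
        replacement = "" if i == last else " "
        out.append(tok.replace(suffix, replacement))
    return out


decoded = "".join(bpe_decode(["hel", "lo</w>", "wor", "ld</w>"]))
```

Joining the returned chunks gives the detokenized text for the example input.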
ByteFallbackDecoder
Converts tokens that look like "<0x61>" into characters and attempts to concatenate them into a string. If the tokens cannot be decoded, '�' (U+FFFD) is used in place of each byte token that could not be converted.
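To make the fallback behavior concrete, here is a minimal Python sketch of the idea. It is not this package's implementation: the name `byte_fallback_decode` is hypothetical, and the sketch assumes that runs of consecutive byte tokens are decoded together as UTF-8.

```python
import re

# Matches tokens such as "<0x61>" and captures the two hex digits.
_BYTE_TOKEN = re.compile(r"^<0x([0-9A-Fa-f]{2})>$")


def byte_fallback_decode(tokens):
    out = []
    pending = []  # byte values gathered from consecutive byte tokens

    def flush():
        if not pending:
            return
        try:
            out.append(bytes(pending).decode("utf-8"))
        except UnicodeDecodeError:
            # One replacement character per byte token that failed to decode.
            out.extend("\ufffd" for _ in pending)
        pending.clear()

    for tok in tokens:
        match = _BYTE_TOKEN.match(tok)
        if match:
            pending.append(int(match.group(1), 16))
        else:
            flush()
            out.append(tok)
    flush()
    return out
```

In this sketch, a complete multi-byte sequence such as `<0xE2>`, `<0x82>`, `<0xAC>` collapses into one character, while a truncated sequence yields one '�' per byte token.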
ByteLevelDecoder
Converts byte-level characters to Unicode characters, then concatenates them into a single string.
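A sketch of how such a conversion typically works, written in Python for illustration. It assumes the byte-to-character table popularized by GPT-2, where each of the 256 byte values is mapped to a printable character (a space becomes "Ġ", for instance); whether this package uses exactly that table is an assumption, and both function names are hypothetical.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte value to a printable character."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points starting at 256.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def byte_level_decode(tokens):
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    # Map every character back to its byte, then decode the bytes as UTF-8.
    raw = bytes(char_to_byte[ch] for tok in tokens for ch in tok)
    return raw.decode("utf-8", errors="replace")
```

Characters outside the table raise a `KeyError` in this sketch; a production decoder would need a policy for them.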
DefaultDecoder
Default decoder. Does not change the input chunks.
FuseDecoder
Combines all the tokens in the list into a single token.
MetaspaceDecoder
Decoder for "metaspace" tokenization, where spaces are represented by a special visible character (by default, U+2581 "▁").
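The metaspace convention can be illustrated with a short Python sketch. The function name `metaspace_decode` and the `strip_leading` option are hypothetical and are not taken from this package; the sketch assumes the leading space produced by the first token should be dropped.

```python
def metaspace_decode(tokens, replacement="\u2581", strip_leading=True):
    out = []
    for i, tok in enumerate(tokens):
        # Turn the visible marker back into an ordinary space.
        text = tok.replace(replacement, " ")
        if strip_leading and i == 0 and text.startswith(" "):
            # Assumption: the space in front of the first word is an artifact.
            text = text[1:]
        out.append(text)
    return out


decoded = "".join(metaspace_decode(["\u2581Hello", "\u2581world"]))
```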
RegexReplaceDecoder
Replaces matches of a regex pattern in the tokens in the list.
ReplaceDecoder
Replaces occurrences of a string pattern in the tokens in the list.
SequenceDecoder
A composite decoder that applies multiple decoders in sequence to process tokens. Each decoder in the sequence processes the output from the previous decoder, creating a pipeline for token transformation.
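The pipeline idea can be sketched as plain function composition. The Python below is illustrative only: `sequence_decode`, `make_replace`, and `fuse` are hypothetical stand-ins, not this package's types, and each "decoder" is modeled as a function from a token list to a token list.

```python
from functools import reduce


def make_replace(pattern, content):
    """Return a decoder that replaces `pattern` with `content` in every token."""
    return lambda tokens: [tok.replace(pattern, content) for tok in tokens]


def fuse(tokens):
    """Decoder that merges all tokens into a single token."""
    return ["".join(tokens)]


def sequence_decode(decoders, tokens):
    # Feed the output of each decoder into the next one, in order.
    return reduce(lambda current, decoder: decoder(current), decoders, list(tokens))


result = sequence_decode([make_replace("\u2581", " "), fuse], ["\u2581a", "\u2581b"])
```

Order matters in such a pipeline: fusing first and replacing afterwards produces the same text here, but decoders that depend on token positions would behave differently.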
StripDecoder
Removes a specified character from a substring of each token in the list.
WordPieceDecoder
An implementation of the WordPiece decoding algorithm.
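For readers unfamiliar with WordPiece, decoding usually means gluing continuation pieces back onto the preceding piece. The Python sketch below assumes the common `##` continuation prefix and a hypothetical function name `wordpiece_decode`; it omits any punctuation cleanup a real decoder may perform and does not reflect this package's exact behavior.

```python
def wordpiece_decode(tokens, prefix="##"):
    out = []
    for i, tok in enumerate(tokens):
        if i > 0:
            if tok.startswith(prefix):
                # Continuation piece: drop the prefix and attach to the previous piece.
                tok = tok[len(prefix):]
            else:
                # New word: separate it from the previous one with a space.
                tok = " " + tok
        out.append(tok)
    return out


decoded = "".join(wordpiece_decode(["un", "##believ", "##able", "story"]))
```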
Interfaces
IDecoder
Applies modifications to the input detokenized strings.