Namespace Unity.InferenceEngine.Tokenization.PreTokenizers
Classes
BertPreTokenizer
Splits on spaces and punctuation, removing the spaces and keeping each punctuation mark as a separate chunk.
ByteLevelPreTokenizer
Pre-tokenizes the input using ByteLevel rules.
DefaultPreTokenizer
Default placeholder implementation of a pre-tokenizer. Does not pre-cut the input.
RegexSplitPreTokenizer
Splits the input based on a regular expression.
SequencePreTokenizer
Applies a sequence of pre-tokenizers.
Interfaces
IPreTokenizer
Pre-cuts the input string into smaller parts. Those parts will be passed to the IMapper for tokenization.
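The splitting behaviour described for BertPreTokenizer and SequencePreTokenizer can be illustrated with a small, language-neutral sketch. The Python below does not use the Unity API: the names bert_style_split, no_split, and sequence are hypothetical, and it assumes that "punctuation" means ASCII punctuation and that a sequence applies each step to every chunk produced by the previous step. The actual classes may differ on both points.

```python
import string
from typing import Callable, List

# Hypothetical type for this sketch: a pre-tokenizer maps one string to a list of chunks.
PreTokenizer = Callable[[str], List[str]]


def bert_style_split(text: str) -> List[str]:
    """Split on whitespace and ASCII punctuation.

    Whitespace is dropped; each punctuation mark becomes its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for ch in text:
        if ch.isspace():
            # Whitespace ends the current chunk and is discarded.
            if current:
                chunks.append(current)
                current = ""
        elif ch in string.punctuation:
            # Punctuation ends the current chunk and is kept as a separate chunk.
            if current:
                chunks.append(current)
                current = ""
            chunks.append(ch)
        else:
            current += ch
    if current:
        chunks.append(current)
    return chunks


def no_split(text: str) -> List[str]:
    """Placeholder step that leaves the input uncut."""
    return [text]


def sequence(*steps: PreTokenizer) -> PreTokenizer:
    """Chain steps: each step runs on every chunk produced by the previous one."""
    def run(text: str) -> List[str]:
        chunks = [text]
        for step in steps:
            chunks = [piece for chunk in chunks for piece in step(chunk)]
        return chunks
    return run


print(bert_style_split("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

In this sketch, sequence(no_split, bert_style_split) gives the same chunks as bert_style_split alone, because the placeholder step passes the input through unchanged.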