docs.unity3d.com

    Namespace Unity.InferenceEngine.Tokenization.PreTokenizers

    Classes

    BertPreTokenizer

    Splits on spaces and punctuation, removing the spaces and keeping each punctuation character as a separate chunk.
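
    The following is a minimal Python sketch of that rule, not Unity's C# API. It assumes Unity follows the original BERT convention: any whitespace character counts as a space, and both printable ASCII symbols and characters in Unicode's punctuation (P*) categories count as punctuation.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    """Original-BERT rule: ASCII symbols plus any Unicode 'P*' category."""
    cp = ord(ch)
    # BERT treats every non-alphanumeric printable ASCII character ('$', '^', '`', ...) as punctuation
    if 33 <= cp <= 47 or 58 <= cp <= 64 or 91 <= cp <= 96 or 123 <= cp <= 126:
        return True
    return unicodedata.category(ch).startswith("P")


def bert_pre_tokenize(text: str) -> list[str]:
    pieces: list[str] = []
    current: list[str] = []

    def flush() -> None:
        # Emit the word accumulated so far, if any
        if current:
            pieces.append("".join(current))
            current.clear()

    for ch in text:
        if ch.isspace():
            flush()              # whitespace ends a word and is dropped
        elif is_punctuation(ch):
            flush()
            pieces.append(ch)    # every punctuation character becomes its own chunk
        else:
            current.append(ch)
    flush()
    return pieces


print(bert_pre_tokenize("Hello, world!!"))  # ['Hello', ',', 'world', '!', '!']
```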

    ByteLevelPreTokenizer

    Pre-tokenizes the input using ByteLevel rules.
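
    This page does not spell out the ByteLevel rules. In the GPT-2 / Hugging Face tokenizers convention the name usually means: split the text with a regular expression, then map every UTF-8 byte of each piece to a printable character, so that any input (control characters, emoji, and so on) can be represented and later decoded. The Python sketch below assumes Unity follows that convention; the split pattern is a simplified stand-in for GPT-2's, and the add_prefix_space parameter is hypothetical (named after the Hugging Face option).

```python
import re


def bytes_to_unicode() -> dict[int, str]:
    """GPT-2 table: every byte value maps to a distinct printable character."""
    # Bytes that are already printable map to themselves
    printable = (list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD))
                 + list(range(0xAE, 0x100)))
    byte_values = printable[:]
    chars = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Space, control bytes, etc. are moved past 255 to printable code points
            byte_values.append(b)
            chars.append(0x100 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, chars)}


BYTE_TO_CHAR = bytes_to_unicode()

# Simplified stand-in for GPT-2's split pattern (the real one uses \p{L}, \p{N} and contractions)
SPLIT_PATTERN = re.compile(r" ?\w+| ?[^\s\w]+|\s+(?!\S)|\s+")


def byte_level_pre_tokenize(text: str, add_prefix_space: bool = False) -> list[str]:
    # Hypothetical option: a leading space makes the first word look like the others
    if add_prefix_space and not text.startswith(" "):
        text = " " + text
    # Split first, then encode each piece as UTF-8 and map every byte to its character
    return ["".join(BYTE_TO_CHAR[b] for b in piece.encode("utf-8"))
            for piece in SPLIT_PATTERN.findall(text)]


print(byte_level_pre_tokenize("Hello world"))  # ['Hello', 'Ġworld']
```

    With this mapping a space becomes "Ġ" (U+0120), which is why GPT-2-style vocabularies contain entries such as "Ġworld".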

    CharSplitPreTokenizer

    A pre-tokenizer that splits text based on a specified character delimiter.

    DefaultPreTokenizer

    Default placeholder implementation of a pre-tokenizer; it does not pre-cut the input.

    DigitsPreTokenizer

    A pre-tokenizer that splits input text at digit boundaries. This class separates numeric digits from non-numeric characters during the pre-tokenization phase.
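
    A minimal Python sketch of digit-boundary splitting, assuming that consecutive digits stay together by default and that no characters are removed. The individual_digits flag is hypothetical, modeled on the Hugging Face option of the same name; this page does not say whether Unity's class offers it.

```python
import re


def digits_pre_tokenize(text: str, individual_digits: bool = False) -> list[str]:
    # \d+ keeps a run of digits together; \d alone emits one digit per piece.
    # \D+ collects everything between digits, so no character is dropped.
    pattern = r"\d|\D+" if individual_digits else r"\d+|\D+"
    return re.findall(pattern, text)


print(digits_pre_tokenize("abc123def"))        # ['abc', '123', 'def']
print(digits_pre_tokenize("abc123def", True))  # ['abc', '1', '2', '3', 'def']
```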

    MetaspacePreTokenizer

    A pre-tokenizer that replaces spaces with a special character (metaspace) and optionally splits the input text at these metaspace boundaries. This is commonly used in SentencePiece-based tokenizers.
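
    A minimal Python sketch of the replace-then-split behavior, assuming the metaspace is U+2581 ("▁"), the character SentencePiece conventionally uses, and that each split piece keeps its leading metaspace. The add_prefix parameter is a hypothetical name for an extra option (marking the first word too); Unity's actual options are not listed on this page.

```python
import re

METASPACE = "\u2581"  # '▁' (LOWER ONE EIGHTH BLOCK), the usual SentencePiece marker


def metaspace_pre_tokenize(text: str, add_prefix: bool = True, split: bool = True) -> list[str]:
    # Replace every ordinary space with the metaspace character
    text = text.replace(" ", METASPACE)
    # Hypothetical option: mark the first word like every other word
    if add_prefix and text and not text.startswith(METASPACE):
        text = METASPACE + text
    if not split:
        return [text]
    # Cut before each metaspace so it stays attached to the word that follows it
    return [piece for piece in re.split(f"(?={METASPACE})", text) if piece]


print(metaspace_pre_tokenize("Hello world"))  # ['▁Hello', '▁world']
```

    Keeping the marker on the following word is what lets a SentencePiece-style decoder restore the original spaces by turning each "▁" back into " ".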

    PunctuationPreTokenizer

    A pre-tokenizer that splits text on punctuation characters.

    RegexSplitPreTokenizer

    Splits the input based on a regular expression.

    RuneSplitPreTokenizer

    Splits the input into individual runes.
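
    Assuming "rune" here means a .NET System.Text.Rune (one Unicode scalar value), each piece is a single code point. Python strings iterate by code point, so the sketch is short; it mainly illustrates how runes differ from the UTF-16 char units that C# string indexing uses, and that a rune is not always a whole user-perceived character. The helper names are hypothetical.

```python
def rune_split(text: str) -> list[str]:
    # Python iterates strings by code point, which matches one rune per piece
    return list(text)


def utf16_length(text: str) -> int:
    # C# string.Length counts UTF-16 code units, not runes
    return len(text.encode("utf-16-le")) // 2


emoji = "a\U0001F600"          # 'a' followed by an emoji outside the Basic Multilingual Plane
print(rune_split(emoji))       # 2 pieces: 'a' and the emoji
print(utf16_length(emoji))     # 3: the emoji needs a surrogate pair in UTF-16
print(rune_split("e\u0301"))   # 2 pieces: 'e' and the combining acute accent
```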

    SequencePreTokenizer

    Applies a sequence of pre-tokenizers in order.
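
    The Python sketch below shows one plausible reading of applying a sequence: each pre-tokenizer runs on every piece produced by the one before it, and the results are flattened into a single list. The sequence, whitespace_split, and digits helpers are hypothetical stand-ins, not Unity's API.

```python
import re
from typing import Callable

PreTokenizer = Callable[[str], list[str]]


def sequence(*steps: PreTokenizer) -> PreTokenizer:
    def run(text: str) -> list[str]:
        pieces = [text]
        for step in steps:
            # Each step refines every piece produced by the previous step
            pieces = [out for piece in pieces for out in step(piece)]
        return pieces
    return run


def whitespace_split(piece: str) -> list[str]:
    return piece.split()


def digits(piece: str) -> list[str]:
    return re.findall(r"\d+|\D+", piece)


pipeline = sequence(whitespace_split, digits)
print(pipeline("abc123 def4"))  # ['abc', '123', 'def', '4']
```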

    StringSplitPreTokenizer

    Splits the input based on a string pattern.

    WhitespacePreTokenizer

    A pre-tokenizer that splits text into word tokens and non-word, non-whitespace tokens. This implementation matches the behavior of the regular expression pattern "\w+|[^\w\s]+".
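
    The regular expression above can be tried directly in Python. This is only an approximation of the C# class: Python's \w and .NET's \w agree on letters, digits, and underscore but differ on a few Unicode categories, such as combining marks, which .NET counts as word characters.

```python
import re

WORD_OR_SYMBOLS = re.compile(r"\w+|[^\w\s]+")


def whitespace_pre_tokenize(text: str) -> list[str]:
    # Whitespace matches neither alternative, so it is skipped and dropped
    return WORD_OR_SYMBOLS.findall(text)


print(whitespace_pre_tokenize("Hello, world!! It's 9am."))
# ['Hello', ',', 'world', '!!', 'It', "'", 's', '9am', '.']
```

    Unlike BertPreTokenizer, a run of punctuation such as "!!" stays in one piece here.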

    WhitespaceSplitPreTokenizer

    A pre-tokenizer that splits input text on whitespace characters.
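
    A minimal Python sketch, assuming the class cuts on runs of any whitespace and drops empty pieces, as Python's str.split() does when called with no argument. Unlike WhitespacePreTokenizer, it leaves punctuation attached to the neighboring word.

```python
def whitespace_split_pre_tokenize(text: str) -> list[str]:
    # With no argument, str.split cuts on runs of any whitespace and drops empty pieces
    return text.split()


print(whitespace_split_pre_tokenize("Hello,  world!\tBye."))  # ['Hello,', 'world!', 'Bye.']
```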

    Interfaces

    IPreTokenizer

    Pre-cuts the input string into smaller parts, which are then passed to the IMapper for tokenization.

    Copyright © 2026 Unity Technologies — Trademarks and terms of use