docs.unity3d.com

    Namespace Unity.InferenceEngine.Tokenization

    Classes

    Encoding

    Contains the result of a tokenization pipeline run by a Tokenizer instance.

    OutputUtility

    Utility methods for Output<T>

    Tokenizer

    This type is the entry point of the tokenization/detokenization pipeline. The pipeline is composed of six steps, and turns an input string into an IEncoding chain:

    1. Normalization Applies normalization rules to the input string.
    2. Pre-tokenization Splits the result of the normalization step into small pieces (example: split by whitespace).
    3. Encoding The central step of tokenization: turns each piece from the pre-tokenization step into a sequence of int ids. See IMapper for more details.
    4. Truncation Splits the sequence of ids from the encoding step into smaller subsequences. The most frequent truncation rule is "max length". See ITruncator for more details.
    5. Postprocessing Transforms each subsequence generated by the truncation step. The most common transformation is adding [CLS] and [SEP] tokens before and after the sequence. See IPostProcessor for more details.
    6. Padding Pads each subsequence from the postprocessing step to match the expected sequence size.
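
The steps above can be illustrated with a minimal, language-neutral sketch. It is written in Python and does not use the Unity.InferenceEngine.Tokenization API: the vocabulary, the function names, and the default sizes are all hypothetical, and lowercasing merely stands in for normalization.

```python
# Conceptual sketch only. Every name here is hypothetical; none of it is the
# Unity.InferenceEngine.Tokenization API.
VOCAB = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "[UNK]": 3,
         "hello": 4, "world": 5, "again": 6}


def normalize(text):
    # Stand-in normalization: lowercase the input.
    return text.lower()


def pre_tokenize(text):
    # Split the normalized text into pieces on whitespace.
    return text.split()


def encode(pieces):
    # Map each piece to an int id, falling back to the unknown-token id.
    return [VOCAB.get(piece, VOCAB["[UNK]"]) for piece in pieces]


def truncate(ids, max_length):
    # "Max length" rule: cut the ids into chunks of at most max_length ids.
    return [ids[i:i + max_length] for i in range(0, len(ids), max_length)]


def post_process(chunk):
    # Add [CLS] before and [SEP] after the chunk.
    return [VOCAB["[CLS]"]] + chunk + [VOCAB["[SEP]"]]


def pad(chunk, size):
    # Append [PAD] ids until the chunk reaches the expected size.
    return chunk + [VOCAB["[PAD]"]] * (size - len(chunk))


def run_pipeline(text, max_length=2, size=4):
    ids = encode(pre_tokenize(normalize(text)))
    return [pad(post_process(chunk), size) for chunk in truncate(ids, max_length)]


print(run_pipeline("Hello world again"))
```

With these assumed settings, the three input words yield two subsequences: the first is already four ids long after [CLS] and [SEP] are added, and the second receives one padding id.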

    Structs

    Output<T>

    Target interface for tokenization components.

    SubString

    Represents a portion of a string value.

    Token

    Represents the data of a token in a sequence.

    TokenConfiguration

    Represents a token that can be added to a Tokenizer instance, with optional properties that control its behavior.

    Interfaces

    IEncoding

    Describes the result of a tokenization pipeline execution.

    ITokenizer

    The high level API of a tokenization/detokenization pipeline.

    Enums

    Direction

    Specifies whether a process is performed to the Left, to the Right, or both.
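
As an illustration of why a direction matters, here is a small Python sketch of padding and truncating a sequence of ids on either side. It is not the Unity API: the `Side` enum and both functions are hypothetical stand-ins, and the sketch covers only the left and right cases.

```python
from enum import Enum


class Side(Enum):
    # Hypothetical stand-in for a direction value; not the Unity enum.
    LEFT = "left"
    RIGHT = "right"


def pad(ids, size, side, pad_id=0):
    # Add pad_id on the chosen side until the sequence reaches `size`.
    fill = [pad_id] * max(0, size - len(ids))
    return fill + ids if side is Side.LEFT else ids + fill


def truncate(ids, max_length, side):
    # Drop ids from the chosen side so that at most max_length remain.
    if len(ids) <= max_length:
        return ids
    if side is Side.LEFT:
        return ids[len(ids) - max_length:]
    return ids[:max_length]


print(pad([7, 8], 4, Side.LEFT))
print(truncate([1, 2, 3, 4], 2, Side.RIGHT))
```

Left padding keeps the real ids at the end of the sequence, while right padding keeps them at the start. Truncation mirrors this by choosing which end of the sequence is discarded.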

    SequenceIdentifier

    Identifies a sequence. It is used in the TemplatePostProcessor.

    SplitDelimiterBehavior

    Options for how to deal with the delimiter when splitting the input string. See RegexSplitPreTokenizer.
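
This page does not list the enum's members, so the following Python sketch only illustrates the kind of choices such an option typically covers. The four behavior names and the `split_with_delimiter` function are assumptions made for illustration, not the Unity API.

```python
import re


def split_with_delimiter(text, pattern, behavior):
    # Hypothetical helper: split `text` on regex `pattern` and treat each
    # matched delimiter according to `behavior`.
    pieces = []
    last = 0
    carry = ""  # delimiter waiting to be glued to the next piece
    for match in re.finditer(pattern, text):
        before = carry + text[last:match.start()]
        delim = match.group()
        carry = ""
        if behavior == "removed":
            pieces.append(before)
        elif behavior == "isolated":
            pieces.extend([before, delim])
        elif behavior == "merged_with_previous":
            pieces.append(before + delim)
        elif behavior == "merged_with_next":
            pieces.append(before)
            carry = delim
        else:
            raise ValueError(f"unknown behavior: {behavior}")
        last = match.end()
    pieces.append(carry + text[last:])
    # Drop empty pieces produced by adjacent or leading delimiters.
    return [piece for piece in pieces if piece]


for name in ("removed", "isolated", "merged_with_previous", "merged_with_next"):
    print(name, split_with_delimiter("a-b-c", "-", name))
```

The same input therefore produces different pieces depending on whether the delimiter is dropped, kept as its own piece, or attached to a neighbor.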

    Copyright © 2025 Unity Technologies — Trademarks and terms of use