docs.unity3d.com

    Class ByteLevelPreTokenizer

    Pre-tokenizes an input using byte-level rules.

    Inheritance
    object
    ByteLevelPreTokenizer
    Implements
    IPreTokenizer
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Unity.InferenceEngine.Tokenization.PreTokenizers
    Assembly: Unity.InferenceEngine.Tokenization.dll
    Syntax
    public class ByteLevelPreTokenizer : IPreTokenizer

    Constructors

    ByteLevelPreTokenizer(bool, bool)

    Initializes a new instance of the ByteLevelPreTokenizer type.

    Declaration
    public ByteLevelPreTokenizer(bool addPrefixSpace = true, bool gpt2Regex = true)
    Parameters
    Type   Name             Description
    bool   addPrefixSpace   Adds a whitespace character at the start of the input if the input doesn't already begin with one. Defaults to true.
    bool   gpt2Regex        Splits the input into smaller SubStrings using the GPT-2 regex. Defaults to true.
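    The effect of the two constructor options can be illustrated with a standalone C# sketch (a C# 9+ program using top-level statements). It does not call the Unity API: PreSplit is a hypothetical helper, and it assumes that addPrefixSpace prepends a single space and that gpt2Regex = false leaves the input unsplit, mirroring the add_prefix_space and use_regex options of the Hugging Face tokenizers ByteLevel pre-tokenizer. The pattern is the split regex published with OpenAI's GPT-2 encoder.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper approximating what addPrefixSpace and gpt2Regex are
// assumed to do. This is NOT the Unity implementation.
static List<string> PreSplit(string input, bool addPrefixSpace, bool gpt2Regex)
{
    // addPrefixSpace (assumed): prepend one space unless the input is empty
    // or already starts with whitespace.
    if (addPrefixSpace && input.Length > 0 && !char.IsWhiteSpace(input[0]))
        input = " " + input;

    // gpt2Regex = false (assumed): keep the whole input as a single piece.
    if (!gpt2Regex)
        return new List<string> { input };

    // GPT-2 split pattern: contractions, letter runs, digit runs,
    // punctuation runs (each with an optional leading space), then whitespace.
    var gpt2Pattern = new Regex(
        @"'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+");

    return gpt2Pattern.Matches(input).Cast<Match>().Select(m => m.Value).ToList();
}

var pieces = PreSplit("Hello world", addPrefixSpace: true, gpt2Regex: true);
Console.WriteLine(string.Join("|", pieces)); // prints " Hello| world"
```

    With the prefix space added, the first word carries a leading space just like every later word, so GPT-2 style vocabularies can treat "Hello" at the start of the input and "Hello" mid-sentence as the same token.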

    Methods

    PreTokenize(SubString, Output<SubString>)

    Splits the input into smaller parts (pre-tokens).

    Declaration
    public void PreTokenize(SubString input, Output<SubString> output)
    Parameters
    Type                Name     Description
    SubString           input    The source text to split.
    Output<SubString>   output   Target collection that receives the generated pre-tokens.
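    In GPT-2 style tokenizers, "byte-level" usually means that each pre-token is encoded as UTF-8 and every byte is mapped to a printable character, so any input can be represented without unknown tokens. The standalone C# sketch below shows that mapping, assuming this class follows the same convention; BuildByteToChar and ByteLevelEncode are hypothetical names, not members of Unity.InferenceEngine.Tokenization.

```csharp
using System;
using System.Text;

// Hypothetical: builds the GPT-2 style byte-to-character table.
// Printable bytes map to themselves; every other byte is assigned, in
// increasing byte order, the next unused code point starting at 256.
static char[] BuildByteToChar()
{
    var result = new char[256];
    int next = 0;
    for (int b = 0; b < 256; b++)
    {
        bool printable = (b >= '!' && b <= '~')     // 0x21..0x7E
                      || (b >= 0xA1 && b <= 0xAC)   // U+00A1..U+00AC
                      || (b >= 0xAE && b <= 0xFF);  // U+00AE..U+00FF
        result[b] = printable ? (char)b : (char)(256 + next++);
    }
    return result;
}

// Hypothetical: encodes one pre-token by mapping each of its UTF-8 bytes.
static string ByteLevelEncode(string piece, char[] byteToChar)
{
    var sb = new StringBuilder();
    foreach (byte b in Encoding.UTF8.GetBytes(piece))
        sb.Append(byteToChar[b]);
    return sb.ToString();
}

var table = BuildByteToChar();
string encoded = ByteLevelEncode(" Hello", table); // "\u0120Hello", i.e. "ĠHello"
string accented = ByteLevelEncode("\u00E9", table); // "é" is bytes C3 A9 -> "Ã©"
Console.WriteLine(encoded.Length);
```

    Because the space byte 0x20 maps to U+0120 (Ġ), a pre-token such as " Hello" appears as ĠHello in GPT-2 style vocabulary files.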

    Implements

    IPreTokenizer
    Copyright © 2025 Unity Technologies — Trademarks and terms of use