docs.unity3d.com

    Struct BpeMapperOptions

    Configuration settings for the Byte Pair Encoding (BPE) mapper used in tokenization.

    Inherited Members
    ValueType.Equals(object)
    ValueType.GetHashCode()
    ValueType.ToString()
    object.Equals(object, object)
    object.GetType()
    object.ReferenceEquals(object, object)
    Namespace: Unity.InferenceEngine.Tokenization.Mappers
    Assembly: Unity.InferenceEngine.Tokenization.dll
    Syntax
    public struct BpeMapperOptions
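    Because BpeMapperOptions is a struct with public fields, it can be populated with a C# object initializer. Any field left unset keeps its default value of null. The values below are hypothetical and only show the shape of a configuration. Choose them to match the vocabulary and merges you load. This page does not document how the options are passed to the mapper, so that step is not shown.

```csharp
using Unity.InferenceEngine.Tokenization.Mappers;

// Hypothetical values for illustration only; fields not listed (e.g. IgnoreMerges) stay null.
var options = new BpeMapperOptions
{
    ByteFallback = false,
    DropOut = null,         // no dropout: deterministic encoding
    FuseUnknown = true,
    UnknownToken = "<unk>",
    SubWordPrefix = null,
    WordSuffix = "</w>",
};
```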

    Fields

    ByteFallback

    Gets or sets a value indicating whether to fall back to byte-level encoding when encountering characters that cannot be tokenized normally.

    Declaration
    public bool? ByteFallback
    Field Value
    Type     Description
    bool?    true to enable byte-level fallback; false to disable; null to use default behavior.
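    The following is a minimal sketch of the byte-fallback idea. It assumes the convention used by SentencePiece-style vocabularies: a character that is missing from the vocabulary is replaced by one token per UTF-8 byte, spelled `<0xNN>`, and the unknown token is used only if one of those byte tokens is itself missing. The class and method names are hypothetical, and this is not Unity's implementation.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration of byte fallback; not Unity's implementation.
// Assumes byte tokens are spelled "<0x00>".."<0xFF>" (SentencePiece convention).
public static class ByteFallbackSketch
{
    // Maps one symbol (a string, so surrogate pairs stay together) to tokens.
    public static List<string> Encode(string symbol, ISet<string> vocab, bool byteFallback, string unknownToken)
    {
        if (vocab.Contains(symbol))
            return new List<string> { symbol };

        if (byteFallback)
        {
            var byteTokens = new List<string>();
            bool allBytesKnown = true;
            foreach (byte b in Encoding.UTF8.GetBytes(symbol))
            {
                string byteToken = $"<0x{b:X2}>"; // e.g. 0xC3 -> "<0xC3>"
                if (!vocab.Contains(byteToken)) { allBytesKnown = false; break; }
                byteTokens.Add(byteToken);
            }
            // Use the byte tokens only if every one of them exists in the vocabulary.
            if (allBytesKnown) return byteTokens;
        }

        // Fallback disabled, or a byte token is missing: emit the unknown token.
        return new List<string> { unknownToken };
    }
}
```

    With a vocabulary containing `<0xC3>` and `<0xA9>`, the character "é" (UTF-8 bytes C3 A9) encodes as two byte tokens when fallback is enabled, and as the unknown token when it is disabled.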

    DropOut

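    Gets or sets the dropout rate applied during BPE merge operations. When specified, merges are randomly skipped while encoding (BPE-dropout), a regularization technique typically used to improve the robustness of models trained on the resulting tokens.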

    Declaration
    public float? DropOut
    Field Value
    Type     Description
    float?   A float value between 0.0 and 1.0 representing the dropout probability, or null to disable dropout.
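    The sketch below illustrates BPE-dropout in general terms. At each merge step, every applicable merge is independently skipped with probability p, the surviving merge with the lowest rank is applied, and encoding stops when no candidate survives. With no dropout this reduces to ordinary greedy BPE. The class name, the `(left, right) -> rank` merge table, and the character-by-character split are assumptions made for illustration. This is not Unity's implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of BPE-dropout; not Unity's implementation.
public static class BpeDropoutSketch
{
    // merges maps a symbol pair to its rank (lower rank = applied first).
    public static List<string> Encode(string word, IReadOnlyDictionary<(string, string), int> merges, float? dropOut, Random rng)
    {
        // Start from single UTF-16 chars (simplification: ignores surrogate pairs).
        var symbols = new List<string>();
        foreach (char c in word) symbols.Add(c.ToString());

        while (true)
        {
            int bestIndex = -1;
            int bestRank = int.MaxValue;
            for (int i = 0; i < symbols.Count - 1; i++)
            {
                if (!merges.TryGetValue((symbols[i], symbols[i + 1]), out int rank)) continue;
                // With dropout enabled, each candidate is skipped with probability p at this step.
                if (dropOut.HasValue && rng.NextDouble() < dropOut.Value) continue;
                if (rank < bestRank) { bestRank = rank; bestIndex = i; }
            }
            if (bestIndex < 0) break; // no surviving merge candidates

            symbols[bestIndex] = symbols[bestIndex] + symbols[bestIndex + 1];
            symbols.RemoveAt(bestIndex + 1);
        }
        return symbols;
    }
}
```

    With `dropOut` set to 1.0, every merge is skipped and the word stays split into single characters. With null or 0.0, the result is deterministic.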

    FuseUnknown

    Gets or sets a value indicating whether to fuse consecutive unknown tokens into a single unknown token.

    Declaration
    public bool? FuseUnknown
    Field Value
    Type     Description
    bool?    true to fuse unknown tokens; false to keep them separate; null to use default behavior.
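    The following minimal sketch shows the effect of fusing. It assumes pieces are looked up one at a time and that any piece missing from the vocabulary becomes the unknown token. When fusing is on, a run of consecutive unknown pieces produces a single unknown token. The names are hypothetical, and this is not Unity's implementation.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of FuseUnknown; not Unity's implementation.
public static class FuseUnknownSketch
{
    public static List<string> MapToVocab(IEnumerable<string> pieces, ISet<string> vocab, string unknownToken, bool fuseUnknown)
    {
        var result = new List<string>();
        bool lastWasUnknown = false; // tracks unknowns explicitly, not by comparing strings
        foreach (string piece in pieces)
        {
            if (vocab.Contains(piece))
            {
                result.Add(piece);
                lastWasUnknown = false;
                continue;
            }
            // Out-of-vocabulary piece: when fusing, a run of unknown pieces yields one unknown token.
            if (!(fuseUnknown && lastWasUnknown))
                result.Add(unknownToken);
            lastWasUnknown = true;
        }
        return result;
    }
}
```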

    IgnoreMerges

    Gets or sets a value indicating whether to output a word directly, without applying merges, when the whole word is already in the vocabulary. Not yet implemented.

    Declaration
    public bool? IgnoreMerges
    Field Value
    Type Description
    bool?

    SubWordPrefix

    Gets or sets the prefix string added to subword tokens that continue a word (that is, tokens that do not start a word), distinguishing them from word-initial tokens.

    Declaration
    public string SubWordPrefix
    Field Value
    Type     Description
    string   A string prefix (commonly "##") added to subword tokens that continue a word, or null if no prefix is used.
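    The sketch below shows only the resulting output convention: given the pieces a word was split into, every piece after the first receives the prefix. In a real BPE mapper the prefixed forms are typically part of the vocabulary and merges themselves, so this illustrates the shape of the tokens, not how Unity applies the prefix. The names are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of a continuation prefix such as "##"; not Unity's implementation.
public static class SubWordPrefixSketch
{
    public static List<string> Decorate(IReadOnlyList<string> pieces, string subWordPrefix)
    {
        var result = new List<string>(pieces.Count);
        for (int i = 0; i < pieces.Count; i++)
        {
            // The first piece starts the word; later pieces get the prefix, if one is set.
            bool addPrefix = i > 0 && !string.IsNullOrEmpty(subWordPrefix);
            result.Add(addPrefix ? subWordPrefix + pieces[i] : pieces[i]);
        }
        return result;
    }
}
```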

    UnknownToken

    Gets or sets the token string used to represent unknown or out-of-vocabulary words.

    Declaration
    public string UnknownToken
    Field Value
    Type     Description
    string   A string representing the unknown token (commonly "<unk>" or "[UNK]"), or null if no unknown token is specified.

    WordSuffix

    Gets or sets the suffix string added to the last subword token of each word to mark the word boundary.

    Declaration
    public string WordSuffix
    Field Value
    Type     Description
    string   A string suffix (commonly "</w>") added to the last token of each word, or null if no suffix is used.
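    As with the prefix, the following sketch shows only the resulting convention: the final piece of each word carries the end-of-word suffix. Real BPE vocabularies usually contain the suffixed forms (for example `wer</w>`) directly. The names are hypothetical, and this is not Unity's implementation.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of an end-of-word suffix such as "</w>"; not Unity's implementation.
public static class WordSuffixSketch
{
    public static List<string> Decorate(IReadOnlyList<string> pieces, string wordSuffix)
    {
        var result = new List<string>(pieces);
        // Only the last piece of the word is marked, and only if a suffix is set.
        if (result.Count > 0 && !string.IsNullOrEmpty(wordSuffix))
            result[result.Count - 1] += wordSuffix;
        return result;
    }
}
```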

    Copyright © 2025 Unity Technologies — Trademarks and terms of use