docs.unity3d.com

    Class WordPieceMapper

    Converts an input string into a sequence of token IDs using the WordPiece strategy.

    Inheritance
    object
    WordPieceMapper
    Implements
    IMapper
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Unity.InferenceEngine.Tokenization.Mappers
    Assembly: Unity.InferenceEngine.Tokenization.dll
    Syntax
    public class WordPieceMapper : IMapper

    Constructors

    WordPieceMapper(IReadOnlyDictionary<string, int>, SubString, string, int)

    Initializes a new instance of the WordPieceMapper class.

    Declaration
    public WordPieceMapper(IReadOnlyDictionary<string, int> vocabulary, SubString unknownToken, string continuingSubWordPrefix = "##", int maxInputCharsPerWord = 100)
    Parameters
    Type Name Description
    IReadOnlyDictionary<string, int> vocabulary

    The map from each token value to its ID.

    SubString unknownToken

    The value of the unknown token. It must be present in vocabulary.

    string continuingSubWordPrefix

    The prefix that marks a continuing subword, that is, any subword that is not at the beginning of a word. Defaults to "##".

    int maxInputCharsPerWord

    The maximum length, in characters, of a word that can be tokenized. Defaults to 100.

    Exceptions
    Type Condition
    ArgumentOutOfRangeException

    maxInputCharsPerWord is negative or 0.

    ArgumentNullException

    vocabulary is null.

    ArgumentException

    unknownToken is not found in vocabulary.
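
    The constructor parameters map onto the usual greedy, longest-match-first WordPiece procedure. The following is a minimal, self-contained sketch of that procedure for a single word, not Unity's implementation: the `WordPieceSketch` class and its `Encode` method are hypothetical names, the unknown token is passed as a plain string instead of a SubString, and the handling of over-long or unmatchable words (mapping the whole word to the unknown token) is an assumption that this page does not confirm.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; this is not Unity's WordPieceMapper.
public static class WordPieceSketch
{
    public static List<int> Encode(
        string word,
        IReadOnlyDictionary<string, int> vocabulary,
        string unknownToken,
        string continuingSubWordPrefix = "##",
        int maxInputCharsPerWord = 100)
    {
        // Argument checks that mirror the documented exception conditions.
        if (vocabulary == null)
            throw new ArgumentNullException(nameof(vocabulary));
        if (maxInputCharsPerWord <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxInputCharsPerWord));
        if (!vocabulary.TryGetValue(unknownToken, out int unknownId))
            throw new ArgumentException(
                "Unknown token not found in the vocabulary.", nameof(unknownToken));

        // Assumption: a word longer than the limit becomes the unknown token.
        if (word.Length > maxInputCharsPerWord)
            return new List<int> { unknownId };

        var ids = new List<int>();
        int start = 0;
        while (start < word.Length)
        {
            int end = word.Length;
            int matchedId = -1;

            // Try the longest remaining piece first, then shrink from the right.
            while (end > start)
            {
                string piece = word.Substring(start, end - start);
                if (start > 0)
                    piece = continuingSubWordPrefix + piece; // inner subword

                if (vocabulary.TryGetValue(piece, out int id))
                {
                    matchedId = id;
                    break;
                }
                end--;
            }

            // Assumption: if any part cannot be matched, the whole word is unknown.
            if (matchedId < 0)
                return new List<int> { unknownId };

            ids.Add(matchedId);
            start = end;
        }
        return ids;
    }
}
```

    With a vocabulary that contains "[UNK]", "un", "##aff", and "##able", calling `WordPieceSketch.Encode("unaffable", vocabulary, "[UNK]")` yields the IDs of "un", "##aff", and "##able", in that order. A word with no matching pieces yields only the ID of "[UNK]".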

    Methods

    IdToToken(int)

    Gets the token value for the specified ID.

    Declaration
    public string IdToToken(int id)
    Parameters
    Type Name Description
    int id

    The ID of the requested token.

    Returns
    Type Description
    string

    The token value.

    TokenToId(string, out int)

    Gets the ID of the specified token.

    Declaration
    public bool TokenToId(string value, out int id)
    Parameters
    Type Name Description
    string value

    The token value to look up.

    int id

    The ID of the specified token.

    Returns
    Type Description
    bool

    Whether the token exists.
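
    IdToToken and TokenToId together form a two-way lookup over the vocabulary, and TokenToId follows the familiar .NET try pattern (a bool result plus an out parameter). The sketch below illustrates that shape with two dictionaries. `VocabularyLookupSketch` is a hypothetical class, not Unity's implementation, and returning null for an ID that is not in the vocabulary is an assumption, because this page does not document that case.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; this is not Unity's WordPieceMapper.
public sealed class VocabularyLookupSketch
{
    readonly Dictionary<string, int> m_TokenToId = new Dictionary<string, int>();
    readonly Dictionary<int, string> m_IdToToken = new Dictionary<int, string>();

    public VocabularyLookupSketch(IReadOnlyDictionary<string, int> vocabulary)
    {
        if (vocabulary == null)
            throw new ArgumentNullException(nameof(vocabulary));

        // Build both directions once so that each lookup is a single probe.
        foreach (KeyValuePair<string, int> pair in vocabulary)
        {
            m_TokenToId[pair.Key] = pair.Value;
            m_IdToToken[pair.Value] = pair.Key;
        }
    }

    // Try pattern: returns false and leaves id at 0 when the token is absent.
    public bool TokenToId(string value, out int id)
    {
        return m_TokenToId.TryGetValue(value, out id);
    }

    // Assumption: an ID that is not in the vocabulary yields null.
    public string IdToToken(int id)
    {
        return m_IdToToken.TryGetValue(id, out string token) ? token : null;
    }
}
```

    A caller would typically branch on the bool result, for example `if (lookup.TokenToId("##able", out int id)) { ... }`, and not rely on the value of `id` when the call returns false.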

    Tokenize(IReadOnlyList<SubString>, Output<Token>)

    Tokenizes a list of string values.

    Declaration
    public void Tokenize(IReadOnlyList<SubString> inputs, Output<Token> output)
    Parameters
    Type Name Description
    IReadOnlyList<SubString> inputs

    The string values to tokenize.

    Output<Token> output

    The recipient of the converted tokens.

    Implements

    IMapper
    Copyright © 2025 Unity Technologies — Trademarks and terms of use