Class WordPieceMapper
Turns an input string into a sequence of token IDs using the WordPiece strategy.
Implements
IMapper
Namespace: Unity.InferenceEngine.Tokenization.Mappers
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
public class WordPieceMapper : IMapper
Constructors
WordPieceMapper(IReadOnlyDictionary<string, int>, SubString, string, int)
Initializes a new instance of the WordPieceMapper type.
Declaration
public WordPieceMapper(IReadOnlyDictionary<string, int> vocabulary, SubString unknownToken, string continuingSubWordPrefix = "##", int maxInputCharsPerWord = 100)
Parameters
| Type | Name | Description |
|---|---|---|
| IReadOnlyDictionary<string, int> | vocabulary | The map from each token value to its ID. |
| SubString | unknownToken | The value of the unknown token. |
| string | continuingSubWordPrefix | The prefix to add to inner subwords (not at the beginning of a word). |
| int | maxInputCharsPerWord | The maximum length, in characters, of a word that can be tokenized. |
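The parameters above correspond to the usual WordPiece scheme: greedy longest-match-first lookup against the vocabulary. The following is a minimal, self-contained sketch of that scheme, not the implementation used by this class. `WordPieceSketch` and `TokenizeWord` are hypothetical names, and plain `string` stands in for `SubString`. How the real class handles over-long or unmatched words is not stated on this page; the sketch assumes both map to the unknown token.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration only; not part of Unity.InferenceEngine.
public static class WordPieceSketch
{
    // Splits a single word into vocabulary IDs using greedy longest-match-first lookup.
    public static List<int> TokenizeWord(
        string word,
        IReadOnlyDictionary<string, int> vocabulary,
        string unknownToken,
        string continuingSubWordPrefix = "##",
        int maxInputCharsPerWord = 100)
    {
        int unknownId = vocabulary[unknownToken];

        // Assumption: a word longer than the limit maps to the unknown token as a whole.
        if (word.Length > maxInputCharsPerWord)
            return new List<int> { unknownId };

        var ids = new List<int>();
        int start = 0;
        while (start < word.Length)
        {
            int end = word.Length;
            int id = 0;
            bool found = false;

            // Try the longest remaining substring first, then shrink it from the right.
            while (end > start)
            {
                string piece = word.Substring(start, end - start);

                // Pieces that do not start the word carry the continuation prefix.
                if (start > 0)
                    piece = continuingSubWordPrefix + piece;

                if (vocabulary.TryGetValue(piece, out id))
                {
                    found = true;
                    break;
                }
                end--;
            }

            // Assumption: if any part of the word cannot be matched,
            // the whole word becomes the unknown token.
            if (!found)
                return new List<int> { unknownId };

            ids.Add(id);
            start = end;
        }
        return ids;
    }
}
```

With the vocabulary `{ "[UNK]": 0, "un": 1, "##aff": 2, "##able": 3 }`, this sketch splits `unaffable` into the IDs 1, 2, 3, and maps `xyz` to the single ID 0.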
Exceptions
| Type | Condition |
|---|---|
| ArgumentOutOfRangeException | |
| ArgumentNullException | |
| ArgumentException | |
Methods
IdToToken(int)
Gets the token value from the specified id.
Declaration
public string IdToToken(int id)
Parameters
| Type | Name | Description |
|---|---|---|
| int | id | The ID of the requested token. |
Returns
| Type | Description |
|---|---|
| string | The token value. |
TokenToId(string, out int)
Gets the ID of the specified token.
Declaration
public bool TokenToId(string value, out int id)
Parameters
| Type | Name | Description |
|---|---|---|
| string | value | The token value to look up. |
| int | id | The ID of the specified token. |
Returns
| Type | Description |
|---|---|
| bool | Whether the token exists. |
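`TokenToId` follows the common C# Try pattern: the return value reports whether the token exists, and the ID arrives through the `out` parameter. The sketch below mirrors the two lookup signatures on top of a plain dictionary. `VocabularyLookupSketch` is a hypothetical class, not the library's implementation. This page does not say what `IdToToken` does for an unknown ID; the sketch assumes it returns `null`.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for illustration only; not part of Unity.InferenceEngine.
public sealed class VocabularyLookupSketch
{
    private readonly IReadOnlyDictionary<string, int> tokenToId;
    private readonly Dictionary<int, string> idToToken;

    public VocabularyLookupSketch(IReadOnlyDictionary<string, int> vocabulary)
    {
        tokenToId = vocabulary;

        // Invert the map once so ID lookups do not scan the vocabulary.
        // ToDictionary throws if two tokens share the same ID.
        idToToken = vocabulary.ToDictionary(pair => pair.Value, pair => pair.Key);
    }

    // Returns true and sets id when the token exists; otherwise returns false.
    public bool TokenToId(string value, out int id) => tokenToId.TryGetValue(value, out id);

    // Assumption: an unknown ID yields null.
    public string IdToToken(int id) => idToToken.TryGetValue(id, out var token) ? token : null;
}
```

A caller would typically branch on the boolean result of `TokenToId` before using `id`, as with any other Try method.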
Tokenize(IReadOnlyList<SubString>, Output<Token>)
Tokenizes a list of string values.
Declaration
public void Tokenize(IReadOnlyList<SubString> inputs, Output<Token> output)
Parameters
| Type | Name | Description |
|---|---|---|
| IReadOnlyList<SubString> | inputs | The string values to tokenize. |
| Output<Token> | output | The recipient of the converted tokens. |