Class BpeMapper
Turns a string input into a sequence of Token instances using the Byte-Pair Encoding strategy.
Implements
IMapper
Namespace: Unity.InferenceEngine.Tokenization.Mappers
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
```csharp
public class BpeMapper : IMapper
```
Constructors
BpeMapper(IReadOnlyDictionary<string, int>, IEnumerable<MergePair>, BpeMapperOptions)
Initializes a new instance of the BpeMapper class from a token vocabulary and an optional, priority-ordered list of merge pairs.
Declaration
```csharp
public BpeMapper(IReadOnlyDictionary<string, int> vocabulary, IEnumerable<MergePair> merges = null, BpeMapperOptions options = default)
```
Parameters
| Type | Name | Description |
|---|---|---|
| IReadOnlyDictionary<string, int> | vocabulary | The map associating each token's string representation with its ID. |
| IEnumerable<MergePair> | merges | The list of mergeable token pairs, ordered by priority. Optional; defaults to null. |
| BpeMapperOptions | options | Additional configuration. Optional; defaults to default. See BpeMapperOptions. |
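The following C# sketch illustrates how a vocabulary and a priority-ordered merge list can work together when BPE-encoding a single word. It is a minimal sketch under stated assumptions, not BpeMapper's implementation: MergeRule and BpeSketch.Encode are hypothetical names (MergeRule stands in for MergePair, whose members are not documented on this page), the word is split into UTF-16 characters rather than bytes, and a symbol missing from the vocabulary raises an exception instead of whatever handling BpeMapper actually applies.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for MergePair; the real type's members may differ.
public readonly record struct MergeRule(string Left, string Right);

public static class BpeSketch
{
    // Encodes one word: split it into characters, then repeatedly merge the
    // adjacent pair whose rule appears earliest in the merge list.
    public static List<int> Encode(
        string word,
        IReadOnlyDictionary<string, int> vocabulary,
        IEnumerable<MergeRule> merges)
    {
        // Earlier position in the list = higher priority = lower rank.
        // Duplicate pairs keep their first (highest-priority) rank.
        var ranks = new Dictionary<(string, string), int>();
        var rank = 0;
        foreach (var m in merges)
            ranks.TryAdd((m.Left, m.Right), rank++);

        // Start from single UTF-16 characters.
        var symbols = new List<string>();
        foreach (var c in word)
            symbols.Add(c.ToString());

        while (symbols.Count > 1)
        {
            // Find the adjacent pair with the lowest rank.
            var bestIndex = -1;
            var bestRank = int.MaxValue;
            for (var i = 0; i < symbols.Count - 1; i++)
            {
                if (ranks.TryGetValue((symbols[i], symbols[i + 1]), out var r) && r < bestRank)
                {
                    bestRank = r;
                    bestIndex = i;
                }
            }
            if (bestIndex < 0)
                break; // No applicable merge is left.

            // Merge the winning pair into a single symbol.
            symbols[bestIndex] += symbols[bestIndex + 1];
            symbols.RemoveAt(bestIndex + 1);
        }

        // Map the final symbols to their IDs.
        var ids = new List<int>(symbols.Count);
        foreach (var s in symbols)
        {
            if (!vocabulary.TryGetValue(s, out var id))
                throw new KeyNotFoundException($"Symbol '{s}' is not in the vocabulary.");
            ids.Add(id);
        }
        return ids;
    }
}
```

Merge order matters in this sketch: with the vocabulary { w, e, r, we, er }, encoding "wer" with the merges (e, r) then (w, e) yields the tokens w and er, while listing (w, e) first yields we and r. With an empty merge list, the sketch produces one token per character.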
Methods
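DeTokenize(IReadOnlyList<int>, bool, Output<string>)
Converts a sequence of token IDs back into their string representations.
Declaration
```csharp
public void DeTokenize(IReadOnlyList<int> ids, bool _, Output<string> output)
```
Parameters
| Type | Name | Description |
|---|---|---|
| IReadOnlyList<int> | ids | The IDs of the tokens to convert. |
| bool | _ | Not used by this implementation. |
| Output<string> | output | The recipient of the converted strings. |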
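As a rough illustration of the ID-to-string direction, the sketch below inverts the vocabulary once and emits each ID's token string in order. It assumes DeTokenize performs a plain vocabulary lookup per ID; DeTokenizeSketch is a hypothetical name, and a List<string> stands in for Output<string>, whose API is not shown here. Whether BpeMapper joins the strings, and how it treats unknown IDs, is not documented on this page, so the sketch throws on an unknown ID and leaves joining to the caller.

```csharp
using System.Collections.Generic;

public static class DeTokenizeSketch
{
    // Appends the string of each ID, in order, to `output`, which stands in
    // for the Output<string> recipient (hypothetical substitution).
    public static void DeTokenize(
        IReadOnlyDictionary<string, int> vocabulary,
        IReadOnlyList<int> ids,
        List<string> output)
    {
        // Invert the vocabulary once: ID -> token string.
        var byId = new Dictionary<int, string>(vocabulary.Count);
        foreach (var pair in vocabulary)
            byId[pair.Value] = pair.Key;

        foreach (var id in ids)
        {
            // Unknown IDs throw here; BpeMapper's actual behavior is not documented.
            if (!byId.TryGetValue(id, out var token))
                throw new KeyNotFoundException($"ID {id} is not in the vocabulary.");
            output.Add(token);
        }
    }
}
```

Building the inverse map once keeps each lookup constant-time on average, rather than scanning the vocabulary for every ID.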
IdToToken(int)
Gets the token value for the specified ID.
Declaration
```csharp
public string IdToToken(int id)
```
Parameters
| Type | Name | Description |
|---|---|---|
| int | id | The ID of the requested token. |
Returns
| Type | Description |
|---|---|
| string | The token value. |
TokenToId(string, out int)
Gets the ID of the specified token.
Declaration
```csharp
public bool TokenToId(string token, out int id)
```
Parameters
| Type | Name | Description |
|---|---|---|
| string | token | The token whose ID is requested. |
| int | id | When this method returns, the ID of the specified token, if it exists. |
Returns
| Type | Description |
|---|---|
| bool | Whether the token exists. |
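IdToToken and TokenToId are two directions over the same vocabulary. The hypothetical VocabularyLookup class below sketches one way to provide both: the token-to-ID direction reads the original dictionary, and the ID-to-token direction uses an inverted dictionary built once in the constructor. In the sketch, TokenToId follows the usual .NET Try pattern (it returns false and sets id to 0 for an unknown token); the exception thrown by IdToToken for an unknown ID is an assumption, since this page does not document that case.

```csharp
using System.Collections.Generic;

// Hypothetical helper mirroring the IdToToken/TokenToId pair; not BpeMapper's code.
public sealed class VocabularyLookup
{
    private readonly IReadOnlyDictionary<string, int> _tokenToId;
    private readonly Dictionary<int, string> _idToToken = new();

    public VocabularyLookup(IReadOnlyDictionary<string, int> vocabulary)
    {
        _tokenToId = vocabulary;
        // Build the reverse map once so ID lookups do not scan the vocabulary.
        foreach (var pair in vocabulary)
            _idToToken[pair.Value] = pair.Key;
    }

    // Try pattern: returns false (with id = 0) when the token is unknown.
    public bool TokenToId(string token, out int id) => _tokenToId.TryGetValue(token, out id);

    // Throws KeyNotFoundException for an unknown ID (an assumption of this sketch).
    public string IdToToken(int id) => _idToToken[id];
}
```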
Tokenize(IReadOnlyList<SubString>, Output<Token>)
Tokenizes a list of substrings.
Declaration
```csharp
public void Tokenize(IReadOnlyList<SubString> inputs, Output<Token> output)
```
Parameters
| Type | Name | Description |
|---|---|---|
| IReadOnlyList<SubString> | inputs | The substrings to tokenize. |
| Output<Token> | output | The recipient of the converted tokens. |
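The sketch below shows one plausible shape of Tokenize: each input is encoded on its own with a lowest-rank-first merge loop, and the resulting tokens are appended in order to a shared recipient. It assumes inputs are encoded independently, which is typical when a BPE stage follows pre-tokenization. SimpleToken and TokenizeSketch are hypothetical, plain strings stand in for SubString, a List<SimpleToken> stands in for Output<Token>, and merge priorities are passed as a precomputed rank dictionary rather than a MergePair list.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for Token: an ID plus the string it covers.
public readonly record struct SimpleToken(int Id, string Value);

public static class TokenizeSketch
{
    // Encodes each input independently and appends the results, in order,
    // to `output` (standing in for Output<Token>).
    public static void Tokenize(
        IReadOnlyList<string> inputs,
        IReadOnlyDictionary<string, int> vocabulary,
        IReadOnlyDictionary<(string, string), int> mergeRanks,
        List<SimpleToken> output)
    {
        foreach (var input in inputs)
        {
            // Start from single UTF-16 characters of this input only.
            var symbols = new List<string>();
            foreach (var c in input)
                symbols.Add(c.ToString());

            // Repeatedly merge the highest-priority (lowest-rank) adjacent pair.
            while (true)
            {
                var best = -1;
                var bestRank = int.MaxValue;
                for (var i = 0; i + 1 < symbols.Count; i++)
                {
                    if (mergeRanks.TryGetValue((symbols[i], symbols[i + 1]), out var r) && r < bestRank)
                    {
                        best = i;
                        bestRank = r;
                    }
                }
                if (best < 0)
                    break; // No applicable merge is left.
                symbols[best] += symbols[best + 1];
                symbols.RemoveAt(best + 1);
            }

            // The vocabulary indexer throws KeyNotFoundException for unknown symbols.
            foreach (var s in symbols)
                output.Add(new SimpleToken(vocabulary[s], s));
        }
    }
}
```

Because inputs are encoded independently in this sketch, the inputs "e" and "r" produce two tokens even when "er" is in the vocabulary and (e, r) is a merge rule.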