Interface ITokenizer
The high-level API of a tokenization/detokenization pipeline.
Namespace: Unity.InferenceEngine.Tokenization
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
public interface ITokenizer
Methods
Decode(IReadOnlyList<int>, bool)
Turns a sequence of token IDs into a string.
Declaration
string Decode(IReadOnlyList<int> input, bool skipSpecialTokens = false)
Parameters
| Type | Name | Description |
|---|---|---|
| IReadOnlyList<int> | input | The sequence of token IDs. |
| bool | skipSpecialTokens | If true, special tokens are not decoded. Defaults to false. |
Returns
| Type | Description |
|---|---|
| string | The decoded string. |
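The following is a minimal sketch of how a call to Decode might look and how skipSpecialTokens could affect the result. It is not the package's implementation: the ITokenizer declared here is a stand-in that mirrors only the Decode signature above, and ToyTokenizer, its four-entry vocabulary, and the choice of IDs 0 and 1 as special tokens are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in that mirrors only the Decode signature documented above.
public interface ITokenizer
{
    string Decode(IReadOnlyList<int> input, bool skipSpecialTokens = false);
}

// Hypothetical toy implementation: fixed vocabulary, IDs 0 and 1 are special.
public sealed class ToyTokenizer : ITokenizer
{
    static readonly string[] Vocab = { "[CLS]", "[SEP]", "hello", "world" };
    static readonly HashSet<int> SpecialIds = new HashSet<int> { 0, 1 };

    public string Decode(IReadOnlyList<int> input, bool skipSpecialTokens = false)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        // Drop special IDs first when requested, then map each ID to its text.
        IEnumerable<int> ids = skipSpecialTokens
            ? input.Where(id => !SpecialIds.Contains(id))
            : input;
        return string.Join(" ", ids.Select(id => Vocab[id]));
    }
}

public static class DecodeExample
{
    public static void Main()
    {
        ITokenizer tokenizer = new ToyTokenizer();
        var ids = new List<int> { 0, 2, 3, 1 };

        Console.WriteLine(tokenizer.Decode(ids));                          // [CLS] hello world [SEP]
        Console.WriteLine(tokenizer.Decode(ids, skipSpecialTokens: true)); // hello world
    }
}
```

Because skipSpecialTokens defaults to false, the first call keeps the special tokens in the output; a real tokenizer may also join tokens differently from the single-space join assumed here.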
Encode(string, string, bool)
Turns inputA, and optionally inputB, into an IEncoding instance.
Declaration
IEncoding Encode(string inputA, string inputB = null, bool addSpecialTokens = true)
Parameters
| Type | Name | Description |
|---|---|---|
| string | inputA | The main input to tokenize. Cannot be null. |
| string | inputB | An optional secondary input to tokenize. Defaults to null. |
| bool | addSpecialTokens | Whether to add special tokens to the final IEncoding. Defaults to true. |
Returns
| Type | Description |
|---|---|
| IEncoding | The tokenized value as an IEncoding instance. |
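As a rough illustration of the Encode parameters, the sketch below encodes a single input and then a pair of inputs. Everything beyond the Encode signature is an assumption: IEncoding is declared here as an empty stand-in because its members are not described in this section, and ToyEncoding, ToyPairTokenizer, the vocabulary, and the placement of special tokens (ID 0 before inputA, ID 1 after each input) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Stand-ins for the documented types. IEncoding's real members are not
// described in this section, so it is left empty here.
public interface IEncoding { }

public interface ITokenizer
{
    IEncoding Encode(string inputA, string inputB = null, bool addSpecialTokens = true);
}

// Hypothetical result type exposing token IDs for the demonstration.
public sealed class ToyEncoding : IEncoding
{
    public IReadOnlyList<int> Ids { get; }
    public ToyEncoding(IReadOnlyList<int> ids) { Ids = ids; }
}

// Hypothetical whitespace tokenizer: IDs 0 ([CLS]) and 1 ([SEP]) are special.
public sealed class ToyPairTokenizer : ITokenizer
{
    static readonly Dictionary<string, int> Vocab = new Dictionary<string, int>
    {
        ["hello"] = 2, ["world"] = 3, ["again"] = 4
    };

    public IEncoding Encode(string inputA, string inputB = null, bool addSpecialTokens = true)
    {
        // inputA is documented as non-null; inputB may be omitted.
        if (inputA == null) throw new ArgumentNullException(nameof(inputA));

        var ids = new List<int>();
        if (addSpecialTokens) ids.Add(0);
        AppendWords(ids, inputA);
        if (addSpecialTokens) ids.Add(1);

        if (inputB != null)
        {
            AppendWords(ids, inputB);
            if (addSpecialTokens) ids.Add(1);
        }
        return new ToyEncoding(ids);
    }

    static void AppendWords(List<int> ids, string text)
    {
        foreach (var word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            ids.Add(Vocab[word]);
    }
}

public static class EncodeExample
{
    public static void Main()
    {
        ITokenizer tokenizer = new ToyPairTokenizer();

        var single = (ToyEncoding)tokenizer.Encode("hello world");
        Console.WriteLine(string.Join(",", single.Ids)); // 0,2,3,1

        var pair = (ToyEncoding)tokenizer.Encode("hello", "again", addSpecialTokens: false);
        Console.WriteLine(string.Join(",", pair.Ids));   // 2,4
    }
}
```

The cast to ToyEncoding is needed only because the stand-in IEncoding is empty; with the real interface, the result would be read through whatever members IEncoding actually exposes.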