Class MetaspaceDecoder
Decoder for "metaspace" tokenization, where spaces are represented by a special visible character (by default, U+2581 "▁").
Implements
IDecoder
Namespace: Unity.InferenceEngine.Tokenization.Decoders
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
public class MetaspaceDecoder : IDecoder
Remarks
This decoder converts the metaspace replacement character back into regular spaces in the decoded string sequence.
Behaviour depends on PrependScheme:
- If Never is used, tokens are passed through unchanged.
- Otherwise, the metaspace replacement character is removed from the first token (so the decoded text has no leading space) and replaced with a regular space in every subsequent token.
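The rule above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `MetaspaceSketch` and `SketchPrependScheme` are hypothetical names introduced for illustration only, and the enum models only the two values named on this page (`Never` and `Always`). The sketch also returns a list instead of writing to an `Output<string>`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the library's PrependScheme; only the two values
// mentioned on this page are modelled here.
public enum SketchPrependScheme { Never, Always }

public static class MetaspaceSketch
{
    // Mirrors the behaviour described in Remarks. This is an illustrative
    // assumption, not the actual MetaspaceDecoder code.
    public static List<string> Decode(
        IReadOnlyList<string> tokens,
        char replacement = '\u2581',
        SketchPrependScheme scheme = SketchPrependScheme.Always)
    {
        var result = new List<string>(tokens.Count);
        for (int i = 0; i < tokens.Count; i++)
        {
            if (scheme == SketchPrependScheme.Never)
            {
                // Never: the token is passed through unchanged.
                result.Add(tokens[i]);
                continue;
            }

            // First token: drop the replacement character (no space prefix).
            // Subsequent tokens: turn it into a regular space.
            string substitute = i == 0 ? string.Empty : " ";
            result.Add(tokens[i].Replace(replacement.ToString(), substitute));
        }
        return result;
    }
}
```

Under these assumptions, `MetaspaceSketch.Decode(new[] { "▁Hello", "▁world", "!" })` yields `"Hello"`, `" world"`, `"!"`, which concatenate to `Hello world!`. With `SketchPrependScheme.Never`, the same tokens come back unchanged.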
Constructors
MetaspaceDecoder(char, PrependScheme)
Initializes a new instance of the MetaspaceDecoder class.
Declaration
public MetaspaceDecoder(char replacement = '▁', PrependScheme prependScheme = PrependScheme.Always)
Parameters
| Type | Name | Description |
|---|---|---|
| char | replacement | The character that represents a space in the metaspace-encoded tokens. Defaults to U+2581 ("▁"), which is commonly used by Hugging Face tokenizers. |
| PrependScheme | prependScheme | The scheme that controls whether and how a leading space is inserted when decoding tokens. Defaults to Always. |
Methods
Decode(IReadOnlyList<string>, Output<string>)
Decodes the input tokens by converting the metaspace replacement character as described in Remarks, and writes the resulting strings to output.
Declaration
public void Decode(IReadOnlyList<string> tokens, Output<string> output)
Parameters
| Type | Name | Description |
|---|---|---|
| IReadOnlyList<string> | tokens | The token strings to decode. |
| Output<string> | output | The recipient of the decoded strings. |