Class ByteLevelPreTokenizer
Pre-tokenizes an input using byte-level rules.
Implements
IPreTokenizer
Namespace: Unity.InferenceEngine.Tokenization.PreTokenizers
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
public class ByteLevelPreTokenizer : IPreTokenizer
Constructors
ByteLevelPreTokenizer(bool, bool)
Initializes a new instance of the ByteLevelPreTokenizer type.
Declaration
public ByteLevelPreTokenizer(bool addPrefixSpace = true, bool gpt2Regex = true)
Parameters
| Type | Name | Description |
|---|---|---|
| bool | addPrefixSpace | Adds a whitespace character at the beginning of the input if it doesn't already start with one. |
| bool | gpt2Regex | Uses the GPT-2 regex pattern to split the input into smaller SubString instances. |
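The effect of the two flags can be illustrated with a minimal, standalone sketch. It does not call the package itself: `ByteLevelSketch` and `Split` are hypothetical names, and the pattern is the one published with the GPT-2 reference encoder, assumed here to be equivalent to what `gpt2Regex` enables. Two further assumptions are that the prepended character is a plain space and that no splitting happens when `gpt2Regex` is false.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Unity.InferenceEngine.Tokenization.
public static class ByteLevelSketch
{
    // Split pattern published with the GPT-2 reference encoder.
    // Assumption: gpt2Regex selects an equivalent pattern.
    static readonly Regex Gpt2Pattern = new Regex(
        @"'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+");

    public static List<string> Split(
        string input, bool addPrefixSpace = true, bool gpt2Regex = true)
    {
        // Prepend a space only when the input does not already start
        // with whitespace (assumption: a plain ' ' is what gets added).
        if (addPrefixSpace && (input.Length == 0 || !char.IsWhiteSpace(input[0])))
            input = " " + input;

        var parts = new List<string>();

        // Assumption: without the regex, the input is kept as one piece.
        if (!gpt2Regex)
        {
            parts.Add(input);
            return parts;
        }

        // Each regex match becomes one pre-token, in order of appearance.
        foreach (Match match in Gpt2Pattern.Matches(input))
            parts.Add(match.Value);

        return parts;
    }

    public static void Main()
    {
        // prints " Hello| world"
        Console.WriteLine(string.Join("|", Split("Hello world")));
        // prints "Hello| world"
        Console.WriteLine(string.Join("|", Split("Hello world", addPrefixSpace: false)));
    }
}
```

With the prefix space enabled, the first word is split off with a leading space just like every later word, so a word can be treated the same way at the start of the input as in the middle of it.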
Methods
PreTokenize(SubString, Output<SubString>)
Splits the input into smaller parts (pre-tokens).
Declaration
public void PreTokenize(SubString input, Output<SubString> output)
Parameters
| Type | Name | Description |
|---|---|---|
| SubString | input | The source text to split. |
| Output<SubString> | output | Target collection that receives the generated pre-tokens. |