Class WhitespaceSplitPreTokenizer
A pre-tokenizer that splits input text on whitespace characters.
Namespace: Unity.InferenceEngine.Tokenization.PreTokenizers
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
public class WhitespaceSplitPreTokenizer : IPreTokenizer
Remarks
This pre-tokenizer divides the input string into sub-strings by splitting at whitespace boundaries. Whitespace characters are not included in the output tokens. A run of consecutive whitespace characters acts as a single delimiter, so no empty strings are added to the output.
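The splitting rule described above can be illustrated with a minimal sketch in plain .NET. It does not call the Unity.InferenceEngine.Tokenization API: `WhitespaceSplitSketch` and its `Split` method are hypothetical names introduced only for this illustration, and using `char.IsWhiteSpace` to decide what counts as whitespace is an assumption, since the exact character set used by the library is not stated here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not part of Unity.InferenceEngine.Tokenization.
public static class WhitespaceSplitSketch
{
    // Returns the non-whitespace runs of the input, in order.
    // Assumption: "whitespace" means char.IsWhiteSpace(c) is true.
    public static List<string> Split(string input)
    {
        var parts = new List<string>();
        int start = -1; // start index of the current non-whitespace run, or -1 if none

        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsWhiteSpace(input[i]))
            {
                // A whitespace character closes the current run, if one is open.
                // Further whitespace in the same stretch adds nothing,
                // so no empty strings are produced.
                if (start >= 0)
                {
                    parts.Add(input.Substring(start, i - start));
                    start = -1;
                }
            }
            else if (start < 0)
            {
                start = i; // first character of a new run
            }
        }

        // Flush a run that extends to the end of the input.
        if (start >= 0)
        {
            parts.Add(input.Substring(start));
        }

        return parts;
    }

    public static void Main()
    {
        List<string> parts = Split("  Hello   brave\tnew\nworld ");
        Console.WriteLine(string.Join("|", parts)); // prints: Hello|brave|new|world
    }
}
```

Leading and trailing whitespace produce no output, and the three spaces between "Hello" and "brave" yield a single split, which matches the behavior described in the remarks.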
Methods
PreTokenize(SubString, Output<SubString>)
Splits the input into smaller parts (pre-tokens) at whitespace boundaries.
Declaration
public void PreTokenize(SubString input, Output<SubString> output)
Parameters
| Type | Name | Description |
|---|---|---|
| SubString | input | The source to pre-cut. |
| Output<SubString> | output | Target collection of generated pre-tokenized strings. |