Class CharSplitPreTokenizer
A pre-tokenizer that splits text based on a specified character delimiter.
Implements
IPreTokenizer
Namespace: Unity.InferenceEngine.Tokenization.PreTokenizers
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
public class CharSplitPreTokenizer : IPreTokenizer
Constructors
CharSplitPreTokenizer(char, SplitDelimiterBehavior, bool)
Initializes a new instance of the CharSplitPreTokenizer class.
Declaration
public CharSplitPreTokenizer(char delimiter, SplitDelimiterBehavior behavior = SplitDelimiterBehavior.Removed, bool invert = false)
Parameters
| Type | Name | Description |
|---|---|---|
| char | delimiter | The character to use as a delimiter when splitting text. |
| SplitDelimiterBehavior | behavior | How the pre-tokenizer handles the substrings that match the delimiter. Defaults to SplitDelimiterBehavior.Removed. |
| bool | invert | Whether to invert the pattern matching. Defaults to false. |
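The declaration above can be exercised with a minimal sketch like the following. It uses only the constructor signature documented here; the `PreTokenizerFactory` class and its method names are hypothetical helpers introduced for illustration, and the code assumes the Unity.InferenceEngine.Tokenization assembly is referenced by the project.

```csharp
using Unity.InferenceEngine.Tokenization.PreTokenizers;

// Hypothetical helper class, not part of the Unity API.
public static class PreTokenizerFactory
{
    // Uses the documented defaults:
    // behavior = SplitDelimiterBehavior.Removed, invert = false.
    public static CharSplitPreTokenizer CreateSpaceSplitter()
    {
        return new CharSplitPreTokenizer(' ');
    }

    // Keeps the default behavior and sets only the invert flag,
    // using a named argument to skip the optional behavior parameter.
    public static CharSplitPreTokenizer CreateInvertedCommaSplitter()
    {
        return new CharSplitPreTokenizer(',', invert: true);
    }
}
```

Passing `invert` as a named argument avoids spelling out the `behavior` default, so the sketch does not depend on any `SplitDelimiterBehavior` member other than the documented default.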
Methods
PreTokenize(SubString, Output<SubString>)
Splits the input into smaller parts (pre-tokens) and writes them to the output.
Declaration
public void PreTokenize(SubString input, Output<SubString> output)
Parameters
| Type | Name | Description |
|---|---|---|
| SubString | input | The source text to split. |
| Output<SubString> | output | Target collection that receives the generated pre-tokens. |
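To illustrate the kind of splitting this method performs, here is a standalone sketch in plain C# that does not call the Unity API. `CharSplitSketch` and `SplitRemoved` are hypothetical names, `string` stands in for SubString, and a `List<string>` stands in for Output<SubString>. It models only what the name SplitDelimiterBehavior.Removed suggests (the delimiter is dropped from the output). Skipping empty pieces between adjacent delimiters is an assumption, not documented behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration, not the Unity implementation.
public static class CharSplitSketch
{
    // Splits input on delimiter and drops the delimiter itself.
    // Assumption: empty pieces between adjacent delimiters are skipped.
    public static List<string> SplitRemoved(string input, char delimiter)
    {
        var pieces = new List<string>();
        int start = 0;
        for (int i = 0; i <= input.Length; i++)
        {
            bool atEnd = i == input.Length;
            if (atEnd || input[i] == delimiter)
            {
                // Emit the text collected since the previous delimiter, if any.
                if (i > start)
                    pieces.Add(input.Substring(start, i - start));
                start = i + 1;
            }
        }
        return pieces;
    }

    public static void Main()
    {
        // Writes each piece of the input on its own line.
        foreach (var piece in SplitRemoved("hello big world", ' '))
            Console.WriteLine(piece);
    }
}
```

The actual method writes its results into the supplied `output` collection instead of returning a list, so callers own the target container.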