Class RegexSplitPreTokenizer
Splits the input based on a regular expression.
Implements
IPreTokenizer
Namespace: Unity.InferenceEngine.Tokenization.PreTokenizers
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
public class RegexSplitPreTokenizer : IPreTokenizer
Constructors
RegexSplitPreTokenizer(string, SplitDelimiterBehavior, bool)
Initializes a new instance of the RegexSplitPreTokenizer type.
Declaration
public RegexSplitPreTokenizer(string pattern, SplitDelimiterBehavior behavior, bool invert = false)
Parameters
| Type | Name | Description |
|---|---|---|
| string | pattern | The regular expression pattern to use for splitting the input. |
| SplitDelimiterBehavior | behavior | Indicates how matched delimiters are handled when the input is split. See SplitDelimiterBehavior. |
| bool | invert | Whether or not to invert the pattern. Not yet implemented. |
Exceptions
| Type | Condition |
|---|---|
| ArgumentNullException | Thrown when pattern is null. |
| ArgumentOutOfRangeException | Thrown when behavior is not a valid SplitDelimiterBehavior value. |
Methods
PreTokenize(SubString, Output<SubString>)
Pre-cuts the input into smaller parts.
Declaration
public void PreTokenize(SubString input, Output<SubString> output)
Parameters
| Type | Name | Description |
|---|---|---|
| SubString | input | The source to pre-cut. |
| Output<SubString> | output | Target collection of generated pre-tokenized strings. |
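Example

The following is a minimal, standalone sketch of what splitting input on a regular expression can look like. It does not use this package: it relies only on System.Text.RegularExpressions. The DelimiterMode enum and the RegexSplitSketch class are hypothetical names introduced for illustration, and the two modes shown (drop the match, or keep it as its own piece) are assumptions about typical delimiter handling, not the documented members of SplitDelimiterBehavior.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for SplitDelimiterBehavior; only two modes are sketched.
public enum DelimiterMode { Removed, Isolated }

public static class RegexSplitSketch
{
    // Splits input at every non-empty match of pattern.
    // Removed:  the matched text is dropped.
    // Isolated: the matched text is emitted as its own piece.
    public static List<string> Split(string input, string pattern, DelimiterMode mode)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (pattern == null) throw new ArgumentNullException(nameof(pattern));

        var parts = new List<string>();
        int cursor = 0; // start of the text not yet emitted

        foreach (Match match in Regex.Matches(input, pattern))
        {
            if (match.Length == 0) continue; // empty matches do not split anything

            // Emit the text between the previous match and this one.
            if (match.Index > cursor)
                parts.Add(input.Substring(cursor, match.Index - cursor));

            if (mode == DelimiterMode.Isolated)
                parts.Add(match.Value);

            cursor = match.Index + match.Length;
        }

        // Emit whatever follows the last match.
        if (cursor < input.Length)
            parts.Add(input.Substring(cursor));

        return parts;
    }

    public static void Main()
    {
        // prints "Hello|world"
        Console.WriteLine(string.Join("|",
            Split("Hello, world!", @"[\s\p{P}]+", DelimiterMode.Removed)));

        // prints "Hello,| |world!"
        Console.WriteLine(string.Join("|",
            Split("Hello, world!", @"\s+", DelimiterMode.Isolated)));
    }
}
```

The sketch returns a new list for readability, whereas PreTokenize writes its pieces into the supplied output collection. Consult SplitDelimiterBehavior for the behaviors the class actually supports.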