Class PunctuationPreTokenizer
A pre-tokenizer that splits text on punctuation characters.
Implements
IPreTokenizer
Namespace: Unity.InferenceEngine.Tokenization.PreTokenizers
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
public class PunctuationPreTokenizer : IPreTokenizer
Remarks
This pre-tokenizer identifies punctuation characters (both ASCII and Unicode) and splits the input text at each one. How the delimiters themselves are handled (isolated, removed, merged, or contiguous) is determined by the SplitDelimiterBehavior passed to the constructor.
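To make the effect of the delimiter behavior concrete, the following is a minimal, self-contained sketch of punctuation splitting for two of the behaviors named above (isolated and removed). It is not the library's implementation: `SketchBehavior` and `PunctuationSplitSketch` are hypothetical names introduced for illustration, the sketch works on plain `string` values rather than `SubString`, and it assumes that `char.IsPunctuation` approximates the library's punctuation test, which may classify some characters (for example symbols such as `$` or `+`) differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration only; not the library's SplitDelimiterBehavior.
public enum SketchBehavior { Isolated, Removed }

public static class PunctuationSplitSketch
{
    // Splits text at every punctuation character.
    // Isolated: each punctuation character becomes its own piece.
    // Removed:  punctuation characters are dropped from the output.
    public static List<string> Split(string text, SketchBehavior behavior)
    {
        var pieces = new List<string>();
        int start = 0; // start of the current run of non-punctuation characters

        for (int i = 0; i < text.Length; i++)
        {
            // Assumption: char.IsPunctuation approximates the library's punctuation test.
            if (!char.IsPunctuation(text[i]))
                continue;

            // Emit the non-punctuation run that precedes this delimiter, if any.
            if (i > start)
                pieces.Add(text.Substring(start, i - start));

            // Keep the delimiter as its own piece only in Isolated mode.
            if (behavior == SketchBehavior.Isolated)
                pieces.Add(text[i].ToString());

            start = i + 1;
        }

        // Emit whatever follows the last delimiter.
        if (start < text.Length)
            pieces.Add(text.Substring(start));

        return pieces;
    }
}
```

Under these assumptions, splitting `"Hello, world!"` in the isolated mode yields the four pieces `"Hello"`, `","`, `" world"`, and `"!"`, while the removed mode yields only `"Hello"` and `" world"`. Whitespace is left untouched in both cases, since this sketch splits on punctuation only.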
Constructors
PunctuationPreTokenizer(SplitDelimiterBehavior)
Initializes a new instance of the PunctuationPreTokenizer class with the specified delimiter behavior.
Declaration
public PunctuationPreTokenizer(SplitDelimiterBehavior behavior = SplitDelimiterBehavior.Isolated)
Parameters
| Type | Name | Description |
|---|---|---|
| SplitDelimiterBehavior | behavior | The behavior that determines how punctuation delimiters are handled during splitting. Default is Isolated. |
Methods
PreTokenize(SubString, Output<SubString>)
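Splits the input into smaller parts at punctuation characters, applying the delimiter behavior chosen at construction, and writes the resulting parts to output.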
Declaration
public void PreTokenize(SubString input, Output<SubString> output)
Parameters
| Type | Name | Description |
|---|---|---|
| SubString | input | The source text to split. |
| Output<SubString> | output | The collection that receives the generated pre-tokenized substrings. |