Class BertPreTokenizer
Splits the input on whitespace and punctuation, discarding the whitespace and keeping each punctuation character as a separate chunk.
Implements
IPreTokenizer
Namespace: Unity.InferenceEngine.Tokenization.PreTokenizers
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
public class BertPreTokenizer : IPreTokenizer
Methods
PreTokenize(SubString, Output<SubString>)
Pre-cuts the input into smaller parts and adds each part to the output collection.
Declaration
public void PreTokenize(SubString input, Output<SubString> output)
Parameters
| Type | Name | Description |
|---|---|---|
| SubString | input | The source to pre-cut. |
| Output<SubString> | output | Target collection of generated pre-tokenized strings. |
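The splitting rule described for this class can be illustrated with a standalone sketch on plain strings. This is not the library's implementation and does not use `SubString` or `Output<SubString>`; `BertSplitSketch` and `Split` are hypothetical names, and the sketch assumes that "punctuation" means whatever `char.IsPunctuation` reports, which may differ from the character classes the library actually uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of
// Unity.InferenceEngine.Tokenization.
public static class BertSplitSketch
{
    public static List<string> Split(string input)
    {
        var chunks = new List<string>();
        int start = -1; // start index of the current word, or -1 when outside a word

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            bool isSpace = char.IsWhiteSpace(c);
            bool isPunct = char.IsPunctuation(c); // assumption: .NET's definition of punctuation

            if (isSpace || isPunct)
            {
                // A separator ends the current word, if there is one.
                if (start >= 0)
                {
                    chunks.Add(input.Substring(start, i - start));
                    start = -1;
                }

                // Punctuation is kept as its own chunk; whitespace is dropped.
                if (isPunct)
                    chunks.Add(c.ToString());
            }
            else if (start < 0)
            {
                start = i; // first character of a new word
            }
        }

        // Flush a trailing word that was not followed by a separator.
        if (start >= 0)
            chunks.Add(input.Substring(start));

        return chunks;
    }
}
```

With this sketch, `BertSplitSketch.Split("Hello, world!")` yields the four chunks `Hello`, `,`, `world`, and `!`. The real `PreTokenize` method writes its chunks into the `output` parameter instead of returning a list.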