Class BertNormalizer
Normalizes raw text input for BERT models.
Implements
INormalizer
Namespace: Unity.InferenceEngine.Tokenization.Normalizers
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
public class BertNormalizer : INormalizer
Constructors
BertNormalizer(bool, bool, bool?, bool)
Initializes a new instance of the BertNormalizer class.
Declaration
public BertNormalizer(bool cleanText = true, bool handleCjkChars = true, bool? stripAccents = null, bool lowerCase = true)
Parameters
| Type | Name | Description |
|---|---|---|
| bool | cleanText | If true, removes control characters and replaces every whitespace character with a regular space. |
| bool | handleCjkChars | If true, inserts a space on each side of every CJK (Chinese) character. |
| bool? | stripAccents | If true, strips all accents. If set to null, it takes the value of |
| bool | lowerCase | If true, converts the input to lowercase. |
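The four options above can be illustrated with a minimal standalone sketch. It is not the Unity.InferenceEngine implementation: `BertNormalizerSketch` and its helpers are hypothetical names, it works on plain `string` instead of `SubString`, and it only checks the main BMP CJK blocks. It also assumes that a null `stripAccents` falls back to the value of `lowerCase`, as in the original BERT tokenizer; this page does not confirm that behavior.

```csharp
using System;
using System.Globalization;
using System.Text;

// Illustrative stand-in only: NOT the Unity.InferenceEngine implementation.
public static class BertNormalizerSketch
{
    public static string Normalize(
        string input,
        bool cleanText = true,
        bool handleCjkChars = true,
        bool? stripAccents = null,
        bool lowerCase = true)
    {
        // Assumption: null means "follow lowerCase", as in the original BERT tokenizer.
        bool strip = stripAccents ?? lowerCase;

        string text = input;
        if (cleanText) text = CleanText(text);
        if (handleCjkChars) text = PadCjk(text);
        if (strip) text = StripAccents(text);
        if (lowerCase) text = text.ToLowerInvariant();
        return text;
    }

    // Maps every whitespace character to ' ' and drops the remaining control characters.
    static string CleanText(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c)) sb.Append(' ');
            else if (!char.IsControl(c)) sb.Append(c);
        }
        return sb.ToString();
    }

    // Surrounds each CJK ideograph with spaces so it becomes its own token later.
    static string PadCjk(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (IsCjk(c)) sb.Append(' ').Append(c).Append(' ');
            else sb.Append(c);
        }
        return sb.ToString();
    }

    // Simplification: only the main BMP blocks are checked in this sketch.
    static bool IsCjk(char c) =>
        (c >= '\u4E00' && c <= '\u9FFF') ||   // CJK Unified Ideographs
        (c >= '\u3400' && c <= '\u4DBF') ||   // Extension A
        (c >= '\uF900' && c <= '\uFAFF');     // Compatibility Ideographs

    // Decomposes to NFD, then removes the non-spacing combining marks.
    static string StripAccents(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With the defaults, `BertNormalizerSketch.Normalize("Héllo\tWörld")` yields `"hello world"` in this sketch. Passing `stripAccents: false, lowerCase: false` leaves accents and casing untouched, which is the usual choice for cased models.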
Methods
Normalize(SubString)
Applies transformations to the input string before pre-tokenization.
Declaration
public SubString Normalize(SubString input)
Parameters
| Type | Name | Description |
|---|---|---|
| SubString | input | The string to transform. |
Returns
| Type | Description |
|---|---|
| SubString | The normalized string. |