Namespace Unity.InferenceEngine.Tokenization.Normalizers
Classes
AppendNormalizer
Adds a suffix to the input string.
BertNormalizer
Normalizes raw text input for BERT models.
DefaultNormalizer
Does not apply any transformation.
LowercaseNormalizer
Returns a copy of the input converted to lowercase using the casing rules of the invariant culture.
NmtNormalizer
Provides NMT (Neural Machine Translation) normalization for text preprocessing. Filters out control characters and normalizes various whitespace and special Unicode characters to standard spaces.
PrecompiledNormalizer
Normalizer that uses a precompiled trie and a byte blob to transform input text into a normalized form, typically for tokenization.
PrependNormalizer
Adds a prefix to the input string.
RegexReplaceNormalizer
Replaces matches of a specified regular expression pattern with another string.
ReplaceNormalizer
Replaces occurrences of a specified pattern with another string.
SequenceNormalizer
Applies multiple INormalizer instances in sequence.
StripAccentsNormalizer
A text normalizer that removes Unicode combining mark characters from input strings. Combining marks include diacritical marks, accents, and other modifying characters that typically combine with base characters.
StripNormalizer
Normalizes text by removing leading and/or trailing whitespace characters.
UnicodeNormalizer
Applies standard Unicode normalization.
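The behaviour described for several of the classes above can be illustrated with a small Python sketch built only on the standard library. It is not the Unity API: the function names (`lowercase`, `strip`, `strip_accents`, `unicode_normalize`) are hypothetical stand-ins chosen for illustration, and the actual classes may differ in details. The sketch is mainly meant to show that removing combining marks only strips accents from text that has already been decomposed (for example with NFD).

```python
import unicodedata


def lowercase(text: str) -> str:
    # str.lower() does not depend on the current locale, which makes it a
    # rough analogue of invariant-culture lowercasing (an assumption).
    return text.lower()


def strip(text: str, left: bool = True, right: bool = True) -> str:
    # Remove leading and/or trailing whitespace.
    if left:
        text = text.lstrip()
    if right:
        text = text.rstrip()
    return text


def strip_accents(text: str) -> str:
    # Drop every code point whose general category is a Mark (Mn, Mc, Me).
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("M")
    )


def unicode_normalize(text: str, form: str = "NFC") -> str:
    # form is one of the standard forms: "NFC", "NFD", "NFKC", "NFKD".
    return unicodedata.normalize(form, text)


precomposed = "caf\u00e9"  # 'é' as a single code point
decomposed = unicode_normalize(precomposed, "NFD")  # 'e' + U+0301

print(strip_accents(decomposed))                  # accent removed
print(strip_accents(precomposed) == precomposed)  # nothing to remove
```

In this sketch the precomposed string passes through `strip_accents` unchanged because it contains no separate combining mark, which is why accent stripping is usually paired with a decomposing normalization step.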
Interfaces
INormalizer
Applies transformations to the input string before pre-tokenization.
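As a conceptual illustration of how a normalizer interface and a sequence of normalizers fit together, the following Python sketch models a normalizer as a plain string-to-string function. It does not reproduce the Unity types or their signatures: `Normalizer`, `prepend`, `append`, `replace`, `regex_replace`, and `sequence` are hypothetical names used only for this example.

```python
import re
from typing import Callable

# Assumption: a normalizer is modelled as a function from string to string.
Normalizer = Callable[[str], str]


def prepend(prefix: str) -> Normalizer:
    # Adds a prefix to the input string.
    return lambda text: prefix + text


def append(suffix: str) -> Normalizer:
    # Adds a suffix to the input string.
    return lambda text: text + suffix


def replace(pattern: str, replacement: str) -> Normalizer:
    # Replaces every literal occurrence of pattern.
    return lambda text: text.replace(pattern, replacement)


def regex_replace(pattern: str, replacement: str) -> Normalizer:
    # Replaces every match of a regular expression.
    compiled = re.compile(pattern)
    return lambda text: compiled.sub(replacement, text)


def sequence(*normalizers: Normalizer) -> Normalizer:
    # Applies the given normalizers one after another, in order.
    def run(text: str) -> str:
        for normalizer in normalizers:
            text = normalizer(text)
        return text
    return run


pipeline = sequence(
    str.strip,                  # trim surrounding whitespace
    str.lower,                  # lowercase
    regex_replace(r"\s+", " "), # collapse whitespace runs
    replace(" ", "_"),          # mark word boundaries
    prepend("_"),               # mark the start of the text
)

result = pipeline("  Hello   World ")
print(result)  # _hello_world
```

Order matters in such a pipeline: moving `prepend("_")` before `str.strip` would leave the result unchanged here, but moving `replace(" ", "_")` before the whitespace collapse would produce several underscores between the words.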