Namespace Unity.InferenceEngine.Tokenization.Normalizers

Classes

BertNormalizer

Normalizes raw text input for BERT models.

LowercaseNormalizer

Returns a copy of the input converted to lowercase using the casing rules of the invariant culture.

NmtNormalizer

Provides NMT (Neural Machine Translation) normalization for text preprocessing. Filters out control characters and normalizes various whitespace and special Unicode characters to standard spaces.
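As a rough illustration of this kind of filtering, the following Python sketch drops control characters and maps whitespace-like characters to a plain space. It is not the Unity implementation: the function name `nmt_like_normalize` is hypothetical, and the exact set of characters that NmtNormalizer removes or replaces is not listed on this page, so the rules below are an assumption based on Unicode character properties.

```python
import unicodedata


def nmt_like_normalize(text: str) -> str:
    """Hypothetical sketch: map whitespace to ' ' and drop other control characters."""
    out = []
    for ch in text:
        if ch.isspace():
            # Tabs, newlines, no-break spaces, etc. all become a standard space.
            out.append(" ")
        elif unicodedata.category(ch) == "Cc":
            # Remaining control characters (for example BEL, U+0007) are removed.
            continue
        else:
            out.append(ch)
    return "".join(out)


sample = "a\tb\u00a0c\x07d"
normalized = nmt_like_normalize(sample)
print(repr(normalized))
```

The whitespace check runs before the control-character check because tab and newline are themselves control characters; testing them in the other order would delete them instead of turning them into spaces.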

PrecompiledNormalizer

Normalizer that uses a precompiled trie and a byte blob to transform input text into a normalized form, typically for tokenization.

RegexReplaceNormalizer

Replaces matches of a specified regular expression pattern with another string.

ReplaceNormalizer

Replaces occurrences of a specified pattern with another string.

A text normalizer that removes Unicode combining mark characters from input strings. Combining marks include diacritical marks, accents, and other modifying characters that typically combine with base characters.
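The effect of removing combining marks can be sketched in Python with the standard `unicodedata` module. This is a conceptual illustration only, not the Unity API; the helper name `strip_combining_marks` is hypothetical. It assumes "combining mark" means any code point in the Unicode general categories Mn, Mc, or Me.

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Hypothetical sketch: drop every code point in a Mark category (Mn, Mc, Me)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("M")
    )


# "e" followed by U+0301 COMBINING ACUTE ACCENT: the mark is a separate code point.
decomposed = "cafe\u0301"
stripped = strip_combining_marks(decomposed)

# The precomposed letter U+00E9 is a single code point with no separate mark.
precomposed = "caf\u00e9"
unchanged = strip_combining_marks(precomposed)

print(stripped, unchanged)
```

In this sketch a precomposed character such as U+00E9 passes through untouched, because the accent is not a separate code point. Text generally has to be in a decomposed form before mark removal has any visible effect.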

StripNormalizer

Normalizes text by removing leading and/or trailing whitespace characters.

UnicodeNormalizer

Applies standard Unicode normalization.
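The Unicode standard defines four normalization forms: NFC, NFD, NFKC, and NFKD. Which of them UnicodeNormalizer exposes is not stated on this page, so the Python sketch below only illustrates what the forms do, using the standard `unicodedata` module rather than the Unity API.

```python
import unicodedata

composed = "\u00e9"      # "é" as one precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The two strings render identically but compare as different sequences.
differ_before = composed != decomposed

# NFC composes, NFD decomposes; both are canonical forms.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

# The compatibility forms (NFKC, NFKD) also fold characters such as ligatures.
nfkc = unicodedata.normalize("NFKC", "\ufb01")  # U+FB01 LATIN SMALL LIGATURE FI

print(differ_before, nfc == composed, nfd == decomposed, nfkc)
```

Normalizing before tokenization means visually identical inputs map to the same code point sequence, so they produce the same tokens.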

Interfaces

INormalizer

Applies transformations to the input string before pre-tokenization.