    Class StripAccentsNormalizer

    A text normalizer that removes Unicode combining mark characters from input strings. Combining marks include diacritical marks, accents, and other modifying characters that typically combine with base characters.

    Inheritance
    object
    StripAccentsNormalizer
    Implements
    INormalizer
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Unity.InferenceEngine.Tokenization.Normalizers
    Assembly: Unity.InferenceEngine.Tokenization.dll
    Syntax
    public class StripAccentsNormalizer : INormalizer
    Remarks

    This normalizer is useful in tokenization pipelines where diacritical marks and accents need to be removed for text processing, such as standardizing text for comparison, simplifying text for machine learning models, or converting accented characters to their base forms.
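The technique described above can be sketched with standard .NET APIs alone. This is a minimal illustration, not Unity's implementation: `AccentStripSketch`, `StripCombiningMarks`, and the `decomposeFirst` flag are hypothetical names introduced here. The source does not say whether `StripAccentsNormalizer` decomposes its input first, so the sketch makes that step optional.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only; not part of Unity.InferenceEngine.
public static class AccentStripSketch
{
    public static string StripCombiningMarks(string input, bool decomposeFirst = true)
    {
        // Assumption: decomposing to NFD first splits precomposed letters
        // (for example U+00E9) into a base letter plus a combining mark,
        // so the mark can then be removed.
        string text = decomposeFirst ? input.Normalize(NormalizationForm.FormD) : input;

        var result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Unicode general categories Mn, Mc, and Me are the combining marks.
            // This loop inspects UTF-16 code units, so marks outside the
            // Basic Multilingual Plane are not detected.
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
            bool isCombiningMark =
                category == UnicodeCategory.NonSpacingMark ||
                category == UnicodeCategory.SpacingCombiningMark ||
                category == UnicodeCategory.EnclosingMark;

            if (!isCombiningMark)
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

With this sketch, `AccentStripSketch.StripCombiningMarks("caf\u00E9")` returns `"cafe"`. Passing `decomposeFirst: false` leaves the same input unchanged, because the precomposed character U+00E9 is a letter, not a combining mark.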

    Methods

    Normalize(SubString)

    Removes Unicode combining marks from the input string. This normalization is applied before pre-tokenization.

    Declaration
    public SubString Normalize(SubString input)
    Parameters
    Type        Name    Description
    SubString   input   The string to transform.

    Returns
    Type        Description
    SubString   The resulting string.

    Implements

    INormalizer
    Copyright © 2026 Unity Technologies — Trademarks and terms of use