Class PrecompiledNormalizer
Normalizer that uses a precompiled trie and a byte blob to transform input text into a normalized form, typically for tokenization.
Implements
INormalizer
Namespace: Unity.InferenceEngine.Tokenization.Normalizers
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
public class PrecompiledNormalizer : INormalizer
Constructors
PrecompiledNormalizer(IReadOnlyList<ulong>, ReadOnlySpan<byte>)
Initializes a new instance of the PrecompiledNormalizer class.
Declaration
public PrecompiledNormalizer(IReadOnlyList<ulong> trieBlob, ReadOnlySpan<byte> normalizedBytes)
Parameters
| Type | Name | Description |
|---|---|---|
| IReadOnlyList<ulong> | trieBlob | Serialized representation of the double-array trie defining the normalization mappings. |
| ReadOnlySpan<byte> | normalizedBytes | Read-only span over a byte blob that contains UTF-8 encoded, null-terminated normalized strings referenced by the trie. |
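The layout described for normalizedBytes (UTF-8 strings, each terminated by a zero byte) can be illustrated with a small standalone sketch. The helper `ReadNullTerminated` and the class `NormalizedBlobSketch` are hypothetical and not part of Unity.InferenceEngine; the sketch also assumes that a trie entry resolves to a byte offset into the blob, which this page does not confirm.

```csharp
using System;
using System.Text;

public static class NormalizedBlobSketch
{
    // Hypothetical helper: decodes the UTF-8 string that starts at `offset`
    // and ends just before the next zero byte in the blob.
    public static string ReadNullTerminated(ReadOnlySpan<byte> blob, int offset)
    {
        ReadOnlySpan<byte> tail = blob.Slice(offset);

        // Find the terminator; if none is present, read to the end of the blob.
        int length = tail.IndexOf((byte)0);
        if (length < 0)
        {
            length = tail.Length;
        }

        return Encoding.UTF8.GetString(tail.Slice(0, length));
    }
}
```

For example, with a blob built from `Encoding.UTF8.GetBytes("a\0fi\0")`, calling `NormalizedBlobSketch.ReadNullTerminated(blob, 2)` reads the two bytes before the second terminator and decodes them as `fi`.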
Methods
Normalize(SubString)
Applies transformations to the input string before pre-tokenization.
Declaration
public SubString Normalize(SubString original)
Parameters
| Type | Name | Description |
|---|---|---|
| SubString | original | The input text to normalize. |
Returns
| Type | Description |
|---|---|
| SubString | The normalized text. |