Struct TokenConfiguration
Represents a token that can be added to a Tokenizer instance, with optional properties that control its behavior.
Implements
IEquatable<TokenConfiguration>
Namespace: Unity.InferenceEngine.Tokenization
Assembly: Unity.InferenceEngine.Tokenization.dll
Syntax
public readonly struct TokenConfiguration : IEquatable<TokenConfiguration>
Constructors
TokenConfiguration(int, string, bool, Direction, bool, bool)
Initializes a new instance of the TokenConfiguration type.
Declaration
public TokenConfiguration(int id, string value, bool wholeWord, Direction strip, bool normalized, bool special)
Parameters
| Type | Name | Description |
|---|---|---|
| int | id | The ID of the token. See Id. |
| string | value | The value of the token. See Value. |
| bool | wholeWord | Specifies whether the token should only match whole words. See WholeWord. |
| Direction | strip | Defines whether this token should strip any whitespace on its left side, right side, or both. See Strip. |
| bool | normalized | Defines whether this token should match against the normalized version of the input text. See Normalized. |
| bool | special | Defines whether this token should be skipped when decoding. See Special. |
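As a minimal sketch of how the constructor might be called, the snippet below builds a configuration for a "[MASK]" token. The ID 103 is an arbitrary placeholder, the helper class name is hypothetical, and `Direction.Left` is assumed from the "Strip = Left" wording in the Strip field description; the other members of Direction are not listed on this page.

```csharp
using Unity.InferenceEngine.Tokenization;

public static class TokenConfigurationExample
{
    // Hypothetical helper: builds a configuration for a "[MASK]" token.
    // The ID 103 is a placeholder, not a value taken from any real vocabulary.
    public static TokenConfiguration CreateMaskToken()
    {
        return new TokenConfiguration(
            id: 103,
            value: "[MASK]",
            wholeWord: false,
            strip: Direction.Left,   // assumed enum member, see the Strip field
            normalized: false,
            special: true);          // skipped when decoding
    }
}
```

Because the struct is readonly, the values passed here cannot be changed afterwards; a different configuration requires a new instance.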
Fields
Id
The ID of the token.
Declaration
public readonly int Id
Field Value
| Type | Description |
|---|---|
| int | The ID of the token. |
Normalized
Defines whether this token should match against the normalized version of the input text. For example, with the added token "yesterday" and a normalizer that lowercases the text, the token could be extracted from the input "I saw a lion Yesterday".
Declaration
public readonly bool Normalized
Field Value
| Type | Description |
|---|---|
| bool | true if the token matches against the normalized input text; otherwise, false. |
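The effect of Normalized can be illustrated with a small standalone sketch. It does not use the tokenizer itself: `Normalize` is a hypothetical stand-in for a lowercasing normalizer, and `FindToken` is an illustrative helper, not part of the package.

```csharp
using System;

public static class NormalizedMatchSketch
{
    // Hypothetical stand-in for a normalizer that lowercases the text.
    public static string Normalize(string input) => input.ToLowerInvariant();

    // Returns the index of the token in the text, or -1 if it is absent.
    // When normalized is true, the search runs on the normalized text.
    public static int FindToken(string text, string token, bool normalized)
    {
        string haystack = normalized ? Normalize(text) : text;
        return haystack.IndexOf(token, StringComparison.Ordinal);
    }
}
```

With the input "I saw a lion Yesterday" and the token "yesterday", `FindToken` returns 13 when `normalized` is true and -1 when it is false, because the raw text only contains the capitalized form.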
Special
Defines whether this token should be skipped when decoding.
Declaration
public readonly bool Special
Field Value
| Type | Description |
|---|---|
| bool | true if the token is skipped when decoding; otherwise, false. |
Strip
Defines whether this token should strip any whitespace on its left side, right side, or both.
For example, matching the token [MASK] with Strip = Left in the text "I saw a [MASK]" yields the match " [MASK]" (note the leading space).
Declaration
public readonly Direction Strip
Field Value
| Type | Description |
|---|---|
| Direction | The side or sides of the token on which whitespace is stripped. |
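The "[MASK]" example above can be approximated with a regular expression. This is a sketch of the described behavior under stated assumptions, not the tokenizer's actual implementation: the `Side` enum is a hypothetical stand-in for Direction (only Left is named on this page), and `FindMatch` is an illustrative helper.

```csharp
using System.Text.RegularExpressions;

public static class StripMatchSketch
{
    // Hypothetical stand-in for the Direction enum.
    public enum Side { None, Left, Right, Both }

    // Returns the matched text, including any whitespace absorbed on the
    // stripped side(s), or an empty string when the token is not found.
    public static string FindMatch(string text, string token, Side strip)
    {
        string left = (strip == Side.Left || strip == Side.Both) ? @"\s*" : "";
        string right = (strip == Side.Right || strip == Side.Both) ? @"\s*" : "";

        // Regex.Escape keeps characters such as '[' from being read as syntax.
        Match m = Regex.Match(text, left + Regex.Escape(token) + right);
        return m.Value;
    }
}
```

For the text "I saw a [MASK]", `FindMatch` with `Side.Left` returns " [MASK]" with its leading space, while `Side.None` returns "[MASK]" alone.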
Value
The value of the token.
Declaration
public readonly string Value
Field Value
| Type | Description |
|---|---|
| string | The value of the token. |
WholeWord
Specifies whether the token should only match whole words.
If set to true, the token does not match inside a longer word.
For example, the token "ing" matches inside "tokenizing" when this option is false,
but not when it is true.
Word boundaries are determined using regular expression rules, meaning the token
must begin and end at a word boundary.
Declaration
public readonly bool WholeWord
Field Value
| Type | Description |
|---|---|
| bool | true if the token only matches whole words; otherwise, false. |
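The word-boundary rule can be sketched with the `\b` anchor of .NET regular expressions. This is an illustration of the behavior described above, assuming `\b` semantics; the page does not state which pattern the tokenizer actually builds, and `Matches` is a hypothetical helper.

```csharp
using System.Text.RegularExpressions;

public static class WholeWordSketch
{
    // Returns true if the token occurs in the text. When wholeWord is true,
    // the occurrence must begin and end at a regex word boundary (\b).
    public static bool Matches(string text, string token, bool wholeWord)
    {
        string pattern = Regex.Escape(token);
        if (wholeWord)
        {
            pattern = @"\b" + pattern + @"\b";
        }
        return Regex.IsMatch(text, pattern);
    }
}
```

Here `Matches("tokenizing", "ing", false)` is true and `Matches("tokenizing", "ing", true)` is false, while `Matches("the ing suffix", "ing", true)` is true. Note that `\b` depends on the token starting and ending with word characters, so a token such as "[MASK]" would need different handling in this sketch.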
Methods
Equals(object)
Declaration
public override bool Equals(object obj)
Parameters
| Type | Name | Description |
|---|---|---|
| object | obj | The object to compare with this instance. |
Returns
| Type | Description |
|---|---|
| bool | true if obj is a TokenConfiguration equal to this instance; otherwise, false. |
Overrides
ValueType.Equals(object)
Equals(TokenConfiguration)
Declaration
public bool Equals(TokenConfiguration other)
Parameters
| Type | Name | Description |
|---|---|---|
| TokenConfiguration | other | The TokenConfiguration to compare with this instance. |
Returns
| Type | Description |
|---|---|
| bool | true if other is equal to this instance; otherwise, false. |
GetHashCode()
Declaration
public override int GetHashCode()
Returns
| Type | Description |
|---|---|
| int | A hash code for this instance. |