All Classes and Interfaces
Class
Description
Default ICUTokenizerConfig that is generally applicable to many languages.
Extension of CharTermAttributeImpl that encodes the term text as a binary Unicode collation key instead of as UTF-8 bytes.
Converts each token into its CollationKey, and then encodes bytes as an index term.
Indexes collation keys as a single-valued SortedDocValuesField.
Configures KeywordTokenizer with ICUCollationAttributeFactory.
A TokenFilter that applies search term folding to Unicode text, applying foldings from UTR #30 Character Foldings.
Factory for ICUFoldingFilter.
Normalize token text with ICU's Normalizer2.
Factory for ICUNormalizer2CharFilter.
Normalize token text with ICU's Normalizer2.
Factory for ICUNormalizer2Filter.
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Class that allows for tailored Unicode Text Segmentation on a per-writing system basis.
Factory for ICUTokenizer.
A TokenFilter that transforms text with ICU.
Factory for ICUTransformFilter.
This attribute stores the UTR #24 script value for a token of text.
Implementation of ScriptAttribute that stores the script as an integer.
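
Several entries above describe encoding a term as a binary collation key instead of UTF-8 bytes. The following is a minimal sketch of that idea only, and it makes a deliberate substitution: it uses the JDK's java.text.Collator rather than ICU's collator or any class from this module, so that it runs without extra dependencies. The class name CollationKeySketch and the helpers sortKey and compareKeys are hypothetical names chosen for illustration.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Sketch only: JDK collation, not ICU, and not the classes listed above.
public class CollationKeySketch {

    // Turn a term into the binary sort key produced by the given collator.
    static byte[] sortKey(Collator collator, String term) {
        return collator.getCollationKey(term).toByteArray();
    }

    // Compare two sort keys byte by byte, treating each byte as unsigned.
    static int compareKeys(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // PRIMARY strength ignores case and accent differences.
        collator.setStrength(Collator.PRIMARY);

        String[] terms = {"Zebra", "apple", "Banana"};
        // Order the terms by their key bytes instead of by their characters.
        Arrays.sort(terms,
            (x, y) -> compareKeys(sortKey(collator, x), sortKey(collator, y)));
        System.out.println(String.join(" ", terms));
    }
}
```

The point of the design is that once terms are stored as key bytes, a plain byte-wise comparison reproduces the collator's ordering, so locale-sensitive sorting and range queries need no collator at comparison time.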