Package org.apache.lucene.analysis.email
Class UAX29URLEmailAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.StopwordAnalyzerBase
org.apache.lucene.analysis.email.UAX29URLEmailAnalyzer
- All Implemented Interfaces:
Closeable, AutoCloseable
Filters UAX29URLEmailTokenizer with LowerCaseFilter and StopFilter, using a list of English stop words.
- Since:
3.6.0
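The filter chain described above can be pictured with a small plain-Java model. This is a sketch, not Lucene code: the class `FilterChainSketch`, its four-word stop list, and the pre-split token list are hypothetical stand-ins. In the real analyzer the tokens come from UAX29URLEmailTokenizer and the default stop words come from STOP_WORDS_SET.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the two filter stages; it does not use or reproduce Lucene classes.
final class FilterChainSketch {
  // Assumed miniature stop-word list, for illustration only.
  static final Set<String> STOP_WORDS = Set.of("a", "the", "to", "and");

  // Lowercases each token, then drops it if it is a stop word.
  static List<String> filter(List<String> tokens) {
    List<String> out = new ArrayList<>();
    for (String token : tokens) {
      String lower = token.toLowerCase(Locale.ROOT); // models the LowerCaseFilter stage
      if (!STOP_WORDS.contains(lower)) {             // models the StopFilter stage
        out.add(lower);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Tokens are supplied pre-split here; the e-mail address is kept as one token.
    System.out.println(filter(List.of("Mail", "the", "report", "to", "Bob@Example.com")));
    // prints [mail, report, bob@example.com]
  }
}
```

Lowercasing runs before the stop-word check in this model, so a capitalized "The" would also be removed by a lowercase stop list.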
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Field Summary
Fields
Modifier and Type          Field                     Description
static final int           DEFAULT_MAX_TOKEN_LENGTH  Default maximum allowed token length
static final CharArraySet  STOP_WORDS_SET            An unmodifiable set containing some common English words that are usually not useful for searching.
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
stopwords
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
Constructor Summary
Constructors
Constructor                                    Description
UAX29URLEmailAnalyzer()                        Builds an analyzer with the default stop words (STOP_WORDS_SET).
UAX29URLEmailAnalyzer(Reader stopwords)        Builds an analyzer with the stop words from the given reader.
UAX29URLEmailAnalyzer(CharArraySet stopWords)  Builds an analyzer with the given stop words.
Method Summary
Modifier and Type                         Method                                       Description
protected Analyzer.TokenStreamComponents  createComponents(String fieldName)
int                                       getMaxTokenLength()
protected TokenStream                     normalize(String fieldName, TokenStream in)
void                                      setMaxTokenLength(int length)                Set the max allowed token length.
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet
Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream
Field Details
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length
- See Also:
Constant Field Values
STOP_WORDS_SET
public static final CharArraySet STOP_WORDS_SET
An unmodifiable set containing some common English words that are usually not useful for searching.
Constructor Details
UAX29URLEmailAnalyzer
public UAX29URLEmailAnalyzer(CharArraySet stopWords)
Builds an analyzer with the given stop words.
- Parameters:
stopWords - stop words
UAX29URLEmailAnalyzer
public UAX29URLEmailAnalyzer()
Builds an analyzer with the default stop words (STOP_WORDS_SET).
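UAX29URLEmailAnalyzer
public UAX29URLEmailAnalyzer(Reader stopwords) throws IOException
Builds an analyzer with the stop words from the given reader.
- Parameters:
stopwords - Reader to read stop words from
- Throws:
IOException
- See Also:
WordlistLoader.getWordSet(Reader)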
Method Details
setMaxTokenLength
public void setMaxTokenLength(int length)
Set the max allowed token length. Tokens larger than this will be chopped up at this token length and emitted as multiple tokens. If you need to skip such large tokens, you could increase this max length, and then use LengthFilter to remove long tokens. The default is DEFAULT_MAX_TOKEN_LENGTH.
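The chopping rule and the suggested workaround can be illustrated with a plain-Java model. This is a sketch under stated assumptions, not Lucene's implementation: `TokenLengthSketch`, `chop`, and `dropLongerThan` are hypothetical names, and lengths are counted in Java `char` units purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the behavior described above; it does not call Lucene.
final class TokenLengthSketch {
  // Splits one over-long token into consecutive pieces of at most maxTokenLength chars.
  static List<String> chop(String token, int maxTokenLength) {
    if (maxTokenLength < 1) {
      throw new IllegalArgumentException("maxTokenLength must be >= 1");
    }
    List<String> pieces = new ArrayList<>();
    for (int start = 0; start < token.length(); start += maxTokenLength) {
      int end = Math.min(token.length(), start + maxTokenLength);
      pieces.add(token.substring(start, end));
    }
    return pieces;
  }

  // Models the workaround: with a raised limit, long tokens stay whole and are then dropped.
  static List<String> dropLongerThan(List<String> tokens, int maxLength) {
    List<String> kept = new ArrayList<>();
    for (String token : tokens) {
      if (token.length() <= maxLength) {
        kept.add(token);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    System.out.println(chop("abcdefghij", 4));                          // [abcd, efgh, ij]
    System.out.println(dropLongerThan(List.of("ok", "abcdefghij"), 4)); // [ok]
  }
}
```

The first call shows why a low limit can flood an index with fragments; the second shows the alternative of keeping the token whole and filtering it out afterwards.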
getMaxTokenLength
public int getMaxTokenLength()
- See Also:
setMaxTokenLength(int)
createComponents
protected Analyzer.TokenStreamComponents createComponents(String fieldName)
- Specified by:
createComponents in class Analyzer
normalize
protected TokenStream normalize(String fieldName, TokenStream in)