lucene-java-user mailing list archives

From "tcorbet" <>
Subject StopWords -- File Suffix
Date Tue, 18 Oct 2005 06:32:28 GMT
I thought that using a StandardAnalyzer
with a stop word list that merges
ENGLISH_STOP_WORDS with a handful
of additions of my own, including the most
common file suffixes [.txt, .xml, .doc, etc.],
would eliminate any occurrence of those
terms in the resulting index.  However,
when I dump the index I see that the last
element of the file name, concatenated with
a 'dot' and the suffix, is being indexed as
a single term.  So I did succeed in keeping
the bare suffix out of the index, but I am
losing the final element of any file name
that includes embedded white space as a
searchable term of its own.

Please advise how to force the parser to
recognize and ignore the 'dot'.

Thank you.
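
The symptom above is consistent with the tokenizer emitting "name.suffix" as one token, so the stop filter never sees the suffix on its own. That reading is an assumption, not something confirmed in this thread. One possible workaround is to break the file name on dots as well as white space before the text reaches the analyzer. The sketch below uses plain Java with no Lucene calls; the class FileNameTokenizer, its STOP_WORDS set, and the sample file name are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: splits a file name on dots and
// white space so that the suffix becomes a separate term that a stop list
// can remove.
class FileNameTokenizer {

    // Illustrative stop list: a few English stop words plus file suffixes.
    // Suffixes are stored WITHOUT the leading dot, because the dot is
    // consumed as a delimiter below.
    static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "the", "of", "txt", "xml", "doc"));

    static List<String> tokenize(String fileName) {
        List<String> tokens = new ArrayList<>();
        // Treat any run of white space and/or dots as one delimiter.
        for (String part : fileName.split("[\\s.]+")) {
            String term = part.toLowerCase(Locale.ROOT);
            // split() can yield an empty first element when the name
            // starts with a delimiter, so skip empties as well as stop words.
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                tokens.add(term);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [quarterly, sales, report]
        System.out.println(tokenize("Quarterly Sales Report.doc"));
    }
}
```

With this approach the stop list needs "doc" rather than ".doc". The same effect could be had by replacing each '.' with a space in the file name string before adding it to the indexed field, leaving the analyzer itself unchanged.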