lucene-java-user mailing list archives

From "Michael Barbarelli" <mbarbare...@gmail.com>
Subject Customizing Stop Word List?
Date Thu, 12 Jul 2007 17:23:45 GMT
Hello to All,

I'm having a problem with Lucene: certain words that I would like to be
included in the query are being omitted from it.  I think that is because
Lucene recognizes them as stop words.  This is the case with five terms in
particular.  They look like English grammar particles, but they are actually
ISO country codes.

"IT"  (Italy)
"IN"  (India)
"BE"  (Belgium)
"NO"  (Norway)
"AT"  (Austria)

Scenario:
-------------
The user submits ISO country codes as part of the Lucene query to be matched
against a field in the Lucene index that also contains ISO country codes.
In most cases this works fine, because the majority of ISO country codes do
not resemble grammar particles.  The following are okay, for example.

GB  (Great Britain)
FR  (France)
NL  (Netherlands)

The following are stripped from queries, as listed above.

"IT"  (Italy)
"IN"  (India)
"BE"  (Belgium)
"NO"  (Norway)
"AT"  (Austria)
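
To illustrate the behaviour described above, here is a minimal sketch in
plain Java.  It does not use Lucene; it only assumes that the analyzer
lowercases each token and then drops anything found in an English stop word
set.  The class name StopWordSketch, the analyze method, and the shortened
stop word list are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {
    // Shortened, assumed stop word list for illustration only.
    static final Set<String> ENGLISH_STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "at", "be", "in", "it", "no", "of", "the", "to"));

    // Split on whitespace, lowercase each token, drop tokens in the stop set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String codes = "GB FR NL IT IN BE NO AT";
        // With the English stop set, the particle-like codes disappear.
        System.out.println(analyze(codes, ENGLISH_STOP_WORDS)); // [gb, fr, nl]
        // With an empty stop set, every code survives.
        System.out.println(analyze(codes, new HashSet<String>())); // [gb, fr, nl, it, in, be, no, at]
    }
}
```

Under this assumption, "IT" becomes "it" before the stop word check, which
is why an upper-case country code can still collide with a lower-case stop
word.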


So far, I have attempted to fix this problem by defining my own list of stop
words and passing that array to a standard analyzer used for both indexing
and searching.  That didn't work: the country codes above are still dropped.
Would a per-field analyzer work in this case?
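
As a rough picture of the per-field idea, the sketch below (plain Java, not
Lucene code) picks a stop word set by field name, so that a country code
field uses an empty set while other fields keep the default list.  The field
names "country" and "body", the class PerFieldSketch, and the shortened stop
word list are all assumptions for illustration.  Lucene ships a
PerFieldAnalyzerWrapper class for this purpose; this sketch only mimics the
concept and does not reproduce its API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class PerFieldSketch {
    // Shortened, assumed default stop word list.
    static final Set<String> DEFAULT_STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "at", "be", "in", "it", "no", "of", "the", "to"));

    // Hypothetical per-field overrides: the "country" field filters nothing.
    static final Map<String, Set<String>> STOP_WORDS_BY_FIELD = new HashMap<>();
    static {
        STOP_WORDS_BY_FIELD.put("country", Collections.<String>emptySet());
    }

    // Lowercase each token and drop it if the field's stop set contains it.
    static List<String> analyze(String field, String text) {
        Set<String> stopWords =
                STOP_WORDS_BY_FIELD.getOrDefault(field, DEFAULT_STOP_WORDS);
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(analyze("country", "IT NO AT"));      // [it, no, at]
        System.out.println(analyze("body", "made in the north")); // [made, north]
    }
}
```

Whatever mechanism is chosen, the same per-field rules would need to apply
both when indexing and when parsing the query; otherwise the terms in the
index and the terms in the query will not line up.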

Any ideas?  Many thanks in advance for your help.
