lucene-java-user mailing list archives

From Jeff Rodenburg <jeff.rodenb...@gmail.com>
Subject Suggestions for analysis
Date Thu, 22 Sep 2005 04:14:57 GMT
I'm looking for some suggestions on an analyzer decision. I've got my own
thoughts on this already, but would like some initial feedback first.

The scenario:

   - An index of geographic information: cities, towns, states,
   neighborhoods, zipcodes, generic names, etc. Examples are "New York, NY",
   "New York", "Midtown", "10012", "The Big Apple".
   - I have these mapped to underlying geographic data points: census
   data, postal data, mapping data, etc.
   - I want some of these to take precedence over others when terms
   conflict or match, e.g. "Washington" should score Washington D.C.
   higher than the state of Washington. This would be decided on an
   item-by-item basis, not dictated by one broad field.
   - I need the right mix of analysis and scoring for searches to behave as
   I expect. For example, a search for "Wedgewood WA" would ideally not
   match "Wedgewood GA".
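To make the two requirements above concrete, here is a minimal plain-Java sketch of the intended behavior, independent of Lucene. It assumes each entry carries a hand-assigned precedence weight and a state code, and it naively treats a trailing two-letter query token as a state code that every hit must match. The class `GeoRanker`, the `Place` type, `sampleIndex()`, and the precedence values are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class GeoRanker {

    // Hypothetical index entry: a place name, its state code, and a per-item
    // precedence weight assigned by hand (not a Lucene API).
    static final class Place {
        final String name;
        final String state;
        final double precedence;

        Place(String name, String state, double precedence) {
            this.name = name;
            this.state = state;
            this.precedence = precedence;
        }

        @Override
        public String toString() {
            return name + ", " + state;
        }
    }

    // Illustrative data; the precedence values are invented for the example.
    static List<Place> sampleIndex() {
        return List.of(
            new Place("Washington", "WA", 1.0),
            new Place("Washington", "DC", 2.0),
            new Place("Wedgewood", "WA", 1.0),
            new Place("Wedgewood", "GA", 1.0));
    }

    // Matches the query against place names. A trailing two-letter token is
    // (naively) treated as a state code that every hit must carry. Hits are
    // ordered by descending precedence; List.sort is stable, so ties keep
    // their index order.
    static List<Place> search(List<Place> index, String query) {
        String[] tokens = query.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String state = null;
        int nameLength = tokens.length;
        if (tokens.length > 1 && tokens[tokens.length - 1].length() == 2) {
            state = tokens[tokens.length - 1];
            nameLength--;
        }
        String name = String.join(" ", Arrays.copyOf(tokens, nameLength));

        List<Place> hits = new ArrayList<>();
        for (Place p : index) {
            boolean nameMatches = p.name.toLowerCase(Locale.ROOT).equals(name);
            boolean stateMatches = state == null
                || p.state.toLowerCase(Locale.ROOT).equals(state);
            if (nameMatches && stateMatches) {
                hits.add(p);
            }
        }
        hits.sort(Comparator.comparingDouble((Place p) -> p.precedence).reversed());
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search(sampleIndex(), "Washington"));   // [Washington, DC, Washington, WA]
        System.out.println(search(sampleIndex(), "Wedgewood WA")); // [Wedgewood, WA]
    }
}
```

The sketch keeps the state as a separate attribute rather than folding it into the name text, so "WA" acts as a hard filter while precedence only breaks ties. In Lucene itself, the analogous knobs would presumably be a per-document index-time boost and a required clause on a separate state field, rather than logic inside the analyzer.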

I'm starting with the StandardAnalyzer and am thinking of extending it to
carry some of the business rules that should come into play as
tie-breakers.

Comments appreciated.

Thanks,
jeff r.
