lucene-java-user mailing list archives

From "Furash Gary" <fura...@mcao.maricopa.gov>
Subject Some obvious questions that I'll be happy to put on the WIKI
Date Mon, 10 Jul 2006 14:29:15 GMT
Big fan of Lucene already.  Just looking for some advice, with apologies
in advance if it's already been answered on the list and I just didn't
search right.

1. Let's say I want to store a term in MORE than one way: e.g., I want to
store the Soundex version of a word and the real version of a word.  All
of the examples of extending Analyzer return one thing (I think it's a
TokenStream).  But what I want to do is something like: IF this is just
a string of more than 4 characters, THEN store its literal AND Soundex
versions.  I'm thinking I need to do something to the TokenStream, but I'm
not sure what.
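
To make the rule in question 1 concrete, here is a minimal plain-Java sketch. It uses no Lucene classes: `MultiFormTokens`, `Tok`, and `expand` are hypothetical names, the Soundex routine is a simplified American Soundex written for illustration, and reading "just a string" as "letters only" is an assumption. Each word is emitted once as a literal, and words longer than 4 characters get a second, Soundex token with a position increment of 0, meaning "same position as the previous token". Lucene's token API exposes a position increment for this kind of stacked token, but how to wire it into a TokenFilter is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiFormTokens {

    /** Hypothetical token: text plus its offset from the previous token's position. */
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "/+" + positionIncrement;
        }
    }

    /** Simplified American Soundex: first letter plus three digits, zero-padded. */
    static String soundex(String word) {
        final String codes = "01230120022455012623010202"; // codes for a..z
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char raw : word.toCharArray()) {
            char c = Character.toLowerCase(raw);
            if (c < 'a' || c > 'z') {
                continue; // ignore anything that is not an ASCII letter
            }
            char code = codes.charAt(c - 'a');
            if (out.length() == 0) {
                out.append(Character.toUpperCase(c)); // keep the first letter as-is
                last = code;
            } else if (c != 'h' && c != 'w') {
                // h and w do not separate equal codes; vowels (code 0) do
                if (code != '0' && code != last) {
                    out.append(code);
                }
                last = code;
            }
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    /** Assumption: "just a string" means every character is a letter. */
    static boolean isAllLetters(String w) {
        for (char c : w.toCharArray()) {
            if (!Character.isLetter(c)) {
                return false;
            }
        }
        return !w.isEmpty();
    }

    /** Emit the literal for every word, plus a stacked Soundex token for long words. */
    static List<Tok> expand(List<String> words) {
        List<Tok> out = new ArrayList<>();
        for (String w : words) {
            out.add(new Tok(w.toLowerCase(), 1));
            if (w.length() > 4 && isAllLetters(w)) {
                out.add(new Tok(soundex(w), 0)); // same position as the literal
            }
        }
        return out;
    }
}
```

With the words "Gary" and "Furash", `expand` yields the literal `gary`, the literal `furash`, and the code `F620` stacked at the position of `furash`; "Gary" has only 4 characters, so it gets no Soundex token.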

2. I've got a bunch of names (aliases) associated with a single person
(document): e.g., "Gary Furash", "Gary 'The Nose' Furash", "Gary
Furnham".  If I stick them all in the same field ("names") and search
on "Gary", that document gets overly weighted, since the name shows up
3 times.  So, I could just override the analyzer and only put in Gary
once (dedupe the names), but then I lose some of the nearness stuff:
that is, if a user types "Gary Furash", the document should score higher
because those words are close together.
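
The trade-off in question 2 can be shown with a small plain-Java sketch. It does not use Lucene and does not reproduce Lucene's scoring; `AliasField` and its methods are hypothetical names, and the tokenizer (lower-case, split on non-letters) is an assumption. It only counts how often a term occurs and checks whether two terms sit next to each other, first with all aliases concatenated and then with duplicate terms removed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

final class AliasField {

    /** Assumed tokenizer: lower-case, split on anything that is not a-z. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** All aliases in one field, in order, duplicates kept. */
    static List<String> concatenated(List<String> aliases) {
        List<String> out = new ArrayList<>();
        for (String alias : aliases) {
            out.addAll(tokenize(alias));
        }
        return out;
    }

    /** Same field with each term kept only at its first occurrence. */
    static List<String> deduped(List<String> aliases) {
        return new ArrayList<>(new LinkedHashSet<>(concatenated(aliases)));
    }

    /** How many times the term occurs in the token list. */
    static int termFrequency(List<String> tokens, String term) {
        return Collections.frequency(tokens, term);
    }

    /** True if 'second' appears immediately after 'first' somewhere in the list. */
    static boolean adjacent(List<String> tokens, String first, String second) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(first) && tokens.get(i + 1).equals(second)) {
                return true;
            }
        }
        return false;
    }
}
```

For the three aliases above, the concatenated field contains "gary" 3 times and keeps "gary furnham" adjacent. After deduping, "gary" occurs once, but the terms become gary, furash, the, nose, furnham, so "gary" and "furnham" are no longer next to each other.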

Thanks all.

Gary Furash, MBA, PMP, Applications Manager
Maricopa County Attorney's Office
