lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: Some obvious questions that I'll be happy to put on the WIKI
Date Tue, 11 Jul 2006 01:29:48 GMT

Furash: welcome to Lucene.  I suspect you'll find it extremely
advantageous to pick up a copy of "Lucene In Action" it has a lot of great
examples that may help you understand some of these questions...

: token string).  But what I want to do is something like IF this is just
: a string of more than 4 characters, THEN store its literal AND soundex
: versions.  I'm thinking I need to do something to tokenstream, but I'm
: not sure what.

take a look at the position increment -- it allows analyzers to specify
that two tokens appear at the same spot.

: 2. I've got a bunch of names assocated with a single person (aliases)
: (document): e.g., "Gary Furash", "Gary 'The Nose' Furash", "Gary
: Furnham".  If I stick them all in the same field ("names"), and search
: on "Gary", that document gets overly weighted - since the name shows up
: 3 times.  So, I could just override the analyzer and only put in Gary
: once (dedupe the names), but then I loose some of the nearness stuff:
: that is, if a user types "Gary Furash", the document should hit higher -
: those words are close together.

there's a few things you can do here ... first off you need to remember
that "Gary Furash" and "Gary 'The Nose' Furash" can both match on a query
for the phrase "Gary Furash" if you turn up the slop value.  Second, you
can alter things like the tf value to reduce the impact of having "Gary"
in the field more then once -- it's even possible to override the
Similarty class (and thus the tf function) used on an individual Query (so
you can only have it affect queries on the "name" field and not the other
fields (but it's not exactly easy).

lastly, you can always put aliases in a seperate field from proper names
and search on them seperately.

it really comes down to what is important to you.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message