lucene-java-user mailing list archives

From "Nikhil G. Daddikar" <>
Subject Using lucene more effectively
Date Wed, 24 Oct 2001 03:56:52 GMT

I have implemented search on my website using lucene. However, I still have
a few questions -- mostly because of the flexible nature of Lucene. I would
like to learn from users who have some real world experience.

My situation: English documents. Three searchable 'fields' per document --
reference id, short description and content.

* Analyzer
I am using a combination of the Standard tokenizer with LowerCase, Stop and
PorterStem filters. Is this the preferred combo or is there anything better?

* Query
Right now I am using the query parser directly. However, I am uncertain
whether this is the best approach, especially given the myriad of Query
classes (Fuzzy, Wildcard, etc.). I would like to set up 'internal' boost
factors, i.e., maybe description is twice as important as content, but I
don't want users to have to enter the boost factors in the query itself. Any
experience shared will be greatly appreciated.

BTW, what is a Fuzzy Query?
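
To make the 'internal' boost idea above concrete, here is a minimal sketch of
one possible approach: rewrite the user's plain terms into a fielded, boosted
query string (using the query-parser syntax field:term^boost) before handing
it to the query parser. The class name, the field names "description" and
"content", and the boost values are assumptions for illustration only. Terms
are not escaped, so input containing query-syntax characters would need extra
handling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: expands plain user input into a fielded, boosted
// query string so that users never type boost factors themselves.
class BoostedQueryBuilder {
    // Field name -> boost, kept in insertion order for predictable output.
    private final Map<String, Float> fieldBoosts = new LinkedHashMap<>();

    BoostedQueryBuilder add(String field, float boost) {
        fieldBoosts.put(field, boost);
        return this;
    }

    // Each whitespace-separated term becomes one parenthesized group that
    // searches the term in every configured field.
    String expand(String userInput) {
        StringBuilder sb = new StringBuilder();
        for (String term : userInput.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input yields an empty query string
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('(');
            boolean first = true;
            for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
                if (!first) {
                    sb.append(' ');
                }
                sb.append(e.getKey()).append(':').append(term);
                if (e.getValue() != 1.0f) {
                    sb.append('^').append(e.getValue()); // omit the default boost
                }
                first = false;
            }
            sb.append(')');
        }
        return sb.toString();
    }
}
```

With description boosted to 2.0 and content left at 1.0, the input
"email server" would expand to
(description:email^2.0 content:email) (description:server^2.0 content:server),
which can then be passed to the query parser in place of the raw input.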

* Ranking
I've read the FAQ on generating the "stars" but am still a bit confused. For
example, searching a 2-page document that contains the word 'email' about 7
or 8 times gives a score of 0.07. I would have thought that this is at least
a 4-star (if not a 5-star) match. In fact, I rarely get a score of 0.8 or
higher. I am aware that the score also depends on the total number of words,
which makes it even more confusing to design a 'starring' strategy.
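
As a rough illustration of one possible 'starring' strategy (not the one from
the FAQ), the sketch below rates each hit relative to the best score in the
same result set instead of using the raw score, so a top hit of 0.07 still
gets 5 stars. The class and method names are hypothetical, and the approach
has an obvious weakness: the best hit always gets 5 stars, even when it is a
poor match.

```java
// Hypothetical helper: maps a raw hit score to 0..5 stars, scaled against
// the highest score in the current result set.
class StarRating {
    static int stars(float score, float topScore) {
        if (topScore <= 0f || score <= 0f) {
            return 0; // nothing matched, or the scores are unusable
        }
        // Fraction of the best score, capped at 1.0.
        float relative = Math.min(score / topScore, 1.0f);
        // Round up so that any positive score earns at least one star.
        return Math.max(1, (int) Math.ceil(relative * 5));
    }
}
```

For example, with a top score of 0.07, a hit scoring 0.07 gets 5 stars and a
hit scoring 0.035 gets 3.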

* General
* Has anybody implemented aliasing yet? If yes, can you please point me to

* My search is going to be used by businesses, and I read some articles that
said that organizing search results by "topics" is strongly preferred.
E.g. if the word 'penguin' is searched, it would be of little use to show
sites about arctic life, Linux, penguin research, the hockey team, etc. all
on the same page. It would be better to group these results by topic. This
is of course a contrived example that can be sorted out by adding more
keywords, but in business scenarios this may not be easy. Has anybody
ventured into this? Any pointers will be useful.

Thanks much in advance
