lucene-java-user mailing list archives

From "Sameer Shisodia" <get.sam...@gmail.com>
Subject hit.doc, hit.score and FSDir performance
Date Tue, 11 Apr 2006 04:12:04 GMT
Hi All.

I am using Lucene as the backbone of a 'Smart Search'.

I have a layer over search that extensively analyzes results at runtime to
bucket them. I do trim the result set, but only after this processing, since
there are non-document weights that are combined with the result scores, and
the hits are then reordered/modified.
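
To make the rescoring step concrete, here is a minimal sketch in plain Java with no Lucene calls. It assumes the doc id and raw score of every hit have already been collected, that the non-document weight is a simple per-document multiplier, and that a missing weight means 1.0. The names RescoreSketch, ScoredHit and topN are hypothetical, made up for this illustration; the real layer does more than this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class RescoreSketch {

    /** One search hit: a document id plus a score (hypothetical holder type). */
    static final class ScoredHit {
        final int docId;
        final float score;

        ScoredHit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /**
     * Multiplies each raw score by an external (non-document) weight and
     * returns the n best hits, highest combined score first.
     * Assumption: hits without a weight entry use a weight of 1.0f.
     */
    static List<ScoredHit> topN(List<ScoredHit> hits, Map<Integer, Float> weights, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<ScoredHit> byScore = Comparator.comparingDouble((ScoredHit h) -> h.score);
        // Min-heap: the weakest hit kept so far sits at the head and is evicted first.
        PriorityQueue<ScoredHit> heap = new PriorityQueue<>(byScore);
        for (ScoredHit hit : hits) {
            float weight = weights.getOrDefault(hit.docId, 1.0f);
            heap.add(new ScoredHit(hit.docId, hit.score * weight));
            if (heap.size() > n) {
                heap.poll(); // drop the current lowest combined score
            }
        }
        List<ScoredHit> result = new ArrayList<>(heap);
        result.sort(byScore.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<ScoredHit> hits = new ArrayList<>();
        hits.add(new ScoredHit(1, 0.9f));
        hits.add(new ScoredHit(2, 0.5f));
        hits.add(new ScoredHit(3, 0.4f));

        Map<Integer, Float> weights = new HashMap<>();
        weights.put(2, 2.0f); // boost doc 2
        weights.put(3, 0.5f); // demote doc 3

        for (ScoredHit h : topN(hits, weights, 2)) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

The bounded heap means only n entries are held while rescoring, but every hit's score still has to be read once, which is the part that is slow for me.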

This essentially needs to get all docs (because there's some field-level
analysis) and the score for each up front, and it seems to take forever
for a large number of hits. The documents themselves are tiny - less than
half a KB usually. Hit.doc() and .score() seem to be where it's taking time -
quite as cautioned in the javadocs.

Another peculiarity: the query is basically "keywords" that hit all
fields, and you can additionally make it more precise by restricting certain
fields as field:value. For the latter case, for a similar number of hits, the
same iteration as above is much quicker than when a similar number of
hits is found by the keywords hitting all fields. The query itself is NOT
visibly slower - but the iteration is. Is this something to do with how spread
out across the index the hits are?

Is there a possible workaround for the .doc()/.score() access? Can a
RAMDirectory be used just for searches over a "regular" FSDirectory index -
and is it usable when the index size is a multiple of available RAM (this is
on RH9 or Fedora Core)?

Thanks in advance,
Sameer

--
Sameer Shisodia  Bangalore
get.sameer@gmail.com
