lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <hossman_luc...@fucit.org>
Subject RE: Sorting
Date Wed, 02 Aug 2006 03:31:55 GMT
: I'm with you now. So you do seeks in your comparator. For a large index you
: might as well use java.io.RandomAccessFile for the "array", because there
: would be little value in buffering when the comparator is liable to jump all

yep .. that's what i was getting at ... but i'm not so sure that buffering
won't be usefull.  I've i'm not mistaken, all Scorers are by contract
expected to score docs in docId order so when your hits are being
collected for sorting, you should allways be moving forward in the file
-- but you may skip ahead alot when the result set isn't a high percentage
of the total number of docs.
(i may be wrong about all Scorers going in docId order ... if you
explicilty use the 1.4 BooleanScorer you may not get that behavior, but i
think everything else works that way ... perhaps someone else can verify
that)

: around the file. This sounds very expensive, though. If you don't open a
: Searcher to frequently, it makes sense (in my muddled mind) to pre-sort to
: reduce the number of seeks. That was the half-baked idea of the third file,
: which essentially orders document IDs.

presort on what exactly, the field you want to sort on?  -- That's
esentially what the TermEnum is.  I'm not sure how having that helps you
... let's assume you've got some data structure (let's not worry about the
file/ram or TermEnum distinction just yet) containing every document in
your index of 100,000,000 products sorted on the price field, and you've
done a search for "apple" and there are 1,000,000 docIds for matching
products ready to be collected by your new custom Scoring code ... how
does the full list of all docIds sorted by price help you as you are given
docIds and have to decide if that doc is better or worse then the docs
you've already collected?

: > Bear in mind, there have been some improvements recently to the ability to
: grab individual stored fields per document....
:
: I can't see anything like that in 2.0. Is that something in the Lucene HEAD
: build?

I guess so ... search the java-dev archives for "lazy field loading" or
"Fieldable" .. that should find some of the discussion about it and the
jira issue with the changes.


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message