lucene-java-user mailing list archives

From "Wojtek H" <wojte...@gmail.com>
Subject The best way to iterate over document
Date Wed, 26 Mar 2008 09:48:47 GMT
Hi all,

our problem is to choose the best (fastest) way to iterate over a huge set
of documents (the basic and most important case is iterating over all
documents in the index). A slow process accesses the documents, and at the
moment it does so by repeating a query (for instance MatchAllDocsQuery): it
processes the first N docs, then repeats the query and processes the next N
docs, and so on. Repeating the query means that the total time is in fact
quadratic in the number of documents! So we are thinking about changing the
way the docs are accessed.
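
To make the quadratic claim concrete, here is a rough cost model in plain Java (no Lucene calls). It assumes that the k-th request has to re-collect every hit up to the end of page k before it can hand back the last N of them; the class and method names are made up for this illustration, and real costs depend on how the collector is written.

```java
public class PagingCost {
    // Hits touched in total when every request re-runs the query and
    // re-collects everything up to the end of the requested page
    // (assumed model, not measured Lucene behaviour).
    static long repeatedQueryCost(long totalDocs, long pageSize) {
        long cost = 0;
        for (long end = pageSize; end < totalDocs + pageSize; end += pageSize) {
            cost += Math.min(end, totalDocs); // the last page may be short
        }
        return cost;
    }

    // Hits touched when the documents are walked exactly once.
    static long singlePassCost(long totalDocs) {
        return totalDocs;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L;
        long page = 1_000L;
        System.out.println("repeated query: " + repeatedQueryCost(docs, page));
        System.out.println("single pass:    " + singlePassCost(docs));
    }
}
```

Under this model the repeated query touches roughly D*D/(2N) hits for D documents and page size N, against D for a single pass.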
For a generic query, the only way we see to speed this up is to keep the
HitCollector in memory between requests for docs. Wouldn't this approach
consume too much memory?
For iterating over all documents, I was wondering if there is a way to
determine the set of document ids in the index over which we could iterate
(and, of course, to detect index changes: if the index changes between
requests, we should probably invalidate the 'iterating session').
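
A minimal sketch of the kind of 'iterating session' we have in mind, in plain Java. IndexView is a hypothetical interface invented for this example, not a Lucene type; it assumes the index can report an upper bound on doc ids, whether a given id is deleted, and some version stamp that changes whenever the index is modified. Whether Lucene offers suitable counterparts is exactly what we are asking.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class IterationSession {
    /** Hypothetical view of an index; an assumption, not a Lucene API. */
    interface IndexView {
        int maxDoc();                 // one greater than the largest doc id
        boolean isDeleted(int docId); // true if the id no longer holds a live doc
        long version();               // changes whenever the index is modified
    }

    private final IndexView index;
    private final long startVersion;
    private int nextDoc = 0;

    IterationSession(IndexView index) {
        this.index = index;
        this.startVersion = index.version();
    }

    /** Returns up to n live doc ids, resuming where the previous call stopped. */
    List<Integer> nextBatch(int n) {
        if (index.version() != startVersion) {
            // The index changed between requests: invalidate the session.
            throw new ConcurrentModificationException("index changed; restart iteration");
        }
        List<Integer> batch = new ArrayList<>();
        int max = index.maxDoc();
        while (nextDoc < max && batch.size() < n) {
            if (!index.isDeleted(nextDoc)) {
                batch.add(nextDoc);
            }
            nextDoc++;
        }
        return batch;
    }
}
```

Each call does work proportional to the batch size plus any deleted ids it skips, so walking the whole index would be linear, and the only state kept between requests is one integer and one version stamp.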
What is the best solution for this problem?
Thanks and regards,

wojtek
