lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rob Staveley (Tom)" <>
Subject Re: Sorting
Date Mon, 31 Jul 2006 06:12:37 GMT
The limit is much less than Integer.MAX_VALUE (2,147,483,647), unless
you have a VM which can run in more than 1G of heap. 1G limits you to a
theoretical number of 256M (268,435,456) documents with 4 bytes per
array element. In practise it will be something a less, because there
are other things which need heap too.

Were going to need to maintain a set sort indexes for documents in a
large index too, and I'm interested in suggestions for the best/easiest
way to maintain non-RAM-based (or not entirely RAM-based) sort index
which is external to Lucene. Would using MySQL for sort indexing be "a
sledgehammer to crack a nut", I wonder? I've not really thought through
the RAMifications (sorry!) of this approach. I wonder if anyone else
here has attempted to integrate an external sort using a database?

On Sat, 2006-07-29 at 22:42 +0200, karl wettin wrote:
> On Sat, 2006-07-29 at 12:39 -0700, Jason Calabrese wrote:
> > One fast way to make an alphabetic sort very fast is to presort your
> > docs before adding them to the index.  If you do this you can then
> > just sort by index order.  We are using this for a large index (1
> > million+ docs) and it works very good, and seems even slightly faster
> > than relevance sorting.
> > 
> > Using this approach may create some maintainance issues since you
> > can't add a new doc to the index at a specified position.  Instead you
> > will need to re-index everything. 
> Instead of above I would probably choose an int[index size] where each
> position in the array represents the global order of that document. It's
> much easier to re-order that than re-indexing the whole corpus every
> time you want to insert something.
> It limits your corpus to 2 billion items (Integer.MAX_VALUE). And it
> will consume 32 bits of RAM per document.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message