lucene-solr-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Sort suggestion
Date Tue, 29 Jul 2008 19:17:39 GMT
I think you'll find it slow to add disk seeks to the sort on every
search. Something you might be able to work from, though (I doubt it
still applies cleanly), is Hoss' issue
https://issues.apache.org/jira/browse/LUCENE-831. It allows for a
pluggable cache implementation for sorting, and also for much faster
reopening in most cases. It hasn't seen any activity, and I think they
are looking to get the reopen gains elsewhere, but it may be worth
playing with.

- Mark

Marcus Herou wrote:
> Guys.
>
> I've noticed many having trouble with sorting and OOM. Eventually they solve
> it by throwing more memory at the problem.
>
> Shouldn't a solution which can sort on disk when necessary be implemented
> in core Lucene?
> Something like this:
> http://www.codeodor.com/index.cfm/2007/5/10/Sorting-really-BIG-files/1194
>
> Since you obviously know the result size, you can calculate how much memory
> is needed for the sort. If the calculated value is higher than a
> configurable threshold, an external on-disk sort is performed, and perhaps a
> message saying so is logged at WARN level.
>
> Just a thought since I'm about to implement something which could sort any
> Comparable object but on disk.
>
> Guess the Hadoop project has the perfect tools for this, since all the
> mapred input files are sorted, on disk and huge.
>
> Kindly
>
> //Marcus
>
>
>   
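
For reference, the threshold idea in the quoted message could be sketched roughly as below. This is only an illustration under stated assumptions, not Lucene code and not anything proposed in LUCENE-831: the class name ThresholdSort, the method sort, and the parameters estimatedBytes, thresholdBytes and runSize are all hypothetical, and the values are plain strings without embedded newlines. Below the threshold it sorts in memory; above it, it logs a warning, spills sorted runs to temporary files, and merges them with a priority queue.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.logging.Logger;

// Hypothetical helper, not part of Lucene.
final class ThresholdSort {
    private static final Logger LOG = Logger.getLogger("ThresholdSort");

    /** One open sorted run plus the line currently at its head. */
    private static final class Run implements Comparable<Run> {
        final BufferedReader reader;
        String head;

        Run(Path file) throws IOException {
            reader = Files.newBufferedReader(file);
            head = reader.readLine();
        }

        /** Moves to the next line; returns false when the run is exhausted. */
        boolean advance() throws IOException {
            head = reader.readLine();
            return head != null;
        }

        @Override
        public int compareTo(Run other) {
            return head.compareTo(other.head);
        }
    }

    /**
     * Writes the input, sorted, to a temporary file and returns its path.
     * Assumes values contain no newlines and runSize is positive.
     */
    static Path sort(Iterator<String> input, long estimatedBytes,
                     long thresholdBytes, int runSize) throws IOException {
        Path out = Files.createTempFile("sorted", ".txt");
        if (estimatedBytes <= thresholdBytes) {
            // Small enough: ordinary in-memory sort.
            List<String> all = new ArrayList<>();
            input.forEachRemaining(all::add);
            Collections.sort(all);
            Files.write(out, all);
            return out;
        }
        LOG.warning("Estimated sort size " + estimatedBytes
                + " bytes exceeds threshold " + thresholdBytes + "; sorting on disk");

        // Phase 1: cut the input into sorted runs of at most runSize values.
        List<Path> runs = new ArrayList<>();
        List<String> buffer = new ArrayList<>(runSize);
        while (input.hasNext()) {
            buffer.add(input.next());
            if (buffer.size() == runSize || !input.hasNext()) {
                Collections.sort(buffer);
                Path run = Files.createTempFile("run", ".txt");
                Files.write(run, buffer);
                runs.add(run);
                buffer.clear();
            }
        }

        // Phase 2: k-way merge; the heap always yields the smallest head line.
        PriorityQueue<Run> heap = new PriorityQueue<>();
        for (Path p : runs) {
            Run r = new Run(p);
            if (r.head != null) heap.add(r); else r.reader.close();
        }
        try (BufferedWriter w = Files.newBufferedWriter(out)) {
            while (!heap.isEmpty()) {
                Run r = heap.poll();
                w.write(r.head);
                w.newLine();
                if (r.advance()) heap.add(r); else r.reader.close();
            }
        }
        for (Path p : runs) Files.delete(p);
        return out;
    }
}
```

The merge phase reads every run back from disk, so doing this per search would incur the disk seeks mentioned in the reply above; the sketch only shows the mechanics of the threshold switch.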

