lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: MultiSearcher to overcome the Integer.MAX_VALUE limit
Date Sat, 08 Mar 2008 17:57:05 GMT
Random text can often be pretty slow when done per word.

I think you will have to modify the MultiSearcher a bit. The 
MultiSearcher takes a global id space and converts to and from an 
individual Searcher id space. The MultiSearcher's id space is limited to 
an int as well, but I think if you change it to a float/double, you 
should be all set.

- Mark

Toke Eskildsen wrote:
> On Fri, 2008-03-07 at 00:03 +0100, Ray wrote:
>    
>> I am currently running a small random text indexer with 400 docs/second.
>> It will reach 2 billion in around 45 days.
>>      
>
> If you are just doing it to test large indexes (in terms of document
> count), then you need to look into your index-generation code. I tried
> making an ultra-simple index builder, where each document contains a
> unique id and one of nine fixed strings. The index-building speed on my
> desktop computer is 40.000 documents/second (tested with 100 million
> documents).
>
> I would suspect that your random text generator is where all the
> time-intensive processing occurs. Either that or you're flushing after
> each document addition (which lowers my execution speed to about 100
> documents/second).
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>    

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message