lucy-user mailing list archives

From Edwin Crockford <ecrockf...@invicro.com>
Subject Re: [lucy-user] Large index sizes
Date Thu, 25 Apr 2013 14:47:26 GMT
Hi Bob,

Many thanks for the quick reply. It looks like we will have to beef up
the machine a bit. Currently the largest index we have successfully
built is 2G, so still a long way below your figures. I notice there is a
feature to search multiple indexes simultaneously
(Lucy::Search::PolySearcher). Is this a possible way around our resource
issue: split the index into smaller ones and then do a polysearch across
them all? Or is there a noticeable performance hit?

Regards
Edwin

On 25/04/2013 13:16, Bob Bruen wrote:
>
> Hi,
>
> I have indexed millions of files, ending up with a 127G index file, 
> which works fine. There are enough resources for this.
>
> I also tried to do the same with 10s of millions, but the indexing 
> process never could finish, even with enough resources (index file 
> ~400G). It kept updating one file a tiny bit every few minutes. I 
> think I could do a better job in the code, but I have not been able to 
> get back to it yet.
>
>             -bob
>
>
> On Thu, 25 Apr 2013, Edwin Crockford wrote:
>
>> I have recently started to use Lucy (with Perl) and everything 
>> went well until I tried to index a large file store (>300,000 files). 
>> The indexer process reached >8Gbytes and the machine ran out of 
>> resources. My questions are:
>>
>> a) Is this the normal resource requirement?
>>
>> b) Is there a way to avoid swamping machines?
>>
>> I also found that the searcher becomes very large for large indexes 
>> and as ours runs as a part of a FastCGI process it exceeded the 
>> ulimit of the process. Upping the ulimit fixed this, but diagnosing 
>> the issue was difficult as the query would just return 0 results 
>> rather than indicating that it had run out of process space.
>>
>> Many thanks
>>
>> Edwin Crockford
>>
>

