lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ganesh" <emailg...@yahoo.co.in>
Subject Re: Single searcher vs Multi Searcher
Date Tue, 07 Oct 2008 09:59:33 GMT
Hello Anusham,

My intention is to shard the index after every 7 days (week). After 30 days, 
(4th  week) the first DB may get deleted. At any point of time i will be 
maitaining 3 to 4 DB.

I want to know the pros and cons of the MultiSearcher or Index sharding 
approach. Any web links would be helpful.

Regards
Ganesh

----- Original Message ----- 
From: "Anshum" <anshumg@gmail.com>
To: <java-user@lucene.apache.org>
Sent: Tuesday, October 07, 2008 12:18 AM
Subject: Re: Single searcher vs Multi Searcher


> Hi Ganesh,
> About the memory consumption while sorting, it would end up using similar
> amounts, perhaps even more.. like in the case of regular parallel
> programming algorithms (hoping that you intend to search using a parallel
> multi searcher). Would you have to query particular indexes only for a
> particular search or would you be searching over all the indexes and then
> follow it up by merger (which the parallel multi searcher would do
> efficiently).?
> Also, I guess 30 indexes would be a little too many, haven't really tried
> out those many indexes for a multisearcher.
> As far as maintenance of DB is concerned, it might be easy as long as you
> don't have any document updates, in which case you'd have to shift the
> documents from one DB/index to another (which includes creating an entry 
> in
> the latest index/DB and deleting the record from the older DB).
> I guess you'd have to pilot it, in case memory is an issue in your case 
> and
> not speed, you could try a regular multisearcher instead of a parallel
> multisearcher.
> I guess when you say maintenance of the DB gets easier, you mean that the
> data in each individual table is controlled (but remember there could be
> other bigger hassles like the one mentioned above about moving data 
> between
> indexes/DB).
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
>
> On Mon, Oct 6, 2008 at 10:06 AM, Ganesh <emailgane@yahoo.co.in> wrote:
>
>> Hello Anshum,
>>
>> My index is growing 1 million documents per day. Initially i planned to
>> have a single database but the sorting of one or more fields consumes 
>> more
>> RAM. Whether sharding the index would also consume the same.
>>
>> My application should co-exist with other application of my product and 
>> my
>> app could get 1 GB of RAM. Search speed is fine but i need to display the
>> result in the sorted order.
>>
>> I thought to keep 7 days of documents in one index and create one more
>> after the 7 days. After 30 days the first index may get deleted. I need 
>> to
>> keep the documents in the index DB for 30 days. My Index DB is in HDD.
>>
>> I want to the pros and cons of sharding. I think maintance of the DB
>> becomes easier.
>>
>> It would be very much helpful, if you share some of your thoughts.
>>
>> Regards
>> Ganesh
>>
>>
>> ----- Original Message ----- From: "Anshum" <anshumg@gmail.com>
>> To: <java-user@lucene.apache.org>
>> Sent: Friday, October 03, 2008 9:48 PM
>> Subject: Re: Single searcher vs Multi Searcher
>>
>>
>>
>>  Hi Ganesh,
>>>
>>> I have experimented with sharded indexes and they seem to benefit
>>> me(atleast
>>> in my case). I would like to know a few things before I answer your
>>> question:
>>> 1. Do you have a reasonable criteria ( a calculated one) to shard the
>>> indexes?
>>> 2. How do you plan to split the index? Is it going to be document based
>>> (which I guess it should be as otherwise you would have to build a
>>> complete
>>> distributed system)
>>> 3. Do you plan to put your indexes on the RAM or on (physically) 
>>> seperate
>>> HDDs?
>>>
>>> Though all said and done, sharded indexes are a good approach, if done 
>>> the
>>> right way.
>>> --
>>> Anshum Gupta
>>> Naukri Labs!
>>> http://ai-cafe.blogspot.com
>>>
>>> The facts expressed here belong to everybody, the opinions to me. The
>>> distinction is yours to draw............
>>>
>>>
>>> On Fri, Oct 3, 2008 at 3:01 PM, Ganesh <emailgane@yahoo.co.in> wrote:
>>>
>>>  Hello all,
>>>>
>>>> My indexing is growing by 1 million records per day and the memory
>>>> consumption of the searcher object is quite high.
>>>>
>>>> There are different opinion in the groups. Few suggest to use single
>>>> database and few to use sharding. My Database has 10 million records 
>>>> now
>>>> and
>>>> it might go till 30 million or more. I plan to shard the index. but
>>>> Multisearcher will give me benifit.
>>>>
>>>> Regards
>>>> Ganesh
>>>>
>>>>
>>>> Send instant messages to your online friends
>>>> http://in.messenger.yahoo.com
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>
>> Send instant messages to your online friends 
>> http://in.messenger.yahoo.com
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 

Send instant messages to your online friends http://in.messenger.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message