lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ganesh" <emailg...@yahoo.co.in>
Subject Re: Single searcher vs Multi Searcher
Date Mon, 13 Oct 2008 12:36:38 GMT
Hello Anshum,

My criteria to shard would be date. I am planning to maintain 10 days of 
data in one DB. I think the memory required for single searcher and multi 
searcher would be more or less the same. In your case you may perform search 
on one DB but you might have all searcher objects of all DBs opened and 
warmed up and this will require same amount of memory as single searcher 
would do.

Let me conclude:
The benifits of going for sharding the database
1) increase in search time
2) increase in indexing speed
3) easy maintance.

Any other benifits?

Regards
Ganesh

----- Original Message ----- 
From: "Anshum" <anshumg@gmail.com>
To: <java-user@lucene.apache.org>
Sent: Friday, October 10, 2008 10:19 AM
Subject: Re: Single searcher vs Multi Searcher


> Hi Ganesh,
>
> Your situation seems pretty straight.
> I did not really split my database (storage), just that while indexing, I
> indexed the data into 'n' indexes (basis a particular field).
> Talking about implementation and gain, I have tried using multi searcher 
> as
> well as parallel multisearcher and I seem to be running searches faster 
> (my
> priority being a better time complexity as compared to space complexity) 
> in
> case of a parallelmultisearcher.
> This helps me perform my search in a little over 1/n times the regular
> search time.
> Also, there are times when I don't want to search on all the indexes and 
> so
> I end up only searching a select few indexes. I would advice you to shard
> the indexes keeping in mind the search criteria as well so that, if 
> possible
> you search for lesser data as opposed to searching for everything.
> Also, I did incrementally update my index and so deleted the older doc 
> from
> the older index and added it to a new incremental index (which was my
> concern as this takes a little time assuming you have a lot of updates as
> well).
>
> All said and done, this is an implementation of doc based index sharding
> i.e. everything for a particular document goes to a particular index. I 
> also
> tried splitting on terms (so that when I search for A and B, I know which
> indexes store all docs (and other info) for the terms A and B. But this
> would mean a lot of building things from scratch so I couldnt get to 
> really
> try it. There are a few distributed lucene implementations (though they 
> are
> not that good) you could have a look at them.
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
>
> On Wed, Oct 8, 2008 at 5:39 PM, Ganesh <emailgane@yahoo.co.in> wrote:
>
>> Hello Anshum,
>>
>> In my case i have to add /modify records to the current index database 
>> and
>> there will be only delete in older index  DB. I will not have a 
>> situitation
>> of updating the older records.
>>
>> Please brief me about your requirement and how you configured your
>> database.
>> Whether you have any incremental updates??
>> What have gained by sharding the index.
>>
>> Regards
>> Ganesh
>>
>>
>> ----- Original Message ----- From: "Anshum" <anshumg@gmail.com>
>> To: <java-user@lucene.apache.org>
>> Sent: Tuesday, October 07, 2008 5:55 PM
>>
>> Subject: Re: Single searcher vs Multi Searcher
>>
>>
>>  There were all of the links I could find:
>>>
>>> http://findmeajob.wordpress.com/2007/07/31/lucene-singlesearcher-vs-multisearcher/
>>>
>>> http://archives.devshed.com/forums/java-118/how-do-lucene-ids-work-with-multireader-and-multisearcher-993682.html
>>>
>>> It has been only out of personal experimentation.
>>> Maintaining 4 indexes and having a multisearcher for the same sounds ok 
>>> to
>>> me, though it might(most probably would) use up more time (depending on
>>> your
>>> term/doc distribution).
>>> I have tried multisearchers for indexes that are sized at around 18Gs 
>>> with
>>> over 10 million records in them and they seem to be working fine. Memory
>>> utilization hasn't really been my problem though maintenance has been
>>> required to be taken care of if I had record updates as well i.e.
>>> Lets assume we have 4 indexes {1,2,3,4} with 4 being the oldest one. Now 
>>> a
>>> document got updated and so you would want to move it to the freshest
>>> index
>>> viz. index 1. In this case you would also have to delete this document
>>> from
>>> index 4 (after figuring out that it exists in 4) else this doc would 
>>> show
>>> up
>>> as 2 hits using a multisearcher.
>>> This is the document movement that I was talking about.
>>> Talking only about online resource utilization ( as what I just spoke
>>> about
>>> is generally an offline process), you would have to spend extra resource
>>> to
>>> merge the hits from the different indexes.
>>> The other overhead in case of a multisearcher would be of creating those
>>> many searchers - for each index(though shouldn't be much, just though
>>> would
>>> point out).
>>> I guess you should try it as speed of search is not realy all that
>>> important
>>> to you as compared to running it on a singe box within the memory
>>> limitation.
>>>
>>>
>>>
>>> --
>>> Anshum Gupta
>>> Naukri Labs!
>>> http://ai-cafe.blogspot.com
>>>
>>> The facts expressed here belong to everybody, the opinions to me. The
>>> distinction is yours to draw............
>>>
>>>
>>> On Tue, Oct 7, 2008 at 3:29 PM, Ganesh <emailgane@yahoo.co.in> wrote:
>>>
>>>  Hello Anusham,
>>>>
>>>> My intention is to shard the index after every 7 days (week). After 30
>>>> days, (4th  week) the first DB may get deleted. At any point of time i
>>>> will
>>>> be maitaining 3 to 4 DB.
>>>>
>>>> I want to know the pros and cons of the MultiSearcher or Index sharding
>>>> approach. Any web links would be helpful.
>>>>
>>>> Regards
>>>> Ganesh
>>>>
>>>> ----- Original Message ----- From: "Anshum" <anshumg@gmail.com>
>>>> To: <java-user@lucene.apache.org>
>>>> Sent: Tuesday, October 07, 2008 12:18 AM
>>>>
>>>> Subject: Re: Single searcher vs Multi Searcher
>>>>
>>>>
>>>>  Hi Ganesh,
>>>>
>>>>> About the memory consumption while sorting, it would end up using
>>>>> similar
>>>>> amounts, perhaps even more.. like in the case of regular parallel
>>>>> programming algorithms (hoping that you intend to search using a
>>>>> parallel
>>>>> multi searcher). Would you have to query particular indexes only for
a
>>>>> particular search or would you be searching over all the indexes and
>>>>> then
>>>>> follow it up by merger (which the parallel multi searcher would do
>>>>> efficiently).?
>>>>> Also, I guess 30 indexes would be a little too many, haven't really
>>>>> tried
>>>>> out those many indexes for a multisearcher.
>>>>> As far as maintenance of DB is concerned, it might be easy as long as
>>>>> you
>>>>> don't have any document updates, in which case you'd have to shift the
>>>>> documents from one DB/index to another (which includes creating an 
>>>>> entry
>>>>> in
>>>>> the latest index/DB and deleting the record from the older DB).
>>>>> I guess you'd have to pilot it, in case memory is an issue in your 
>>>>> case
>>>>> and
>>>>> not speed, you could try a regular multisearcher instead of a parallel
>>>>> multisearcher.
>>>>> I guess when you say maintenance of the DB gets easier, you mean that
>>>>> the
>>>>> data in each individual table is controlled (but remember there could

>>>>> be
>>>>> other bigger hassles like the one mentioned above about moving data
>>>>> between
>>>>> indexes/DB).
>>>>>
>>>>> --
>>>>> Anshum Gupta
>>>>> Naukri Labs!
>>>>> http://ai-cafe.blogspot.com
>>>>>
>>>>> The facts expressed here belong to everybody, the opinions to me. The
>>>>> distinction is yours to draw............
>>>>>
>>>>>
>>>>> On Mon, Oct 6, 2008 at 10:06 AM, Ganesh <emailgane@yahoo.co.in>
wrote:
>>>>>
>>>>>  Hello Anshum,
>>>>>
>>>>>>
>>>>>> My index is growing 1 million documents per day. Initially i planned

>>>>>> to
>>>>>> have a single database but the sorting of one or more fields consumes
>>>>>> more
>>>>>> RAM. Whether sharding the index would also consume the same.
>>>>>>
>>>>>> My application should co-exist with other application of my product

>>>>>> and
>>>>>> my
>>>>>> app could get 1 GB of RAM. Search speed is fine but i need to display
>>>>>> the
>>>>>> result in the sorted order.
>>>>>>
>>>>>> I thought to keep 7 days of documents in one index and create one

>>>>>> more
>>>>>> after the 7 days. After 30 days the first index may get deleted.
I 
>>>>>> need
>>>>>> to
>>>>>> keep the documents in the index DB for 30 days. My Index DB is in

>>>>>> HDD.
>>>>>>
>>>>>> I want to the pros and cons of sharding. I think maintance of the
DB
>>>>>> becomes easier.
>>>>>>
>>>>>> It would be very much helpful, if you share some of your thoughts.
>>>>>>
>>>>>> Regards
>>>>>> Ganesh
>>>>>>
>>>>>>
>>>>>> ----- Original Message ----- From: "Anshum" <anshumg@gmail.com>
>>>>>> To: <java-user@lucene.apache.org>
>>>>>> Sent: Friday, October 03, 2008 9:48 PM
>>>>>> Subject: Re: Single searcher vs Multi Searcher
>>>>>>
>>>>>>
>>>>>>
>>>>>>  Hi Ganesh,
>>>>>>
>>>>>>
>>>>>>> I have experimented with sharded indexes and they seem to benefit
>>>>>>> me(atleast
>>>>>>> in my case). I would like to know a few things before I answer
your
>>>>>>> question:
>>>>>>> 1. Do you have a reasonable criteria ( a calculated one) to shard

>>>>>>> the
>>>>>>> indexes?
>>>>>>> 2. How do you plan to split the index? Is it going to be document
>>>>>>> based
>>>>>>> (which I guess it should be as otherwise you would have to build
a
>>>>>>> complete
>>>>>>> distributed system)
>>>>>>> 3. Do you plan to put your indexes on the RAM or on (physically)
>>>>>>> seperate
>>>>>>> HDDs?
>>>>>>>
>>>>>>> Though all said and done, sharded indexes are a good approach,
if 
>>>>>>> done
>>>>>>> the
>>>>>>> right way.
>>>>>>> --
>>>>>>> Anshum Gupta
>>>>>>> Naukri Labs!
>>>>>>> http://ai-cafe.blogspot.com
>>>>>>>
>>>>>>> The facts expressed here belong to everybody, the opinions to
me. 
>>>>>>> The
>>>>>>> distinction is yours to draw............
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 3, 2008 at 3:01 PM, Ganesh <emailgane@yahoo.co.in>

>>>>>>> wrote:
>>>>>>>
>>>>>>>  Hello all,
>>>>>>>
>>>>>>>
>>>>>>>> My indexing is growing by 1 million records per day and the
memory
>>>>>>>> consumption of the searcher object is quite high.
>>>>>>>>
>>>>>>>> There are different opinion in the groups. Few suggest to
use 
>>>>>>>> single
>>>>>>>> database and few to use sharding. My Database has 10 million

>>>>>>>> records
>>>>>>>> now
>>>>>>>> and
>>>>>>>> it might go till 30 million or more. I plan to shard the
index. but
>>>>>>>> Multisearcher will give me benifit.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Ganesh
>>>>>>>>
>>>>>>>>
>>>>>>>> Send instant messages to your online friends
>>>>>>>> http://in.messenger.yahoo.com
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>   Send instant messages to your online friends
>>>>>>>
>>>>>> http://in.messenger.yahoo.com
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>  Send instant messages to your online friends
>>>> http://in.messenger.yahoo.com
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>
>> Send instant messages to your online friends 
>> http://in.messenger.yahoo.com
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 

Send instant messages to your online friends http://in.messenger.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message