lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: multiple small indexes or one big index?
Date Fri, 03 Jun 2011 20:59:44 GMT
OK, if they're all in a single index, you might also try using Lucene
sorting. Be aware
that the first sort on a field takes extra time to warm the caches...

But note that sorting is for single-valued, un-tokenized fields..

Best
Erick

On Fri, Jun 3, 2011 at 2:39 AM, Alexander Rosemann
<alexander.rosemann@gmail.com> wrote:
> Alright. With all the changes you suggested I am down from 9s to <1s. Again,
> many thanks to both of you Erick and Shai!
>
> Regards,
> Alex
>
> On 02.06.2011 15:48, Alexander Rosemann wrote:
>>
>> No worries, I'll keep that in mind now.
>> In addition I am going to switch to another collector as well. ATM I
>> collect the results and then sort them using the std. Collections.sort
>> approach... I have to look what Lucene offers and switch to something
>> else.
>>
>> Thanks,
>> Alex
>>
>> On 02.06.2011 15:36, Erick Erickson wrote:
>>>
>>> Sounds good, just be sure to keep your (now single) searcher open! Also,
>>> be sure to measure queries after a while. The first few queries will
>>> fill up
>>> caches etc, so the time should improve after the first few.
>>>
>>> Best
>>> Erick
>>>
>>> On Thu, Jun 2, 2011 at 9:28 AM, Alexander Rosemann
>>> <alexander.rosemann@gmail.com> wrote:
>>>>
>>>> Hi Erick, caching the IndexSearchers didn't took too much effort and
>>>> decreased searching already by 30%!
>>>>
>>>> I am busy changing the code to use a single index as you suggested atm.
>>>> Still a few things left to be done but once I have it working I let
>>>> you know
>>>> how much faster it is for me.
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> On 02.06.2011 13:04, Erick Erickson wrote:
>>>>>
>>>>> At this size, really consider going to a single index. The lack of
>>>>> administrative headaches alone is probably well worth the effort....
>>>>>
>>>>> I almost guarantee that the time you spend re-writing things to keep
>>>>> the searchers open (and finding the bugs!) will be far more than just
>>>>> putting all the data in a single index.
>>>>>
>>>>> But that might just be my preferences showing....
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On Wed, Jun 1, 2011 at 5:37 PM, Alexander Rosemann
>>>>> <alexander.rosemann@gmail.com> wrote:
>>>>>>
>>>>>> Many thanks for the tips, Erick! I do close each searcher after a
>>>>>> search...
>>>>>> I will change that first thing tmrw. and let you know how that went.
>>>>>> Multi-threaded searching will be next and if that hasn't helped,
I
>>>>>> will
>>>>>> switch to one big index.
>>>>>> All indexes together are rather small, ~200MB and 50.000 documents.
>>>>>>
>>>>>> -Alex
>>>>>>
>>>>>> On 01.06.2011 23:26, Erick Erickson wrote:
>>>>>>>
>>>>>>> I'd start by putting them all in one index. There's no penalty
>>>>>>> in Lucene for having empty fields in a document, unlike an
>>>>>>> RDBMS.
>>>>>>>
>>>>>>> Alternately, if you're opening then closing searchers each
>>>>>>> time, that's very expensive. Could you open the searchers
>>>>>>> once and keep them open (all 90 of them)? That alone might
>>>>>>> do the trick and be less of a change to your program. You
>>>>>>> could also fire multiple threads at the searches, but check if
>>>>>>> you're CPU bound first (if you are, multiple threads won't
>>>>>>> help much/at all).
>>>>>>>
>>>>>>> You haven't said how big these indexes are nor how many
>>>>>>> documents you're talking about here, so this advice is suspect.
>>>>>>>
>>>>>>> Do look at putting it all in one index though, let us know if
you
>>>>>>> have some data indicating how big stuff is/would be.
>>>>>>>
>>>>>>> Best
>>>>>>> Erick
>>>>>>>
>>>>>>> On Wed, Jun 1, 2011 at 4:35 PM, Alexander Rosemann
>>>>>>> <alexander.rosemann@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi all, I was wondering whether you could give me some advice
on
>>>>>>>> how to
>>>>>>>> improve my search performance.
>>>>>>>>
>>>>>>>> I have 90 lucene indexes, each having different fields (~5
per
>>>>>>>> Document).
>>>>>>>> When I search, I always have to go through all indexes to
build my
>>>>>>>> result
>>>>>>>> set. Searching one index takes approx. 100ms, thus searching
all
>>>>>>>> indexes
>>>>>>>> takes 9s in total.
>>>>>>>>
>>>>>>>> How can I reduce the time it needs to search?
>>>>>>>>
>>>>>>>> I decided to create this many indexes because putting all
data in
>>>>>>>> one
>>>>>>>> index
>>>>>>>> would mean that a document would have ~400 fields, with most
of them
>>>>>>>> left
>>>>>>>> empty. Is that ok? Would a single index be faster compared
to
>>>>>>>> multiple
>>>>>>>> small
>>>>>>>> ones?
>>>>>>>>
>>>>>>>> Any pointers are much appreciated.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message