From: Naresh Yadav
Date: Mon, 19 Jan 2015 19:47:31 +0530
Subject: Re: Need Debug Direction on Performance Problem
To: solr-user@lucene.apache.org

Michael, I tried your idea of implementing my own cursor in Solr 4.6.1 itself, but somehow that test case was taking a huge amount of time. I then tried the cursor approach after upgrading Solr to 4.10.3 and got better results: for Setup 2 the time dropped from 114 minutes to 18 minutes, though that is still far from Setup 1's 2 minutes. The very first 50-thousand-document request by itself takes about a minute, so I may need to look at other things, since pagination seems to be working well now. Thanks for the valuable suggestions.

On Mon, Jan 19, 2015 at 11:20 AM, Naresh Yadav wrote:

> Toke, I won't be able to use TermsComponent, as I have complex filter
> criteria on other fields.
>
> Michael, I understood your idea of paging without using start=. I will
> prototype it, since it is possible in my use case too, and post here the
> results I get with this approach.
>
> On Sun, Jan 18, 2015 at 10:05 PM, Michael Sokolov <
> msokolov@safaribooksonline.com> wrote:
>
>> You can also implement your own cursor easily enough if you have a
>> unique sort key (not relevance score).
>> Say you can sort by id: you select batch 1 (50k docs, say) and record
>> the last (maximum) id in the batch. For the next batch, you limit the
>> query to id > last_id and again take the first 50k docs (don't use
>> start= for paging). This scales much better when scanning a large
>> result set; you get roughly constant time per batch across the whole
>> set, instead of the time increasing as you page deeper.
>>
>> -Mike
>>
>> On 1/18/2015 7:45 AM, Naresh Yadav wrote:
>>
>>> Hi Toke,
>>>
>>> Thanks for sharing Solr internals relevant to my problem. I will
>>> definitely try the cursor as well, but the only problem is that my
>>> current Solr version is 4.6.1, which I believe does not have cursor
>>> support. Is there any other option for this problem?
>>>
>>> Also, as per your suggestion, I will try to avoid regional units in
>>> posts.
>>>
>>> Thanks
>>> Naresh
>>>
>>> On Sun, Jan 18, 2015 at 4:19 PM, Toke Eskildsen wrote:
>>>
>>>> Naresh Yadav [nyadav.ait@gmail.com] wrote:
>>>>
>>>>> In both setups, we are reading in batches of 50k. In Setup 1 each
>>>>> batch takes approx 7 seconds, and completing all batches of the
>>>>> total 10 lakh results takes 1 to 2 minutes. In Setup 2 each batch
>>>>> takes approx 2-3 minutes, and completing all batches of the total
>>>>> 10 lakh results takes 114 minutes.
>>>>
>>>> Deep paging across shards without cursors means that for each
>>>> request, the full result set up to that point must be requested from
>>>> each shard. The deeper the page, the longer each request takes. If
>>>> you only extracted 500K results instead of the 1M in Setup 2, it
>>>> would likely take a lot less than 114/2 minutes.
>>>>
>>>> Since you are exporting the full result set, you should be using a
>>>> cursor:
>>>> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
>>>> This should make your extraction linear in the number of documents
>>>> and hopefully a lot faster than your current setup.
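[Editor's note: Toke's cursor suggestion can be sketched roughly as follows. This is a minimal illustration of the cursorMark loop (available in Solr 4.7+); the HTTP request to /select is stubbed out as an injectable `fetch` callable so the pagination logic stands alone. The parameter names `q`, `sort`, `rows`, and `cursorMark` follow the Solr reference guide; everything else here is a hypothetical stand-in, not a real client API.]

```python
# Minimal sketch of Solr cursorMark deep paging (Solr 4.7+).
# `fetch` stands in for an HTTP call to Solr's /select handler and
# must return (docs, nextCursorMark); it is injected here so the
# loop can be run and tested without a live Solr instance.

def paginate_with_cursor(fetch, rows=50000):
    """Yield batches of docs until the cursor stops advancing."""
    cursor = "*"  # initial cursorMark value per the Solr documentation
    while True:
        docs, next_cursor = fetch({
            "q": "*:*",
            "sort": "id asc",      # cursor paging requires a total sort order
            "rows": rows,
            "cursorMark": cursor,
        })
        yield docs
        if next_cursor == cursor:  # unchanged cursor => result set exhausted
            return
        cursor = next_cursor
```

The key difference from start=-based paging is that each request carries the opaque cursorMark from the previous response instead of a growing offset, so no shard ever has to re-collect the results already consumed.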
>>>>
>>>> Also, please refrain from using regional units such as "lakh" in an
>>>> international forum. It requires some readers (me, for example) to
>>>> perform a search in order to be sure what you are talking about.
>>>>
>>>> - Toke Eskildsen
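[Editor's note: for readers stuck on a pre-4.7 Solr like Naresh's 4.6.1, Michael's do-it-yourself cursor from earlier in the thread can be sketched like this. Each batch is constrained with a Solr range filter whose lower bound is exclusive (`id:{last_id TO *]`), and the HTTP call is again stubbed as an injectable `fetch` callable, which is a hypothetical stand-in, not a real client API.]

```python
# Sketch of Michael's manual cursor: sort by a unique key and, for each
# batch, filter to ids strictly greater than the last id already seen.
# `fetch` stands in for an HTTP request to Solr's /select handler and
# must return the list of matching docs (each a dict with an "id").

def paginate_by_id(fetch, batch_size=50000):
    """Yield batches; start= stays at 0, so every request is shallow."""
    last_id = None
    while True:
        params = {
            "q": "*:*",
            "sort": "id asc",
            "rows": batch_size,
            "start": 0,  # never page with start=; filter on id instead
        }
        if last_id is not None:
            # Solr range syntax: '{' makes the lower bound exclusive.
            params["fq"] = "id:{%s TO *]" % last_id
        docs = fetch(params)
        if not docs:
            return
        yield docs
        last_id = docs[-1]["id"]
```

As Michael notes, this keeps the cost of every request constant regardless of how deep into the result set you are, which is exactly the property cursorMark later provided natively.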