lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: looks like no allowing of paging without counting entire result set?
Date Mon, 20 Jun 2011 14:12:56 GMT
<<< that if the first page took 3 seconds to come up, the second page
took 3 seconds + x seconds>>>

This is really suspicious. What all are you trying to do in your
process? I'm starting to guess that Solr isn't the performance
problem here, assuming reasonably sized pages (e.g. fewer than
thousands of results per page).

If all you're doing is matching terms, with no scoring, no wildcards,
and nothing else of that sort, you might get some joy from TermDocs
or similar.
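
To make the idea concrete, here is a minimal sketch of walking the postings of a single term with a forward-only cursor, so that each page costs only the documents it returns. It is a toy model in plain Java and not the Lucene TermDocs API; the class PostingsCursorSketch, its Cursor, and the term "status:open" are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for a term -> postings index. NOT the Lucene API;
// every name in this class is hypothetical.
final class PostingsCursorSketch {
    // term -> list of ids of the documents containing that term
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Assumption: callers add doc ids in ascending order for each term.
    void add(String term, int docId) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    /** Forward-only cursor over the postings of one term; no scoring involved. */
    final class Cursor {
        private final List<Integer> docs;
        private int pos = 0;

        Cursor(String term) {
            this.docs = postings.getOrDefault(term, Collections.emptyList());
        }

        /** Returns up to pageSize further doc ids; an empty list once exhausted. */
        List<Integer> nextPage(int pageSize) {
            int end = Math.min(pos + pageSize, docs.size());
            List<Integer> page = new ArrayList<>(docs.subList(pos, end));
            pos = end; // remember where we stopped, so the next page does no rework
            return page;
        }
    }

    Cursor open(String term) {
        return new Cursor(term);
    }

    public static void main(String[] args) {
        PostingsCursorSketch index = new PostingsCursorSketch();
        for (int doc = 0; doc < 7; doc++) {
            index.add("status:open", doc * 10);
        }
        Cursor cursor = index.open("status:open");
        System.out.println(cursor.nextPage(3)); // [0, 10, 20]
        System.out.println(cursor.nextPage(3)); // [30, 40, 50]
        System.out.println(cursor.nextPage(3)); // [60]
    }
}
```

The cursor keeps its position between calls, which is what gives the "0 seconds + x seconds" behaviour for later pages; the price is that the caller has to hold on to the cursor.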

Best
Erick

On Mon, Jun 20, 2011 at 9:44 AM, Hiller, Dean  x66079
<dean.hiller@broadridge.com> wrote:
> One more note: we hit a big performance problem in that if the first page took 3 seconds
> to come up, the second page took 3 seconds + x seconds to come up. This was the major
> problem we hit. Our client is not a web app but automated software, so the timings on the
> second page really need to be in the 0 seconds + x seconds range.
>
> So, deep paging may happen if there are no matches in our system, as the automated software
> has to go through all results until it pairs up the record that just came in.
>
> The main issue is that we have nothing to do with search and are trying to use Lucene as a
> plain indexing library for the typical RDBMS indexing use cases.
>
> Dean
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Monday, June 20, 2011 6:15 AM
> To: java-user@lucene.apache.org
> Subject: Re: looks like no allowing of paging without counting entire result set?
>
> re: 20020101 to the end of time.. Use a clause like [2002-01-01 TO *]
>
> About paging... Yes, you have to start all over again for each search. The basic
> problem is that you have to score every document on each search, because the last
> document scored might be the highest-scoring document.
>
> But let's back up a step, can you tell us what the higher-level problem you're
> trying to solve is? *Why* do you want to do "deep paging"? Do you care about scoring
> the documents or do you just want to look at all of them that match?
>
> One solution would be to use a Collector that collects as many documents as
> you will ever want to return; you can then use that list to "page". But that
> requires a stateful connection, which may be appropriate to your problem...
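
The "collect once, then page from the list" idea above could look roughly like the following sketch. It is plain Java and does not use the Lucene Collector API; CollectedHitsSketch, the IntPredicate standing in for "this document matches", and the maxHits cap are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical holder for the hits of ONE search; the caller keeps it alive
// between page requests, which is the "stateful connection" part.
final class CollectedHitsSketch {
    private final List<Integer> hits;

    private CollectedHitsSketch(List<Integer> hits) {
        this.hits = hits;
    }

    /**
     * Visits docs 0..maxDoc-1 once and keeps the ids of matching docs,
     * stopping as soon as maxHits matches have been collected.
     */
    static CollectedHitsSketch collect(int maxDoc, IntPredicate matches, int maxHits) {
        List<Integer> out = new ArrayList<>();
        for (int doc = 0; doc < maxDoc && out.size() < maxHits; doc++) {
            if (matches.test(doc)) {
                out.add(doc);
            }
        }
        return new CollectedHitsSketch(out);
    }

    /** Returns page number pageIndex (0-based) without re-running the search. */
    List<Integer> page(int pageIndex, int pageSize) {
        int from = Math.min(pageIndex * pageSize, hits.size());
        int to = Math.min(from + pageSize, hits.size());
        return Collections.unmodifiableList(hits.subList(from, to));
    }

    public static void main(String[] args) {
        // Pretend every 7th document matches; keep at most 10 hits.
        CollectedHitsSketch hits = collect(100, doc -> doc % 7 == 0, 10);
        System.out.println(hits.page(0, 4)); // [0, 7, 14, 21]
        System.out.println(hits.page(2, 4)); // [56, 63]
        System.out.println(hits.page(3, 4)); // []
    }
}
```

Every page after the first is a list slice, so its cost does not depend on how deep it is; the trade-off is the memory held for the collected ids and the need to keep the object around between requests.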
>
> Best
> Erick
>
> On Sun, Jun 19, 2011 at 2:39 PM, Hiller, Dean  x66079
> <dean.hiller@broadridge.com> wrote:
>> "It supports it like 2.9, but not using the Hits API. As described above, to
>> show results 991 to 1000 request the top-1000 results and display the last
>> 10 :-)"
>>
>> Bear with me as I am a little confused, so let me throw some stuff down here and think
>> out loud...
>> So, I basically have to request the top 100, then do another request for the next
>> 100, and so on, which seems like it would start all over from scratch and be a bit of a
>> performance hit, correct? I would think the optimal way would be for search to return an
>> object which maintains a cursor into the index tree until I close it, so I can keep asking
>> for the next 100. It sounds like this new API doesn't do that? And maybe the old one didn't
>> either, but from the client perspective, I thought the Hits object might actually just
>> maintain that pointer.
>>
>> NOTE: I am not doing anything close to search. Just basic column indexing like
>> an RDBMS would do for us, except we don't have an RDBMS. Our old RDBMS system has scaled
>> up to being too costly (3 terabytes). We are now scaling out with NoSQL and trying to
>> replace the RDBMS before the costs start to be more than the customers pay us.
>>
>> BIG NOTE: I think back to Hibernate here, where if you use select * from xx where
>> yyy with setMaxResults and setFirstResult(index), it gets slower and slower as you page
>> further in, BUT if you instead use ScrollableResults, it maintains a cursor and the speed
>> NEVER gets slower as you page into the results.
>>
>> Maybe I am using the wrong library, but there are a lot of NoSQL users of HBase starting
>> to use Solr, from what I understand. Should I be using a different indexing library perhaps?
>>
>> Thanks,
>> Dean
>>
>>
>> -----Original Message-----
>> From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> Sent: Sunday, June 19, 2011 12:16 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: looks like no allowing of paging without counting entire result set?
>>
>>> I am wondering how the old Hits object worked that was deprecated and
>>> removed....that looks like I could stop asking it for more results and it would
>>> work better not counting all activities that matched in my 10 mil or 100 mil
>>> result set and just returning the first 100, second 100 and then I can cut off
>>> which would be way more performant.
>>
>> Hits did exactly what you described before. It got as many results as needed
>> to show the nth page. So when showing the page for results 20 to 30, it
>> fetches at least 30 results.
>>
>> In general Full Text Search engines are only scoring the top results. This
>> is e.g. one reason why Google limits the maximum page you can go to.
>>
>>> Should I just use 2.9 instead?  But then 3.x doesn't seem to support this?
>>
>> It supports it like 2.9, but not using the Hits API. As described above, to
>> show results 991 to 1000 request the top-1000 results and display the last
>> 10 :-)
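
The "request the top 1000 and display the last 10" approach described above can be sketched as follows. This is plain Java with a bounded min-heap standing in for a top-N search; it is not Lucene code, and TopNPagingSketch and the scores array are hypothetical. The scores are assumed distinct so that the ranking is unambiguous.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of "collect the top `to` hits, show ranks [from, to)".
final class TopNPagingSketch {
    /**
     * scores[d] is the score of document d. Returns the ids of the documents
     * ranked from (inclusive) to to (exclusive), best score first.
     */
    static List<Integer> page(double[] scores, int from, int to) {
        // Min-heap by score that never holds more than `to` documents.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Integer d) -> scores[d]));
        for (int doc = 0; doc < scores.length; doc++) {
            heap.add(doc);              // every document still has to be visited
            if (heap.size() > to) {
                heap.poll();            // drop the current lowest-scoring document
            }
        }
        // The heap drains lowest score first; reverse to get best first.
        List<Integer> ranked = new ArrayList<>();
        while (!heap.isEmpty()) {
            ranked.add(heap.poll());
        }
        Collections.reverse(ranked);
        int start = Math.min(from, ranked.size());
        return new ArrayList<>(ranked.subList(start, ranked.size()));
    }

    public static void main(String[] args) {
        double[] scores = {0.2, 0.9, 0.5, 0.7, 0.1, 0.8};
        System.out.println(page(scores, 0, 2)); // [1, 5]
        System.out.println(page(scores, 2, 4)); // [3, 2]
    }
}
```

The heap size grows with `to`, and the whole pass is repeated for every page, which matches the observation in this thread that each deeper page costs at least as much as the one before it.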
>>
>> Uwe
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>> This message and any attachments are intended only for the use of the addressee and
>> may contain information that is privileged and confidential. If the reader of the
>> message is not the intended recipient or an authorized representative of the
>> intended recipient, you are hereby notified that any dissemination of this
>> communication is strictly prohibited. If you have received this communication in
>> error, please notify us immediately by e-mail and delete the message and any
>> attachments from your system.
