lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: looks like no allowing of paging without counting entire result set?
Date Mon, 20 Jun 2011 12:14:53 GMT
re: 20020101 to the end of time: use a clause like [2002-01-01 TO *]

About paging... Yes, you have to start all over again for each search. The basic
problem is that every search has to score every matching document, because the
last document scored might be the highest-scoring one.
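A minimal sketch of why a deep page costs as much as a full search. It uses plain Java, not Lucene classes. To show ranks `start` to `start + pageSize`, every matching document is still visited. A bounded min-heap keeps the best `start + pageSize` of them, and the page is the tail of that list. The `float[] scores` array (indexed by doc ID), the `Hit` class, and the `page` method are hypothetical stand-ins for what a top-N collector does internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class TopNPaging {

    /** Hypothetical hit: a document ID plus its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return "doc=" + doc + " score=" + score; }
    }

    /**
     * Returns the hits ranked [start, start + pageSize) by descending score.
     * Every document is visited, because the last one scored may still
     * belong in the top (start + pageSize).
     */
    static List<Hit> page(float[] scores, int start, int pageSize) {
        int n = start + pageSize;                  // how many top hits must be kept
        // Min-heap on score: the weakest kept hit sits at the head.
        PriorityQueue<Hit> pq = new PriorityQueue<>(
                (a, b) -> Float.compare(a.score, b.score));
        for (int doc = 0; doc < scores.length; doc++) {
            Hit h = new Hit(doc, scores[doc]);
            if (pq.size() < n) {
                pq.add(h);
            } else if (h.score > pq.peek().score) {
                pq.poll();                         // evict the weakest kept hit
                pq.add(h);
            }
        }
        // Drain the heap (smallest first) into descending-score order.
        Hit[] top = new Hit[pq.size()];
        for (int i = top.length - 1; i >= 0; i--) {
            top[i] = pq.poll();
        }
        // The requested page is the tail of the top-n list (may be short or empty).
        List<Hit> result = new ArrayList<>();
        for (int i = start; i < top.length; i++) {
            result.add(top[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0.9f, 0.5f, 0.7f, 0.1f, 0.8f};
        // Second page of size 2, i.e. ranks 2 and 3.
        System.out.println(page(scores, 2, 2));
    }
}
```

The heap stays at `start + pageSize` entries, so memory grows with page depth. The scoring pass always covers all matches, whichever page is requested.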

But let's back up a step: can you tell us what higher-level problem you're trying
to solve? *Why* do you want to do "deep paging"? Do you care about scoring the
documents, or do you just want to look at all of the ones that match?

One solution would be to use a Collector that collects as many documents as you
will ever want to return, and then use that list to "page". But that requires a
stateful connection, which may be appropriate for your problem...
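A hedged sketch of the collect-once, page-many idea, in plain Java. In Lucene 2.9/3.x the real extension point is a subclass of `org.apache.lucene.search.Collector` passed to `IndexSearcher.search(Query, Collector)`. The class below is a hypothetical JDK-only stand-in that does not use Lucene. Its `collect(int)` mirrors the per-document callback and records matches in arrival order without scoring. Later pages come from memory, which is why the searcher (and its doc IDs) must stay open between requests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical stand-in for a Lucene Collector subclass: collect() mirrors
 * the per-document callback, but no Lucene classes are used here.
 */
public class CollectOncePager {
    private final int maxDocs;                   // most hits we ever intend to page through
    private final List<Integer> docIds = new ArrayList<>();

    public CollectOncePager(int maxDocs) {
        this.maxDocs = maxDocs;
    }

    /** Called once per matching document during the single search pass. */
    public void collect(int docId) {
        if (docIds.size() < maxDocs) {
            docIds.add(docId);                   // keep arrival order; no scoring
        }
    }

    /** Serves a page from the stored list; no re-search is needed. */
    public List<Integer> page(int pageNumber, int pageSize) {
        int from = Math.min(pageNumber * pageSize, docIds.size());
        int to = Math.min(from + pageSize, docIds.size());
        return Collections.unmodifiableList(docIds.subList(from, to));
    }

    public static void main(String[] args) {
        CollectOncePager pager = new CollectOncePager(1000);
        for (int doc = 0; doc < 25; doc++) {     // pretend 25 documents matched
            pager.collect(doc);
        }
        System.out.println(pager.page(0, 10));   // first page
        System.out.println(pager.page(2, 10));   // last, partial page
    }
}
```

The trade-off is the one described above. The search runs once, but the collected list has to live somewhere between page requests, and it is only valid for the index reader that produced it.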

Best
Erick

On Sun, Jun 19, 2011 at 2:39 PM, Hiller, Dean  x66079
<dean.hiller@broadridge.com> wrote:
> "It supports it like 2.9, but not through the Hits API. As described above, to
> show results 991 to 1000, request the top 1000 results and display the last
> 10 :-)"
>
> Bear with me, as I am a little confused, so let me throw some stuff down here and think
> out loud...
> So I basically have to request the top 100, then do another request for the next 100,
> etc., which seems like it would start all over from scratch and be a bit of a performance
> hit, correct? I would think the optimal way would be for search to return an object that
> maintains a cursor into the index tree until I close it, so I can keep asking for the next
> 100. It sounds like this new API doesn't do that? Maybe the old one didn't either, but from
> the client's perspective I thought the Hits object might actually just maintain that pointer.
>
> NOTE: I am not doing anything close to search, just basic column indexing like an RDBMS
> would do for us, except we don't have an RDBMS. Our old RDBMS system has scaled up to being
> too costly (3 terabytes). We are now scaling out with NoSQL and trying to replace the RDBMS
> before the costs start to exceed what the customers pay us.
>
> BIG NOTE: I think back to Hibernate here: if you use select * from xx where yyy with
> setMaxResults and setFirstResult(index), it gets slower and slower as you page further in,
> BUT if you instead use ScrollableResults, it maintains a cursor and the speed NEVER gets
> slower as you page into the results.
>
> Maybe I am using the wrong library, but from what I understand a lot of NoSQL users of
> HBase are starting to use Solr. Should I be using a different indexing library perhaps?
>
> Thanks,
> Dean
>
>
> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Sunday, June 19, 2011 12:16 PM
> To: java-user@lucene.apache.org
> Subject: RE: looks like no allowing of paging without counting entire result set?
>
>> I am wondering how the old Hits object, which was deprecated and removed, worked.
>> It looks like I could stop asking it for more results, and it would work better:
>> instead of counting all activities that matched in my 10-million or 100-million
>> result set, it would just return the first 100, then the second 100, and then I
>> could cut off, which would be far more performant.
>
> Hits did exactly what you described before. It got as many results as needed
> to show the nth page. So when showing the page for results 20 to 30, it
> fetched at least 30 results.
>
> In general, full-text search engines only collect the top-scoring results. This
> is, e.g., one reason why Google limits the maximum page you can go to.
>
>> Should I just use 2.9 instead?  But then 3.x doesn't seem to support this?
>
> It supports it like 2.9, but not through the Hits API. As described above, to
> show results 991 to 1000, request the top 1000 results and display the last
> 10 :-)
>
> Uwe
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
