lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: lucene hits vs topdocs
Date Mon, 21 Nov 2011 10:00:03 GMT
The general recommendation is to run the query again, but you are right
that it isn't the correct answer in all circumstances.  If you want to
guard against the scenario you outline, do it the way you suggest.
That's fine.  In your fluid environment, how do you cope when doc #11 is
no longer there when you move to page 2?  Do you worry about missing new
docs that won't appear in results because they weren't there when the
first search was executed?  There are pros and cons to all approaches.
If you are caching Lucene docids, be aware that they can change:
http://wiki.apache.org/lucene-java/LuceneFAQ#When_is_it_possible_for_document_IDs_to_change.3F
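
As a rough illustration of the "hold the result set" approach, here is a
minimal sketch in plain Java.  It assumes each document carries a stable
application-level key (for example a stored id field) that is collected
once, in rank order, when the first search runs; Lucene docids are
deliberately not cached because they can change.  The class name
SnapshotPager and the "doc-N" keys are hypothetical, and no Lucene API is
used here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: freezes the ranked keys of one search so that
// every page is cut from the same result set.
class SnapshotPager {
    // Stable application keys in rank order, copied at search time.
    private final List<String> keys;

    SnapshotPager(List<String> rankedKeys) {
        this.keys = Collections.unmodifiableList(new ArrayList<>(rankedKeys));
    }

    // Returns the 0-based page of up to pageSize keys; empty past the end.
    List<String> page(int page, int pageSize) {
        int from = Math.min(page * pageSize, keys.size());
        int to = Math.min(from + pageSize, keys.size());
        return keys.subList(from, to);
    }

    public static void main(String[] args) {
        // Stand-in for the ranked result of the first search.
        List<String> live = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            live.add("doc-" + i);
        }
        SnapshotPager pager = new SnapshotPager(live);

        // The index changes while the user is still reading page 1.
        live.remove("doc-3");

        // Re-querying with an offset now starts page 2 at doc-12,
        // so doc-11 is never shown.
        System.out.println("re-query page 2: " + live.subList(10, 20));
        // The snapshot still starts page 2 at doc-11.
        System.out.println("snapshot page 2: " + pager.page(1, 10));
    }
}
```

The trade-off is the one raised above: the snapshot never skips a hit, but
it can list a document that has since been deleted and it will not pick up
documents added after the first search, so each key should be checked when
it is actually fetched.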

There is also something called searchAfter due in the next release of
Lucene.  See the recent thread "Lucene pagination" on this list.
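
To give a feel for why paging "after the last hit seen" helps with the
scenario in the original question, here is a small sketch in plain Java.
It is only an illustration of the idea, not Lucene's implementation, and
it does not show the real searchAfter signature.  The Hit class, the
nextPage method and the keys are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of cursor-style paging: each page starts
// strictly after the last hit the user has already seen.
class CursorPager {
    // A hit with a score and a stable application key (both hypothetical).
    static final class Hit {
        final String key;
        final double score;

        Hit(String key, double score) {
            this.key = key;
            this.score = score;
        }
    }

    // Rank by descending score, breaking ties by key so the order is total.
    static final Comparator<Hit> RANK =
        Comparator.comparingDouble((Hit h) -> -h.score).thenComparing(h -> h.key);

    // Returns up to n hits ranking strictly after 'after' (null = from the top).
    static List<Hit> nextPage(List<Hit> currentHits, Hit after, int n) {
        List<Hit> sorted = new ArrayList<>(currentHits);
        sorted.sort(RANK);
        List<Hit> page = new ArrayList<>();
        for (Hit h : sorted) {
            if (after != null && RANK.compare(h, after) <= 0) {
                continue; // already shown on an earlier page
            }
            page.add(h);
            if (page.size() == n) {
                break;
            }
        }
        return page;
    }
}
```

With this scheme, if a document from page 1 is deleted before the user
moves on, the hit that used to be number 11 still ranks after the last hit
shown, so it appears at the top of page 2 instead of being skipped.  It
does not make the result set constant: hits whose scores change between
calls can still move across the cursor.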


--
Ian.


On Sun, Nov 20, 2011 at 6:51 PM, Gwyn Carwardine <gwyn@carwardine.net> wrote:
> Hi
>
> I last used dotLucene 143 and now I'm wanting to upgrade to 294.
>
> What I've discovered is that there are quite a few changes.
>
> One of them is in respect of Search. Previously one supplied a query and
> received a number of hits. I didn't have an issue with preservation of state
> so was quite happy to page through the stored hits.
>
> Now that it has changed, it also recommends passing the number of results
> required (as in top xx results) so I'm considering how to refactor my code.
>
> In the simplest way I guess I could retrieve all results as I did previously
> and then paginate through them, or I could use the re-querying approach. But
> this suggests for let's say 10 results per page that I query for 10 docs and
> then when the user scrolls to the next page I re-query for 20 docs and
> ignore the first 10 and so on and so forth.
>
> What initially strikes me about this is that in a fluid environment (where
> changes are constantly being re-indexed) it is possible that an item that
> would come in at number 11 on the first call (and hence not shown on page 1)
> would now move to number 10 on the second call (and hence not shown on page
> 2).
>
> I would expect as a user that if I do a query and then page through it then
> it is the same result set I am paging through and not one that could be
> constantly changing (especially if I am paging through a bit slowly).
>
> I am using Lucene as a text search within an information mgt product that
> does have lots of updates happening so this could well happen. And it only
> needs to happen once, with someone missing a key bit of info, for it to be
> embarrassing.
>
> So I'm curious as to how people out there actually do this. Yes, holding
> state is a pain, but I do that already.
>
> It just seems that Lucene is pointing towards "tell me how many" and so I
> really don't want to go against the tide (or it'll likely be painful next
> time I upgrade!)
>
> Thanks in advance
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
