lucene-solr-user mailing list archives

From Greg Pendlebury <greg.pendleb...@gmail.com>
Subject Re: Solr 4.7.0 - cursorMark question
Date Fri, 07 Mar 2014 01:19:21 GMT
Thank you, that all sounds great. My assumption about documents being
missed was something like this:

A,B,C,D

where they are sorted by timestamp first and ID second. Say the first
'page' of results is 'A,B', and before the second page is requested both
documents B + C receive update events and the new order (by timestamp) is:

A,D,B,C

In that situation D would always be missed, whether the cursorMark means
'C or greater' or 'greater than B' (I'm not sure which it is in practice),
simply because the cursorMark is the unique ID and the unique ID is not your
first sort mechanism.

However, I'm not really concerned about that anyway since it is not a use
case we consider important, and in an information science sense of things I
think it is a non-trivial problem to solve without brute force caching of
all result sets. I'm just happy that we don't have to get our users to
replace existing sort options; we just need to add a unique ID field at the
end and change the parameters we send into the cluster.
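
A minimal sketch of that parameter change in Python, assuming the uniqueKey
field is named `id` and that the client simply appends it to whatever sort the
user chose. The `fetch` callable is hypothetical: it stands in for whatever
sends the parameters to the cluster and returns the parsed JSON response.
`cursorMark`, `nextCursorMark` and the starting value `*` are the names used by
the Solr cursor feature; everything else here is illustrative.

```python
from typing import Callable, Dict, Iterator


def iter_all_docs(
    fetch: Callable[[Dict[str, str]], dict],  # hypothetical transport function
    user_sort: str,
    unique_key: str = "id",  # assumption: the uniqueKey field is called "id"
    rows: int = 100,
) -> Iterator[dict]:
    """Yield every document, keeping the user's sort and adding a tie-breaker."""
    # Keep the existing sort option and append the unique key at the end.
    if user_sort:
        sort = f"{user_sort}, {unique_key} asc"
    else:
        sort = f"{unique_key} asc"

    cursor = "*"  # "*" asks for the first page
    while True:
        params = {
            "q": "*:*",
            "sort": sort,
            "rows": str(rows),
            "cursorMark": cursor,
        }
        response = fetch(params)
        yield from response["response"]["docs"]

        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            # The cursor did not advance, so there is nothing more to read.
            break
        cursor = next_cursor
```

Calling `iter_all_docs(fetch, "timestamp asc")` would send
`sort=timestamp asc, id asc` on every request. The sketch does not check
whether the user's sort already ends with the unique key.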

Thanks,
Greg


On 7 March 2014 11:05, Chris Hostetter <hossman_lucene@fucit.org> wrote:

>
> : At the end of the linked doco there is an example that doesn't make sense
> : to me, because it mentions "sort=timestamp asc" and is then followed by
> : pseudo code that sorts by id only. I understand that cursorMark requires
>
> Ok ... 2 things contributing to the confusion.
>
> 1) the para that refers to "sort=timestamp asc" should be fixed to include
> "id" as well.
>
> 2) the pseudo-code you're referring to that uses "sort => 'id asc'" isn't
> meant to give an example of specifically tailing by timestamp -- it's an
> extension on the earlier example (of fetching all docs sorting on id) to
> show "tailing" new docs with new (increasing) ids ... i'll try to fix the
> wording to better elaborate
>
> : that "sort clauses must include the uniqueKey field", but is it really just
> : 'include', or is it the only field that sort can be performed on?
> :
> : ie. can sort be specified as 'sort=timestamp asc, id asc'?
>
> That will absolutely work ... i'll update the doc to include more examples
> with multi-clause sort criteria.
>
> : I am assuming that if the index is changed between requests then we can
> : still 'miss' or duplicate documents by not sorting on the id as the only
> : sort parameter, but I can live with that scenario. cursorMark is still
>
> If you are using a timestamp param, you should never "miss" a document
> (assuming every doc gets a timestamp) but yes: you can absolutely get the
> same doc twice if it's updated after the first time you fetch it -- that's
> one of the advantages of sorting on a timestamp field like that.
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
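
The A,B,C,D scenario from the top of this message can be played through with a
small in-memory model. This is only a sketch and not Solr itself: it assumes
the cursor records the full sort values (timestamp, then id) of the last
document returned, rather than the ID alone, which is what the quoted reply's
"never miss, but may see twice" behaviour implies. The timestamps and the
`fetch_page` and `simulate` helpers are made up for illustration.

```python
def fetch_page(docs, cursor, rows):
    """Return up to `rows` (id, timestamp) pairs sorting strictly after `cursor`.

    `docs` maps id -> timestamp; `cursor` is None or a (timestamp, id) tuple.
    """
    ordered = sorted(docs.items(), key=lambda item: (item[1], item[0]))
    after = [
        (doc_id, ts)
        for doc_id, ts in ordered
        if cursor is None or (ts, doc_id) > cursor
    ]
    return after[:rows]


def simulate():
    docs = {"A": 1, "B": 2, "C": 3, "D": 4}  # made-up timestamps
    seen = []
    cursor = None
    first_page = True
    while True:
        page = fetch_page(docs, cursor, rows=2)
        if not page:
            break
        seen.extend(doc_id for doc_id, _ in page)
        last_id, last_ts = page[-1]
        cursor = (last_ts, last_id)
        if first_page:
            # B and C are updated after page one, so the order becomes A,D,B,C.
            docs["B"] = 5
            docs["C"] = 6
            first_page = False
    return seen
```

Under that assumption the pages come back as A,B then D,B then C: D is not
skipped, and B is returned a second time because it was updated after it was
first fetched.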
