Date: Fri, 7 Mar 2014 12:19:21 +1100
Subject: Re: Solr 4.7.0 - cursorMark question
From: Greg Pendlebury
To: solr-user@lucene.apache.org

Thank-you, that all sounds great.

My assumption about documents being missed was something like this:

A,B,C,D

where they are sorted by timestamp first and ID second. Say the first 'page' of results is 'A,B', and before the second page is requested, both documents B and C receive update events and the new order (by timestamp) is:

A,D,B,C

In that situation D would always be missed, whether the cursorMark means 'C or greater' or 'greater than B' (I'm not sure which it is in practice), simply because the cursorMark is the unique ID and the unique ID is not your first sort mechanism.

However, I'm not really concerned about that anyway, since it is not a use case we consider important, and in an information science sense I think it is a non-trivial problem to solve without brute-force caching of all result sets. I'm just happy that we don't have to get our users to replace existing sort options; we just need to add a unique ID field at the end and change the parameters we send into the cluster.

Thanks,
Greg

On 7 March 2014 11:05, Chris Hostetter wrote:

>
> : At the end of the linked doco there is an example that doesn't make sense
> : to me, because it mentions "sort=timestamp asc" and is then followed by
> : pseudo code that sorts by id only. I understand that cursorMark requires
>
> Ok ... 2 things contributing to the confusion.
>
> 1) the para that refers to "sort=timestamp asc" should be fixed to include
> "id" as well.
>
> 2) the pseudo-code you're referring to that uses "sort => 'id asc'" isn't meant
> to give an example of specifically tailing by timestamp -- it's an
> extension of the earlier example (of fetching all docs sorting on id) to
> show "tailing" new docs with new (increasing) ids ... i'll try to fix the
> wording to better elaborate
>
> : that "sort clauses must include the uniqueKey field", but is it really just
> : 'include', or is it the only field that sort can be performed on?
> :
> : ie. can sort be specified as 'sort=timestamp asc, id asc'?
>
> That will absolutely work ... i'll update the doc to include more examples
> with multi-clause sort criteria.
>
> : I am assuming that if the index is changed between requests then we can
> : still 'miss' or duplicate documents by not sorting on the id as the only
> : sort parameter, but I can live with that scenario. cursorMark is still
>
> If you are using a timestamp param, you should never "miss" a document
> (assuming every doc gets a timestamp) but yes: you can absolutely get the
> same doc twice if it's updated after the first time you fetch it -- that's
> one of the advantages of sorting on a timestamp field like that.
>
> -Hoss
> http://www.lucidworks.com/
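The A,B,C,D scenario above can be played through with a small in-memory simulation of cursor-style paging under 'sort=timestamp asc, id asc'. This is a sketch for illustration only, not Solr's implementation. It assumes, following the Solr reference guide's description of cursorMark, that the cursor records the sort values of the last document returned for every sort clause (here timestamp and id), not just the uniqueKey. The names `next_page` and `run_scenario` and the timestamps 1-4 are hypothetical. `None` stands in for `cursorMark=*`, and paging stops when the returned cursor equals the one sent, mirroring how Solr signals the end of results with an unchanged `nextCursorMark`.

```python
from typing import Dict, List, Optional, Tuple

# (timestamp, id): mirrors sort=timestamp asc, id asc, with the uniqueKey
# as the final tie-breaker clause.
SortKey = Tuple[int, str]


def sort_key(index: Dict[str, int], doc_id: str) -> SortKey:
    return (index[doc_id], doc_id)


def next_page(
    index: Dict[str, int], cursor: Optional[SortKey], rows: int
) -> Tuple[List[str], Optional[SortKey]]:
    """Return up to `rows` ids sorting strictly after `cursor`, plus the new cursor.

    Hypothetical simulation: the cursor is the full sort tuple of the last
    document returned; None plays the role of cursorMark=*.
    """
    ordered = sorted(index, key=lambda d: sort_key(index, d))
    if cursor is not None:
        ordered = [d for d in ordered if sort_key(index, d) > cursor]
    page = ordered[:rows]
    # No results: hand back the same cursor, like an unchanged nextCursorMark.
    new_cursor = sort_key(index, page[-1]) if page else cursor
    return page, new_cursor


def run_scenario(rows: int = 2) -> List[str]:
    # Hypothetical timestamps giving the initial order A,B,C,D.
    index = {"A": 1, "B": 2, "C": 3, "D": 4}
    seen, cursor = next_page(index, None, rows)  # first page: A,B

    # B and C are updated (later timestamps) before the next request,
    # so the order by timestamp becomes A,D,B,C.
    index["B"] = 5
    index["C"] = 6

    while True:
        page, new_cursor = next_page(index, cursor, rows)
        if new_cursor == cursor:  # cursor did not move: no more results
            break
        seen += page
        cursor = new_cursor
    return seen


if __name__ == "__main__":
    print(run_scenario())
```

Run as written, it prints ['A', 'B', 'D', 'B', 'C']: under this model D is still returned after the updates because its (timestamp, id) tuple sorts after the saved cursor (2, 'B'), while B comes back a second time, which is the duplicate case Hoss describes.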