Date: Sun, 24 Nov 2013 08:31:36 -0500
Subject: Re: building custom cache - using lucene docids
From: Erick Erickson
To: solr-user@lucene.apache.org

bq: Do I understand you correctly that when two segments get merged, the docids (of the original segments) remain the same?

The original segments are unchanged; segments are _never_ changed after they're closed. But they'll be thrown away. Say you have segment1 and segment2, and they get merged into segment3. As soon as the last searcher that is looking at segment1 and segment2 is closed, those two segments will be deleted from your disk.

But for any given doc, the docid in segment3 will very likely be different from what it was in segment1 or segment2.

I think you're reading too much into LUCENE-2897. I'm pretty sure the segment in question is not available to you anyway before this rewrite is done, but I freely admit I don't know much about it.

You're probably going to get into the whole PerSegment family of operations, which is something I'm not all that familiar with, so I'll leave explanations to others.

On Sat, Nov 23, 2013 at 8:22 PM, Roman Chyla wrote:

> Hi Erick,
> Many thanks for the info. An additional question:
>
> Do I understand you correctly that when two segments get merged, the docids
> (of the original segments) remain the same?
>
> (unless, perhaps, they were merged using the last index
> segment, which was opened for writing and where the docids could have
> suddenly changed in a commit just before the merge)
>
> Yes, you guessed right that I am putting my code into the custom cache, so
> it gets notified on index changes. I don't know yet how, but I think I can
> find my way to the current active, opened (last) index segment, which is
> actively updated (as opposed to just being merged). So my definition of
> 'not last ones' is: segments where docids don't change. I'd be grateful if
> someone could spot any problem with such an assumption.
>
> roman
>
>
> On Sat, Nov 23, 2013 at 7:39 PM, Erick Erickson wrote:
>
> > bq: But can I assume that docids in other segments (other than the last
> > one) will be relatively stable?
> >
> > Kinda. Maybe. Maybe not. It depends on how you define "other than the
> > last one".
> >
> > The key is that the internal doc IDs may change when segments are
> > merged. And old segments get merged. Doc IDs will _never_ change
> > in a segment once it's closed (although, as you note, they may be
> > marked as deleted). But that segment may be written to a new segment
> > when merging, and the internal ID for a given document in the new
> > segment bears no relationship to its internal ID in the old segment.
> >
> > BTW, I think you only really care when opening a new searcher. There is
> > a UserCache (see solrconfig.xml) that gets notified when a new searcher
> > is being opened, to give it an opportunity to refresh itself. Is that
> > useful?
> >
> > As long as a searcher is open, it's guaranteed that nothing is changing.
> > Hard commits with openSearcher=false don't open new searchers, which
> > is why changes aren't visible, even though the segments are closed,
> > until a softCommit or a hard commit with openSearcher=true.
> >
> > FWIW,
> > Erick
> >
> >
> > On Sat, Nov 23, 2013 at 12:40 AM, Roman Chyla wrote:
> >
> > > Hi,
> > > docids are 'ephemeral', but I'd still like to build a search cache with
> > > them (they allow for the fastest joins).
> > >
> > > I'm seeing docids keep changing with updates (especially in the last
> > > index segment), as per
> > > https://issues.apache.org/jira/browse/LUCENE-2897
> > >
> > > That would be fine, because I could build the cache from a diff (of the
> > > index state) + reading the latest index segment in its entirety. But can
> > > I assume that docids in other segments (other than the last one) will be
> > > relatively stable? (i.e. when an old doc is deleted, the docid is marked
> > > as removed; update doc = delete old & create a new docid)?
> > >
> > > thanks
> > >
> > > roman
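To make Erick's point about merges concrete, here is a toy model, not Lucene code: the `Segment` class, `merge()` and `searcher_view()` are hypothetical names, and the assumption that a merge simply concatenates the live docs of its inputs in order is a simplification of whatever the real merge policy does. It shows that a closed segment never changes (deletes only mark a slot), that a merge copies the surviving docs into a new segment with fresh docids, and that a global docid is docBase plus the local docid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Toy stand-in for a closed, immutable segment (hypothetical, not Lucene's API).

    A document's local docid is its position in `docs`. Deletions are
    recorded in a separate set (Lucene keeps a live-docs bitset), so
    deleting a document never shifts the docids of the others.
    """
    name: str
    docs: tuple                       # local docid -> document key
    deleted: frozenset = frozenset()

    def live_docs(self):
        return [(docid, key) for docid, key in enumerate(self.docs)
                if docid not in self.deleted]


def merge(name, *segments):
    """Copy the live documents of `segments` into a brand-new segment.

    Deleted documents are dropped and survivors are renumbered from 0,
    so a doc's docid in the merged segment has no fixed relationship to
    its docid in the source segment. (Assumption: inputs are simply
    concatenated in order.)
    """
    return Segment(name, tuple(key for seg in segments
                               for _, key in seg.live_docs()))


def searcher_view(segments):
    """Map document key -> global docid for a searcher over `segments`.

    Global docid = docBase + local docid, where docBase counts every slot
    (deleted ones included) in the segments that come before.
    """
    view, doc_base = {}, 0
    for seg in segments:
        for docid, key in seg.live_docs():
            view[key] = doc_base + docid
        doc_base += len(seg.docs)
    return view


# "b" was deleted from segment1 before the merge.
segment1 = Segment("segment1", ("a", "b", "c"), deleted=frozenset({1}))
segment2 = Segment("segment2", ("d", "e"))

old_view = searcher_view([segment1, segment2])  # searcher opened before the merge
segment3 = merge("segment3", segment1, segment2)
new_view = searcher_view([segment3])            # searcher opened after the merge

print(old_view)  # {'a': 0, 'c': 2, 'd': 3, 'e': 4}
print(new_view)  # {'a': 0, 'c': 1, 'd': 2, 'e': 3}
```

Note that `segment1` and `segment2` are untouched by `merge()`, so `old_view` stays correct for as long as a searcher holding those two segments is open, which is the "nothing changes while a searcher is open" guarantee in miniature. Whether a given doc keeps its number after the merge ("a" does here) is pure coincidence.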
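Roman's "diff + latest segment" idea can be made safe against merges by caching per segment instead of per index: reuse an entry while its segment is still present, and compute entries only for segments the cache has not seen. The sketch below is a hypothetical toy, not Solr's UserCache or any Lucene API; `PerSegmentCache`, `update()` and keying entries by segment name are all assumptions made for illustration. It also models "update doc = delete old & create a new docid" as described in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Toy segment (hypothetical): `docs` never changes; only `deleted` grows."""
    name: str
    docs: tuple                       # local docid -> document key
    deleted: frozenset = frozenset()


def update(segments, key, new_segment_name):
    """Toy update = delete the old copy + add a new copy.

    The old copy is only marked deleted where it lives (its docid slot
    stays put); the new copy gets a fresh docid in a new segment.
    """
    result = []
    for seg in segments:
        hits = {docid for docid, k in enumerate(seg.docs) if k == key}
        result.append(Segment(seg.name, seg.docs, seg.deleted | hits))
    result.append(Segment(new_segment_name, (key,)))
    return result


class PerSegmentCache:
    """Hypothetical cache holding one computed list per segment.

    Entries are computed from all of a segment's docs, ignoring deletions,
    so an entry stays valid for as long as the segment exists. Callers
    must still skip deleted docs themselves.
    """

    def __init__(self, compute):
        self.compute = compute
        self.entries = {}
        self.computed = []            # segments we actually had to read, in order

    def refresh(self, segments):
        """Call whenever a new searcher is opened over `segments`."""
        fresh = {}
        for seg in segments:
            if seg.name in self.entries:
                fresh[seg.name] = self.entries[seg.name]   # unchanged segment: reuse
            else:
                fresh[seg.name] = self.compute(seg)        # new (or merged) segment
                self.computed.append(seg.name)
        self.entries = fresh          # entries for merged-away segments are dropped

    def lookup(self, segments, global_docid):
        """Translate a global docid to (segment, local docid) via docBase."""
        doc_base = 0
        for seg in segments:
            if global_docid < doc_base + len(seg.docs):
                return self.entries[seg.name][global_docid - doc_base]
            doc_base += len(seg.docs)
        raise IndexError(global_docid)


cache = PerSegmentCache(lambda seg: [key.upper() for key in seg.docs])

v1 = [Segment("_1", ("a", "b")), Segment("_2", ("c",))]
cache.refresh(v1)                     # reads _1 and _2

v2 = update(v1, "b", "_3")            # "b" marked deleted in _1, new copy in _3
cache.refresh(v2)                     # reads only _3
b_after_update = cache.lookup(v2, 3)  # global docid 3 = docBase 3 of _3 + local 0

v3 = [Segment("_4", ("a", "c", "b"))]  # _1.._3 merged into _4 (deleted "b" dropped)
cache.refresh(v3)                     # reads _4, forgets _1.._3

print(b_after_update)                 # B
print(cache.computed)                 # ['_1', '_2', '_3', '_4']
```

The design choice worth noting: because entries ignore deletions, a delete or update never invalidates an existing entry (only the newly written segment is read), while a merge always forces a recompute, since the merged segment's docids bear no relationship to the old ones. Hooking something like this into Solr's user cache / regenerator mechanism is not shown here.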