From: Marvin Humphrey
To: user@lucy.apache.org
Date: Thu, 19 Nov 2015 07:03:58 -0800
Subject: Re: [lucy-user] Strange results when documents gets delete while iterating

On Thu, Nov 19, 2015 at 4:39 AM, Gerald Richter - ECOS Technology wrote:
> Hi,
>
> It's a local IndexSearcher.
>
> I have done a lot of tests and it's really happening.
>
> Let me give you a little more detail; maybe this helps:
>
> - I call a function that creates a new IndexSearcher and calls $hits = $searcher -> hits.
> - I iterate over the first few entries and return the entries and the $hits.
> - The documents that were found are deleted from a database, which in turn deletes the documents from the Lucy index.
> - Now I iterate over the next few entries and delete them, and so on.
>
> I have made a small test where only two entries are fetched per iteration. The result looks like this:
>
> id => "8b8bce64e69b52ed244671009c11ee0e",
> id => "8b8bce64e69b52ed244671009c4857e7",
> id => "4a3dcd6c2e9e3074d2d52b8e72584b68",
> id => "8b8bce64e69b52ed244671009c730dc9",
> id => "4a3dcd6c2e9e3074d2d52b8e72584d19",
> id => "8b8bce64e69b52ed244671009c7e3974",
> id => "4a3dcd6c2e9e3074d2d52b8e72585475",
> id => "8b8bce64e69b52ed244671009c7e4788",
> id => "4a3dcd6c2e9e3074d2d52b8e72585dc2",
> id => "8b8bce64e69b52ed244671009c7e2fa6",
>
> id is some value I store in the document. The result should only contain ids starting with 8.
>
> So you see the first two are correct; after deletion of these two (always in a different process), the next time, the first one I get is wrong and the second one is correct...
>
> If I do not delete anything I only get the right entries (I just commented out one line; the rest is still the same).
>
> Any clue?

When documents in an old segment are marked as deleted, that information is written to a bitmap deletions file which is written to a new segment. Old readers are not supposed to know about new segments. So for something to go wrong, either 1) information in an old segment would have to be corrupted, 2) a reader would have to somehow find out about information in a new segment, or 3) something else unrelated.

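As a rough illustration of the model just described, here is a toy sketch in Python. It is not Lucy's code or file format: the `snapshot.txt` file, the line-based contents of each segment file, and the names `Reader`, `commit_segment`, and `demo` are all invented for this example. It assumes only what is stated above, namely that deletions against old documents are recorded in a new segment and that a reader which is already open keeps working from the segments it mapped when it opened.

```python
import mmap
import os
import tempfile


def commit_segment(index_dir, name, lines):
    """Write a new segment file, then list it in the (hypothetical) snapshot file."""
    seg_dir = os.path.join(index_dir, name)
    os.mkdir(seg_dir)
    with open(os.path.join(seg_dir, "cf.dat"), "w") as f:
        f.write("\n".join(lines) + "\n")
    snap = os.path.join(index_dir, "snapshot.txt")
    existing = []
    if os.path.exists(snap):
        with open(snap) as f:
            existing = f.read().split()
    with open(snap, "w") as f:
        f.write(" ".join(existing + [name]))


class Reader:
    """Toy reader: maps the segments named in the snapshot once, at open time."""

    def __init__(self, index_dir):
        with open(os.path.join(index_dir, "snapshot.txt")) as f:
            self.segments = f.read().split()
        self.maps = {}
        for seg in self.segments:
            with open(os.path.join(index_dir, seg, "cf.dat"), "rb") as f:
                self.maps[seg] = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def live_docs(self):
        """Return doc ids that are not deleted, using only the mapped segments."""
        docs, deleted = [], set()
        for seg in self.segments:
            for line in self.maps[seg][:].decode().splitlines():
                kind, doc_id = line.split()
                if kind == "del":
                    deleted.add(doc_id)
                else:
                    docs.append(doc_id)
        return [d for d in docs if d not in deleted]

    def close(self):
        for m in self.maps.values():
            m.close()


def demo():
    with tempfile.TemporaryDirectory() as index_dir:
        commit_segment(index_dir, "seg_1", ["doc a", "doc b", "doc c"])
        old = Reader(index_dir)
        old_before = old.live_docs()
        # A later commit deletes a and b and adds d, all inside a new segment.
        commit_segment(index_dir, "seg_2", ["del a", "del b", "doc d"])
        old_after = old.live_docs()   # the old reader never looks at seg_2
        fresh = Reader(index_dir)     # a reader opened now sees the new snapshot
        fresh_view = fresh.live_docs()
        old.close()
        fresh.close()
        return old_before, old_after, fresh_view


if __name__ == "__main__":
    print(demo())
```

In this toy model the old reader's results cannot change while it stays open, because nothing it mapped is ever rewritten; only a newly opened reader picks up the deletions.
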
Indexers write index data (including new deletions data referencing documents in old segments) to temp files in a new segment, which are then consolidated into a single per-segment "compound file" named "cf.dat". When a reader opens, it mmaps cf.dat for each segment in the snapshot. Once the reader successfully opens all the files it needs, it never goes looking for new files.

It's hard to imagine a mechanism that would either cause an existing "cf.dat" file to be modified, or persuade a reader to go look at a new "cf.dat" file.

So unless my reasoning is wrong, the cause is #3 -- something else unrelated. I really have no idea what that could be, though since you've previously asked some questions about Coro/AnyEvent and other concurrency stuff the most likely prospect would seem to be something unique to your setup.

The next step is probably to take the behavior you've been able to reproduce and isolate it in a test case that others can run and analyze.

Marvin Humphrey