From: Matt Davis
Date: Wed, 30 Oct 2019 19:56:28 -0400
Subject: Re: Iterating Over All Documents On a Changing Index
To: java-user@lucene.apache.org

Thanks for the clarification. I have written my own logic that tracks
changes and ignores documents that have been written or deleted since the
reindex started.

On Mon, Oct 21, 2019, 4:58 PM Adrien Grand wrote:

> This is the right place to ask these questions indeed.
>
> This is a good way to iterate over documents. Regarding your 2nd
> question, Lucene IndexReaders are point-in-time views of the data, so
> changes won't become visible in-place. The tricky part of this kind of
> problem is usually dealing with documents that are indexed after you
> pulled a new reader and while you are still reindexing.
>
> On Sat, Oct 19, 2019 at 1:35 AM Matt Davis wrote:
> >
> > Hi All,
> >
> > I am working on implementing an in-place reindex using Lucene. In my
> > case, I have a BSON document stored in a binary field and a set of rules
> > that pull fields out of the BSON and index them into different Lucene
> > fields with different analyzers. I would like to be able to change these
> > rules / schema and then iterate over the documents, indexing them using
> > the new schema.
> >
> > I have come up with the following code block:
> > https://gist.github.com/mdavis95/f600e0a8233d0a1232eff77645d1dc8a
> >
> > I have two questions:
> > 1) Is this a good way to iterate over the documents?
> > 2) How can I manage documents changing while I am doing this? New
> > documents coming in should be fine, I believe, but changes to existing
> > documents could be lost, if I understand correctly.
> >
> > I hope that this is the right place to ask this question, and I apologize
> > if this is obvious or has been asked and answered.
> >
> > Thanks,
> > Matt
>
> --
> Adrien
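
For readers of the archive: the thread does not show the change-tracking logic mentioned in the reply, so the following is only a rough sketch of one way such logic could look. It deliberately uses no Lucene classes. A plain map stands in for the index, and a frozen copy of that map stands in for a point-in-time reader. The class `ReindexSketch` and all of its method names are hypothetical and are not taken from the gist or from Lucene.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model only: plain maps stand in for the index; no Lucene classes are used. */
final class ReindexSketch {
    // Hypothetical stand-in for the live index: document id -> indexed form.
    private final Map<String, String> live = new LinkedHashMap<>();
    // Ids written or deleted by regular traffic since the reindex started.
    private final Set<String> touched = new HashSet<>();
    private boolean reindexing = false;

    /** Regular write path; records the id while a reindex is running. */
    synchronized void write(String id, String value) {
        live.put(id, value);
        if (reindexing) touched.add(id);
    }

    /** Regular delete path; records the id while a reindex is running. */
    synchronized void delete(String id) {
        live.remove(id);
        if (reindexing) touched.add(id);
    }

    synchronized String get(String id) {
        return live.get(id);
    }

    /** Starts a reindex and returns a frozen copy, mimicking a point-in-time view. */
    synchronized Map<String, String> beginReindex() {
        touched.clear();
        reindexing = true;
        return Collections.unmodifiableMap(new LinkedHashMap<>(live));
    }

    /** Applies one reindexed document unless regular traffic touched it meanwhile. */
    synchronized boolean applyReindexed(String id, String newValue) {
        if (touched.contains(id)) return false; // the newer write or delete wins
        live.put(id, newValue);
        return true;
    }

    synchronized void endReindex() {
        reindexing = false;
        touched.clear();
    }

    public static void main(String[] args) {
        ReindexSketch index = new ReindexSketch();
        index.write("a", "v1:apple");
        index.write("b", "v1:banana");
        index.write("c", "v1:cherry");

        Map<String, String> snapshot = index.beginReindex();

        // Regular traffic arriving while the reindex is in progress.
        index.write("b", "v2:blueberry");
        index.delete("c");
        index.write("d", "v2:date");

        int applied = 0;
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            // "New schema" here is just a prefix swap, purely for illustration.
            String converted = "v2:" + e.getValue().substring(3);
            if (index.applyReindexed(e.getKey(), converted)) applied++;
        }
        index.endReindex();

        System.out.println("applied=" + applied + " a=" + index.get("a")
                + " b=" + index.get("b") + " c=" + index.get("c")
                + " d=" + index.get("d"));
    }
}
```

In this model only document "a" is rewritten by the reindex pass. The update to "b" and the deletion of "c" that arrived during the pass are left alone, and the snapshot still holds the old value of "b", which is the behaviour a point-in-time view implies. A real implementation would need to key the tracking on whatever unique id field the index uses and to coordinate with the actual writer; both are outside what this sketch covers.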