From: Adrien Grand
Date: Mon, 21 Oct 2019 22:58:22 +0200
Subject: Re: Iterating Over All Documents On a Changing Index
To: Lucene Users Mailing List

This is indeed the right place to ask these questions.

This is a good way to iterate over documents.

Regarding your 2nd question, Lucene IndexReaders are point-in-time views
of the data, so changes won't become visible in a reader that is already
open. The tricky part of this kind of reindexing is usually dealing with
documents that get indexed after you have pulled a new reader and while
you are still in the process of reindexing.

On Sat, Oct 19, 2019 at 1:35 AM Matt Davis wrote:
>
> Hi All,
>
> I am working on implementing an in-place reindex using Lucene. In my
> case, I have a BSON document stored in a binary field and have a set of
> rules that pull fields out of the BSON and index them into different
> Lucene fields with different analyzers. I would like to be able to change
> these rules / schema and then iterate over the documents, indexing them
> using the new schema.
>
> I have come up with the following code block:
> https://gist.github.com/mdavis95/f600e0a8233d0a1232eff77645d1dc8a
>
> I have two questions:
> 1) Is this a good way to iterate over the documents?
> 2) How can I manage documents changing while I am doing this? New
> documents coming in should be fine I believe, but changes to existing
> documents could be lost if I understand correctly.
>
> I hope that this is the right place to ask this question and I apologize
> if this is obvious or has been asked and answered.
>
> Thanks,
> Matt

-- 
Adrien
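
To make the point-in-time behaviour discussed in this thread concrete, here
is a minimal plain-Java sketch. It does not use the Lucene API: the class
SnapshotReindexSketch, its Doc type and the per-document version counter are
all hypothetical stand-ins (a map plays the role of the index, a copied map
plays the role of a reader). It only models the failure mode, where a
document updated after the snapshot was taken could be overwritten with stale
data, and shows one possible guard: write back only if the document is
unchanged, and report the rest for a second pass.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical model, NOT Lucene: a map stands in for the index.
final class SnapshotReindexSketch {

    // One stored document. No equals() override, so comparisons are by identity.
    static final class Doc {
        final String source;   // stands in for the stored BSON
        final String indexed;  // stands in for the fields derived by the rules
        final long version;    // bumped on every write

        Doc(String source, String indexed, long version) {
            this.source = source;
            this.indexed = indexed;
            this.version = version;
        }
    }

    private final ConcurrentHashMap<String, Doc> live = new ConcurrentHashMap<>();

    // Normal write path: store the source and derive fields with the given rules.
    void put(String id, String source, UnaryOperator<String> rules) {
        live.compute(id, (k, old) ->
            new Doc(source, rules.apply(source), old == null ? 1L : old.version + 1));
    }

    Doc get(String id) {
        return live.get(id);
    }

    // Point-in-time view: writes made after this call are not visible in it.
    Map<String, Doc> snapshot() {
        return Collections.unmodifiableMap(new TreeMap<>(live));
    }

    // Re-derive fields for every document in the snapshot. A document is only
    // written back if the live copy is still the instance the snapshot saw;
    // otherwise its id is returned so the caller can handle it in a later pass.
    List<String> reindex(Map<String, Doc> snap, UnaryOperator<String> newRules) {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, Doc> e : snap.entrySet()) {
            Doc seen = e.getValue();
            Doc rewritten = new Doc(seen.source, newRules.apply(seen.source), seen.version + 1);
            // Atomic compare-and-replace: fails if the document changed or was removed.
            if (!live.replace(e.getKey(), seen, rewritten)) {
                skipped.add(e.getKey());
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        UnaryOperator<String> oldRules = s -> s.toLowerCase(Locale.ROOT);
        UnaryOperator<String> newRules = s -> s.toUpperCase(Locale.ROOT);

        SnapshotReindexSketch store = new SnapshotReindexSketch();
        store.put("a", "Alpha", oldRules);
        store.put("b", "Beta", oldRules);

        Map<String, Doc> snap = store.snapshot();
        store.put("b", "Beta v2", newRules);          // update arrives after the snapshot

        System.out.println(snap.get("b").source);     // prints "Beta": the snapshot is unchanged
        System.out.println(store.reindex(snap, newRules)); // prints "[b]": b was not overwritten
        System.out.println(store.get("a").indexed);   // prints "ALPHA"
        System.out.println(store.get("b").source);    // prints "Beta v2": the newer update survived
    }
}
```

In this model the skipped ids would simply be processed again against a fresh
snapshot until none remain. Whether a comparable check is practical in a real
index depends on how updates reach it, which the thread does not specify.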