From: "Stuart Goldberg"
Subject: Problems Refactoring a Lucene Index
Date: Fri, 8 Jul 2016 12:12:41 -0400

As our software goes through its lifecycle, we sometimes have to alter existing Lucene indexes. The way I have done that in the past is to open the existing index for reading, read each Document, modify it and write that Document to a new index. At the end of the process, I delete the old index and rename the new index to the old name. I do not do any tokenizing and use no analyzers.

I recently upgraded from Lucene 3.x to 4.10.4. Now I have the following problem:

Suppose the existing document has 10 fields in it and there's one I have to modify. I remove that field and re-add it with the new settings. Then I add the Document in its entirety to the new index. I run into the following problems:

* I get Exceptions thrown for the fields I don't even touch. That's because their FieldType has 'tokenized' set to true and it fails because I am using no analyzers. 'tokenized' is set to true even though when I originally added the field to the original index I had 'tokenized' set to false!

* I have LongFields that come back with 'indexed' set to false even though in the original index they were indexed! This makes the new index not searchable on these fields and hence unusable.
* I can't even alter 'indexed' for these LongFields, because for some reason the FieldType instance comes back frozen from the IndexReader. Once frozen, it can't be altered. Even if I create a new FieldType, there is no way to change the FieldType of a Field.

It seems the returned FieldType contents are kind of random! I did see in the Javadoc of IndexReader.document() that field metadata is not returned and that, in fact, a new kind of object such as 'StoredField' should be returned so that there is no pretense of there being any metadata.

I thought perhaps I could use FieldInfos. But that class returns the same bogus metadata. What then is the purpose of FieldInfos if the info is bogus?

Am I not understanding something here? This is not very usable. What can I do to work around this? Is this a Lucene bug? Oversight?
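
For illustration, the last step of the procedure described at the top of this message (delete the old index, then rename the new index to the old name) could be sketched as follows. This is only a minimal sketch using java.nio.file and no Lucene classes. The names IndexSwap, deleteTree and swap are hypothetical, and it assumes both directories are on the same file system and that every IndexReader and IndexWriter on them has already been closed.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not part of Lucene: swaps a rebuilt index directory into place.
final class IndexSwap {

    private IndexSwap() {}

    /** Recursively deletes a directory tree; does nothing if it does not exist. */
    static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                // The directory is empty by now, so it can be removed.
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    /** Deletes oldIndex, then renames newIndex to the old name. */
    static void swap(Path oldIndex, Path newIndex) throws IOException {
        // Refuse to delete the old index if there is nothing to replace it with.
        if (!Files.isDirectory(newIndex)) {
            throw new IOException("New index directory is missing: " + newIndex);
        }
        deleteTree(oldIndex);
        Files.move(newIndex, oldIndex);
    }
}
```

Checking that the new directory exists before deleting the old one keeps a failed rebuild from destroying the only usable index.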