From: Michael McCandless
Date: Tue, 11 Oct 2016 19:58:08 -0400
Subject: Re: merge problems
To: Hans Lund
Cc: Lucene Users

OK I have a small test case showing the issue!

I opened https://issues.apache.org/jira/browse/LUCENE-7491

Thanks for reporting this, Hans.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Oct 11, 2016 at 12:08 PM, Hans Lund wrote:
> hmm you're right - when it revealed a bug in our indexing code I stopped
> wondering ;-) but now I tried to create small tests to show the behavior -
> until now without success. I'm pretty sure that I can reproduce it by
> re-introducing our index bug, unfortunately it occurs after some hours
> parsing and indexing wikipedia dumps - but from there I'll try simplifying a
> test reproducing the setup.
>
> The setup we use is quite straightforward, using MMapDirectory and an NRT
> setup - the only tailored functionality is our own IndexDeletionPolicy using
> an added timestamp in userdata for the index commit, keeping a number of
> snapshots but honoring a max retention period, not that I suspect it to be
> the cause - but if fieldinfos from another snapshot are used in the merge
> that could cause problems
>
> Hans Lund
>
> On Tue, Oct 11, 2016 at 12:07 PM, Michael McCandless
> wrote:
>>
>> Hmm, that should be "OK" from Lucene's standpoint.
>>
>> I mean, it should not result in strange merge exceptions later on.
>>
>> I think there's a bug somewhere in Lucene's efforts to pretend it's
>> fully schema-less ... I'll try to reproduce this.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Oct 11, 2016 at 4:38 AM, Hans Lund wrote:
>> > Turned out to be much simpler - we had added a new 'dynamic' field to
>> > a stats doc: a count on articles based on identified language code.
>> > Having a set of test documents in German, English, Swedish - no one
>> > had suspected the obvious: that the language detection categorized a
>> > single document as being Indonesian, making the stats count id:1.
>> >
>> > I realized that the debug output I added made output of everything
>> > other than the interesting field (iterating over already added fields -
>> > not the field causing the error later on ;-)
>> >
>> > On Mon, Oct 10, 2016 at 4:32 PM, Adrien Grand wrote:
>> >
>> >> It looks like the field infos of your index went out of sync with data
>> >> stored in the files about points.
>> >>
>> >> Can you run CheckIndex on your index (potentially with the `-fast`
>> >> option so that it only verifies checksums)? It could be that one of
>> >> these two parts of the index got corrupted.
>> >>
>> >> Since you were able to modify the way add(IndexableField) is
>> >> implemented, I'm wondering if you are running a fork of Lucene? If yes,
>> >> maybe you did some changes that triggered this bug?
>> >>
>> >> Otherwise is your application:
>> >>  - using IndexWriter.addIndexes?
>> >>  - customizing merging in some way, eg. by wrapping the merge readers?
>> >>
>> >> On Tue, Oct 4, 2016 at 16:40, Hans Lund wrote:
>> >>
>> >> > After upgrading to 6.2 we are having problems during merges (after
>> >> > running for a while).
>> >> >
>> >> > When the problem occurs it's always complaining about the same field -
>> >> > and throws:
>> >> >
>> >> > java.lang.IllegalArgumentException: field="id" did not index point values
>> >> >   at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.getBKDReader(Lucene60PointsReader.java:126)
>> >> >   at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.size(Lucene60PointsReader.java:224)
>> >> >   at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:169)
>> >> >   at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:173)
>> >> >   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
>> >> >   at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4312)
>> >> >   at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3889)
>> >> >
>> >> > To figure out where we messed up - I have added some ugly logging to
>> >> > Document:
>> >> >
>> >> > public final void add(IndexableField field) {
>> >> >   if ("id".equals(field.name())
>> >> >       && field.fieldType().pointDimensionCount() != 0) {
>> >> >     System.err.println("Point value detected");
>> >> >     for (IndexableField i : fields) {
>> >> >       System.err.println(i);
>> >> >     }
>> >> >   }
>> >> >   fields.add(field);
>> >> > }
>> >> >
>> >> > In hope to intercept the document we messed up.
>> >> >
>> >> > But to my surprise toString on the suspected field just says
>> >> > (contains a URN):
>> >> >
>> >> > indexed,omitNorms,indexOptions=DOCS
>> >> >
>> >> > So any hints as to why field.fieldType().pointDimensionCount() != 0
>> >> >
>> >> > and any suggestions what might cause this?
>> >> >
>> >> > Regards
>> >> > Hans Lund
>> >> >

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
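
The collision described in this thread (a dynamic stats field that happened to be named "id" and was apparently indexed with point values, unlike the existing "id" field) could in principle be caught when the document is built rather than hours later at merge time. What follows is a minimal sketch under stated assumptions, not Lucene code: FieldSchemaGuard and its check method are hypothetical names, and the guard compares nothing but the point dimension count per field name.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of Lucene: remembers the point dimension
 * count each field name was first seen with and rejects a later field
 * that reuses the name with a different count.
 */
final class FieldSchemaGuard {
    // Field name -> point dimension count recorded on first sight.
    private final Map<String, Integer> pointDimsByField = new HashMap<>();

    /**
     * Records the field the first time it is seen; on later calls throws
     * IllegalStateException if the count differs from the recorded one.
     */
    void check(String fieldName, int pointDimensionCount) {
        Integer previous = pointDimsByField.putIfAbsent(fieldName, pointDimensionCount);
        if (previous != null && previous.intValue() != pointDimensionCount) {
            throw new IllegalStateException(
                "field=\"" + fieldName + "\" was first seen with "
                    + previous + " point dimension(s), now " + pointDimensionCount);
        }
    }
}
```

In an application the guard might be called once per field before the document is added, passing field.name() and field.fieldType().pointDimensionCount(), the same accessors used in the logging snippet from the original report. Unlike that snippet, it inspects the incoming field rather than the fields already added.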