From: Trejkaz
Date: Wed, 1 Feb 2017 12:53:01 +1100
Subject: Re: How do I write in 3.x format to an upgradeded index using Lucene 4.10
To: Lucene Users Mailing List, kiwi clive

> If we take our old 3.x index and apply IndexUpgrader to it, we end up with a 4.10 index.
> There are several lucene 4.x files created in the index directory and no errors are thrown.
> However, it appears that the index data is still in the 3.x format, namely it remains:
> "thanks", "coming"
> and not:
> "thanks", , "coming"

Well, this is a different thing really. The index is in the 4.x format,
but the analysis which was performed remains the 3.x analysis: the upgrade
converts the file format without re-analysing anything, so the terms and
positions in the postings are unchanged.

So this whole thing is really just a "make sure to use the same analyser
to query which you used to index" problem. If you indexed using a Lucene 3
analyser, then you should be using the same v3 analyser when you query
against the index in Lucene 4.

The usual rules apply:

* Beware of Version.LATEST/LUCENE_CURRENT. Always use the exact version,
  and keep using it.

* If Lucene removes support for some Version you were using, don't update
  the Version you're using.
  Instead, take a copy of the Tokenizer/TokenFilter you were using from
  the older version and port it to work on the new version. Maintain
  these frozen-off analysis components forever.

That said, we didn't experience any problems like this going from 3 to 4.
What we did hit were more obscure problems where backwards compatibility
was not maintained in Lucene itself, e.g. places where, despite passing in
a Version object, the older behaviour was not preserved. IIRC, the change
to the term length limits was one of these. In these situations, for the
most part, freezing off a copy of the old behaviour works fine.

Also, we don't use the "classic" query parser, but rather the flexible
one. If you're using the classic one, it might have some misbehaviour
around this which we didn't strike by using the flexible one.

> So we need a way to write documents in 3.x format (no ), to our upgraded indexes,
> new indexes can use native 4.10 format.

It sounds like you just need to use the same analyser you were previously
using, possibly forever...

TX
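
To make the "same analyser at index time and query time" point concrete, here is a toy sketch in Python. It is not Lucene's API. The two analysers, the stop word list and the tiny positional index are all hypothetical and exist only for illustration. In particular, the assumed difference between the "old" and "new" analysis (whether a removed stop word leaves a position gap) is a guess at the kind of change involved, not something confirmed in this thread.

```python
# Toy model only: nothing here is Lucene API, and both analysers are hypothetical.
STOP_WORDS = {"for", "the", "a"}


def analyse_old(text):
    """Assumed 'old' analysis: drop stop words and close up the positions."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return list(enumerate(tokens))  # list of (position, term)


def analyse_new(text):
    """Assumed 'new' analysis: drop stop words but keep the position gaps."""
    return [(pos, t) for pos, t in enumerate(text.lower().split())
            if t not in STOP_WORDS]


def build_index(docs, analyse):
    """Build a positional index: term -> {doc_id: set of positions}."""
    index = {}
    for doc_id, text in docs.items():
        for pos, term in analyse(text):
            index.setdefault(term, {}).setdefault(doc_id, set()).add(pos)
    return index


def phrase_search(index, phrase, analyse):
    """Return ids of docs containing the analysed phrase at matching offsets."""
    query = analyse(phrase)
    if not query:
        return set()
    first_pos, first_term = query[0]
    hits = set()
    for doc_id, positions in index.get(first_term, {}).items():
        for start in positions:
            # Every later query term must sit at the same relative offset.
            if all(start + (pos - first_pos)
                   in index.get(term, {}).get(doc_id, set())
                   for pos, term in query[1:]):
                hits.add(doc_id)
                break
    return hits


docs = {1: "Thanks for coming"}
old_index = build_index(docs, analyse_old)  # indexed with the old analysis

# Same analyser on both sides: the phrase matches.
same = phrase_search(old_index, "thanks for coming", analyse_old)
# Old index queried with the new analyser: the positions no longer line up.
mixed = phrase_search(old_index, "thanks for coming", analyse_new)
print(same, mixed)
```

In this model the index itself is perfectly readable either way; the only thing that breaks is the query produced by the mismatched analyser, which is why keeping a frozen copy of the old analysis chain is enough.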