Date: Sun, 15 May 2011 11:58:27 -0400
Subject: Re: why query chinese character with bracket become phrase query by default?
From: Michael McCandless
To: solr-user@lucene.apache.org, yonik@lucidimagination.com

I opened https://issues.apache.org/jira/browse/SOLR-2519 for this.

Mike

http://blog.mikemccandless.com

On Sun, May 15, 2011 at 8:02 AM, Michael McCandless wrote:
> On Fri, May 6, 2011 at 8:49 AM, Michael McCandless
> wrote:
>
>> Shouldn't we have field types in the example schema for the different
>> languages? I.e., text_zh, text_th, text_en, text_ja, text_nl, etc.
>
> In fact, until we break out dedicated language field types, shouldn't
> we default autophrase to off in Solr?
>
> I think this is what ElasticSearch does (it just inherits Lucene's
> default for this). Shay, or any ElasticSearch users out there, can
> you confirm?
>
> Leaving autophrase on is catastrophic for non-whitespace languages
> (CJK and others), and at best iffy for whitespace languages (i.e.,
> it is unexpected that the QueryParser would make a PhraseQuery when the
> user hadn't asked for one, it's not clear it really helps relevance for
> whitespace languages, and it definitely hurts performance), so leaving it
> on is doing far more damage than good, as far as I can tell.
>
> Any objections to turning off autophrase by default in Solr, until we
> have per-language field types?
>
> Mike
>
> http://blog.mikemccandless.com
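To make the autophrase problem concrete, here is a toy Python sketch (not Lucene's or Solr's actual code) of what a query parser does when the analyzer splits a single whitespace-delimited chunk into several tokens. It assumes a hypothetical analyzer, `toy_analyze`, that emits one token per CJK character and splits Latin text on punctuation. With autophrase on, the chunk becomes a phrase query (terms must be adjacent and in order). With it off, each token becomes an independent clause. Because Chinese text has no spaces, every multi-character query ends up as a phrase. The flag loosely mirrors Lucene QueryParser's autoGeneratePhraseQueries option.

```python
import re

# Hypothetical analyzer (illustration only): one token per CJK ideograph,
# lowercased alphanumeric runs otherwise. Real Solr analyzers differ.
def toy_analyze(chunk: str) -> list[str]:
    return [t.lower() for t in re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9]+", chunk)]

# Toy query parser: the input is first split on whitespace, as the
# classic QueryParser does, and each chunk is then analyzed.
def parse(query: str, auto_phrase: bool) -> list[str]:
    clauses = []
    for chunk in query.split():
        tokens = toy_analyze(chunk)
        if not tokens:
            continue
        if len(tokens) == 1:
            clauses.append(tokens[0])  # plain term clause
        elif auto_phrase:
            # Phrase clause: all tokens must match adjacently, in order.
            clauses.append('"' + " ".join(tokens) + '"')
        else:
            # Independent term clauses, no positional constraint.
            clauses.extend(tokens)
    return clauses

if __name__ == "__main__":
    print(parse("中国人", auto_phrase=True))
    print(parse("中国人", auto_phrase=False))
    print(parse("Wi-Fi router", auto_phrase=True))
```

The same mechanism explains the "at best iffy" remark for whitespace languages: a hyphenated English word such as "Wi-Fi" also turns silently into a phrase query.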