Date: Mon, 2 Jul 2012 13:12:22 -0400
Subject: Re: edismax parser ignores mm parameter when tokenizer splits tokens (hypenated words, WDF splitting etc)
From: Tom Burton-West
To: solr-user@lucene.apache.org
Cc: William Dueber

Opened a JIRA issue: https://issues.apache.org/jira/browse/SOLR-3589, which also lists a couple of other related mailing list posts.

On Thu, Jun 28, 2012 at 12:18 PM, Tom Burton-West wrote:

> Hello,
>
> My previous e-mail with a CJK example has received no replies. I
> verified that this problem also occurs for English. For example, in the
> case of the word "fire-fly", the ICUTokenizer and the WordDelimiterFilter
> both split this into two tokens, "fire" and "fly".
>
> With an edismax query and a must-match (mm) of 2, q={!edismax mm=2}, if the
> words are entered separately as [fire fly], the edismax parser honors the
> mm parameter and does the equivalent of a Boolean AND query. However, if
> the words are entered as a hyphenated word [fire-fly], the tokenizer splits
> it into two tokens, "fire" and "fly", and the edismax parser does the
> equivalent of a Boolean OR query.
>
> I'm not sure I understand the output of debugQuery, but judging by the
> number of hits returned, it appears that edismax is not honoring the mm
> parameter. Am I missing something, or is this a bug?
>
> I'd like to file a JIRA issue, but want to find out if I am missing
> something here.
>
> Details of several queries are appended below.
>
> Tom Burton-West
>
> edismax mm=2 query with hyphenated word [fire-fly]
>
> {!edismax mm=2}fire-fly
> {!edismax mm=2}fire-fly
> +DisjunctionMaxQuery(((ocr:fire ocr:fly)))
> +((ocr:fire ocr:fly))
>
> edismax mm=2 query entered as separate words [fire fly] numFound="184962"
>
> {!edismax mm=2}fire fly
> {!edismax mm=2}fire fly
> +((DisjunctionMaxQuery((ocr:fire)) DisjunctionMaxQuery((ocr:fly)))~2)
>
> Regular Boolean AND query: [fire AND fly] numFound="184962"
>
> fire AND fly
> fire AND fly
> +ocr:fire +ocr:fly
> +ocr:fire +ocr:fly
>
> Regular Boolean OR query: [fire OR fly] numFound="366047"
>
> fire OR fly
> fire OR fly
> ocr:fire ocr:fly
> ocr:fire ocr:fly
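
One way to read the parsed queries above: mm seems to be counted over top-level clauses, and the parser creates one such clause per whitespace-separated chunk before analysis runs. [fire fly] gives two clauses with ~2, while [fire-fly] gives a single clause whose two tokens are simply OR'ed inside it. The following is a minimal Python sketch of that reading, not Solr code. The names analyze, build_clauses, matches, search and the tiny docs index are all hypothetical, and clamping mm to the number of clauses is an assumption made to match the observed hit counts.

```python
import re

def analyze(chunk):
    """Hypothetical stand-in for the analysis chain (ICUTokenizer / WDF):
    lowercase, then split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", chunk.lower()) if t]

def build_clauses(query):
    """One top-level clause per whitespace-separated chunk.
    Each clause holds the tokens the analyzer produced for that chunk."""
    clauses = (analyze(chunk) for chunk in query.split())
    return [tokens for tokens in clauses if tokens]

def matches(doc_tokens, clauses, mm):
    """A clause is satisfied if ANY of its tokens is in the document (OR).
    mm is counted over clauses and clamped to the clause count (assumption)."""
    required = min(mm, len(clauses))
    satisfied = sum(1 for clause in clauses
                    if any(tok in doc_tokens for tok in clause))
    return satisfied >= required

# Hypothetical four-document index, each document reduced to a token set.
docs = {
    "both": {"fire", "fly"},
    "fire_only": {"fire", "truck"},
    "fly_only": {"fly", "paper"},
    "neither": {"water"},
}

def search(query, mm=2):
    clauses = build_clauses(query)
    return sorted(name for name, tokens in docs.items()
                  if matches(tokens, clauses, mm))

print(search("fire fly"))   # two clauses, mm=2: behaves like AND
print(search("fire-fly"))   # one clause with two tokens: behaves like OR
```

Under these assumptions the first search returns only the document containing both words, and the second returns every document containing either word, which mirrors the AND-like and OR-like hit counts reported in the message.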