From: Tri Cao <tmcao@me.com>
To: java-user@lucene.apache.org
Subject: Re: Lucene Query
Date: Tue, 19 Aug 2014 20:45:24 +0000 (GMT)

Oh sorry guys, ignore what I said. I am going to get myself a coffee. Uwe is absolutely correct here.

On Aug 19, 2014, at 01:13 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

Hi,
Look at his docs. He has only 2 docs, the second one with 3 keywords.

I would use a simple phrase query with a slop value smaller than the Analyzer's positionIncrementGap. This is the gap between fields with the same name. A span or phrase query cannot cross the gap if the slop is small enough, but the slop must still be large enough to find the terms next to each other.

SpanQuery is not needed; a phrase query does all that's needed. Slop is like an edit distance over whole terms, so term order does not matter as long as the slop is large enough.

Uwe
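Uwe's suggestion can be illustrated with a small, self-contained Java model. It is a sketch under simplifying assumptions, not Lucene code: tokens are lowercased and split on non-word characters (no stop-word removal), tokens within one value of a multi-valued field get consecutive positions, and `gap` extra positions are skipped between values, standing in for what the analyzer's position increment gap contributes. The match test is a simplified version of the distance a sloppy phrase query compares against its slop: each term's position minus that term's offset in the query, and the spread (max minus min) of those values must not exceed the slop. The class and method names (`SloppyPhraseModel`, `positions`, `minMatchLength`, `sloppyPhraseMatches`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Toy model (hypothetical, NOT Lucene's implementation) of token positions in a
 * multi-valued field and of sloppy phrase matching against those positions.
 */
final class SloppyPhraseModel {

    private SloppyPhraseModel() {}

    /**
     * Maps each term to its positions. Tokens within one value get consecutive
     * positions; {@code gap} extra positions are skipped between values, standing
     * in for the analyzer's position increment gap.
     */
    static Map<String, List<Integer>> positions(List<String> values, int gap) {
        Map<String, List<Integer>> index = new HashMap<>();
        int pos = 0;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                pos += gap;
            }
            for (String token : values.get(v).toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // split() yields "" for a leading separator
                }
                index.computeIfAbsent(token, k -> new ArrayList<>()).add(pos++);
            }
        }
        return index;
    }

    /**
     * Smallest spread, over all choices of positions, of (term position minus the
     * term's offset in the query); -1 if a query term is missing or the query is empty.
     */
    static int minMatchLength(List<String> queryTerms, Map<String, List<Integer>> index) {
        if (queryTerms.isEmpty()) {
            return -1;
        }
        return search(queryTerms, index, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    private static int search(List<String> query, Map<String, List<Integer>> index,
                              int i, int min, int max) {
        if (i == query.size()) {
            return max - min;
        }
        List<Integer> candidates = index.get(query.get(i));
        if (candidates == null) {
            return -1;
        }
        int best = -1;
        for (int p : candidates) {
            int shifted = p - i; // where the phrase would have to start for this term
            int r = search(query, index, i + 1, Math.min(min, shifted), Math.max(max, shifted));
            if (r >= 0 && (best < 0 || r < best)) {
                best = r;
            }
        }
        return best;
    }

    /** True if every query term occurs and their spread fits within the slop. */
    static boolean sloppyPhraseMatches(List<String> queryTerms,
                                       Map<String, List<Integer>> index, int slop) {
        int length = minMatchLength(queryTerms, index);
        return length >= 0 && length <= slop;
    }
}
```

With the query terms `states united america`, doc 1 has a spread of 2 (the reordering is what uses up the slop), so a slop of 10 matches it. With a gap of 100, doc 2's three values sit more than 100 positions apart and a slop of 10 does not reach across them. With a gap of 0, the three values of doc 2 become adjacent positions and the same query would match doc 2 as well, which is why the slop has to stay below the gap. In Lucene 4.x the base `Analyzer.getPositionIncrementGap` returns 0, so in a real index the analyzer in use would need to return a larger gap for this field.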

On 19 August 2014 22:05:23 CEST, Tri Cao <tmcao@me.com> wrote:
> The OR operator does that; AND only returns docs with ALL terms present.
>
> Note that you have two options here:
>
> 1. Create a BooleanQuery object (see the Javadoc I linked below) and
>    programmatically add the term queries with the following constraint:
>    http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanClause.Occur.html#MUST_NOT
>
> 2. Use Lucene's classic QueryParser and pass in the query string
>    "states AND america AND united".
>
> I would suggest 1) if you are going to learn more about Lucene, and 2)
> if you just want to get something out.
>
> Hope this helps,
> Tri
>
> On Aug 19, 2014, at 12:17 PM, Jin Guang Zheng <zhengj3@rpi.edu> wrote:
>
> Thanks for the reply, but won't a BooleanQuery return both doc1 and doc2
> with the query:
>
> label:States AND label:America AND label:United
>
> Best,
> Jin
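Jin's objection can be checked with another small Java sketch. It is a hypothetical model, not Lucene code: a pure AND query (in Lucene terms, a BooleanQuery whose clauses all use `BooleanClause.Occur.MUST`; the `MUST_NOT` anchor linked in option 1 would instead exclude documents containing the terms) only asks whether each term is present in the field, regardless of which value it came from or where it sits. The names `BooleanAndModel`, `terms` and `matchAll` are illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (hypothetical, NOT Lucene's implementation) of a pure AND query:
 * a document matches if its field contains every query term, wherever those
 * terms occur. Positions play no role.
 */
final class BooleanAndModel {

    private BooleanAndModel() {}

    /** All lowercased terms of one field, collected across all of its values. */
    static Set<String> terms(List<String> values) {
        Set<String> result = new HashSet<>();
        for (String value : values) {
            for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    result.add(token);
                }
            }
        }
        return result;
    }

    /** Ids of documents whose field contains every query term (AND semantics). */
    static List<String> matchAll(Map<String, List<String>> docs, List<String> queryTerms) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            if (terms(doc.getValue()).containsAll(queryTerms)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }
}
```

Both documents contain `states`, `america` and `united`, so the model returns both doc1 and doc2, as Jin expected: term presence alone cannot separate them. Telling them apart needs position information, which is what a phrase query with slop, as Uwe suggests, brings in.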
>
> On Tue, Aug 19, 2014 at 2:07 PM, Tri Cao <tmcao@me.com> wrote:
>
> > Given that example, the easy way is a Boolean AND query of all the terms:
> >
> > http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.html
> >
> > However, if your corpus is more sophisticated, you'll find that relevance
> > ranking is not always that trivial :)
> >
> > On Aug 19, 2014, at 11:00 AM, Jin Guang Zheng <zhengj3@rpi.edu> wrote:
> > >
> > > Hi,
> > >
> > > I am wondering if someone can help me with this.
> > >
> > > I have this index:
> > >
> > > doc 1 -- label: United States of America
> > >
> > > doc 2 -- label: United
> > > doc 2 -- label: America
> > > doc 2 -- label: States
> > >
> > > I am wondering how to generate a query with the terms "states united
> > > america" so that only doc 1 is returned.
> > >
> > > I was thinking of SpanNearQuery, but I can't make it work.
> > >
> > > Thanks,
> > > Jin

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de