From: ywlee522
To: general@lucene.apache.org
Date: Sun, 7 Jun 2009 11:48:03 -0700 (PDT)
Message-ID: <23914028.post@talk.nabble.com>
Subject: Re: How to structure lucene query?
References: <23902784.post@talk.nabble.com> <23911598.post@talk.nabble.com>

Thanks for the comments. Apologies for not providing details earlier.

Users in my system generate reports of some type every day, so a Lucene
document has four fields: user name, report create_dt, report type, and
report text. For example, an analyst writes a report on the telco market
today and may write a report on mobile phones tomorrow.

The query is: of the users who have one or more reports containing "ABC",
find the users who also have one or more reports containing "XYZ". A user
may have "ABC" in one report and "XYZ" in another report, i.e., not in the
same report, and this still matches the query.

I first tried this as two searches: the first searches for "ABC" and
collects user names (going through all results), and the second searches
for "XYZ" among the users found by the first search. But this seems very
inefficient, and I am not sure it is the right use of Lucene.

If I put all reports of a user into a single Lucene document, then the query
is equivalent to finding all documents containing both "ABC" and "XYZ". But
then I will lose the report_dt field, which is another parameter in the
query.


Simon Willnauer wrote:
>
> could you please give us more details of you query or an example that
> might help to understand what you are trying to do. I had the same
> impression as Ted though.
>
> simon
>
> On Sun, Jun 7, 2009 at 4:28 PM, ywlee522 wrote:
>>
>> Thanks for the tip. But, no, it is not same as finding documents with
>> both "ABC" and "XYZ", as they can be appear in separate documents of the
>> same user.
>>
>> Ted Dunning wrote:
>>>
>>> It is the same as finding documents with both "ABC" and "XYZ" except
>>> that you need to run over the results yourself and collect the user
>>> names.
>>>
>>> Lucene doesn't have a fancy query language so you can't magically do any
>>> group-by or count(distinct) tricks.
>>>
>>> On Sat, Jun 6, 2009 at 8:59 AM, ywlee522 wrote:
>>>
>>>> A document has two fields; username, date, and document text. A user
>>>> can have multiple documents.
>>>>
>>>> The query is:
>>>>
>>>> Of the users who have one or more documents with keyword "ABC", find
>>>> users who also have one or more document with keyword "XYZ".
>>>>
>>>> This isn't finding documents with both "ABC" and "XYZ". How can this
>>>> be done in lucene query? THANK YOU
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/How-to-structure-lucene-query--tp23902784p23902784.html
>>>> Sent from the Lucene - General mailing list archive at Nabble.com.
>>>
>>> --
>>> Ted Dunning, CTO
>>> DeepDyve
>>>
>>> 111 West Evelyn Ave. Ste. 202
>>> Sunnyvale, CA 94086
>>> http://www.deepdyve.com
>>> 858-414-0013 (m)
>>> 408-773-0220 (fax)
>>
>> --
>> View this message in context:
>> http://www.nabble.com/How-to-structure-lucene-query--tp23902784p23911598.html
>> Sent from the Lucene - General mailing list archive at Nabble.com.
>

--
View this message in context:
http://www.nabble.com/How-to-structure-lucene-query--tp23902784p23914028.html
Sent from the Lucene - General mailing list archive at Nabble.com.
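
The two-search approach described in the message (collect the users whose
reports contain "ABC", collect the users whose reports contain "XYZ", keep
the users present in both sets) can be sketched in plain Java as follows.
This is only an illustration of the logic under stated assumptions, not
Lucene code: the Report class, the UserKeywordMatch class, and the
usersWith and usersWithBoth methods are hypothetical names, the reports
live in an in-memory list instead of an index, keyword matching is a simple
substring test, and the date parameter is assumed to be an "on or after"
cutoff because the message does not say how the date is used in the query.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for one indexed report (not a Lucene class).
final class Report {
    final String user;
    final LocalDate createDt;
    final String text;

    Report(String user, LocalDate createDt, String text) {
        this.user = user;
        this.createDt = createDt;
        this.text = text;
    }
}

final class UserKeywordMatch {
    // One pass over the reports: collect the distinct users that have at least
    // one report dated on or after `from` whose text contains `keyword`.
    // (The "on or after" date rule is an assumption made for this sketch.)
    static Set<String> usersWith(List<Report> reports, String keyword, LocalDate from) {
        Set<String> users = new HashSet<>();
        for (Report r : reports) {
            if (!r.createDt.isBefore(from) && r.text.contains(keyword)) {
                users.add(r.user);
            }
        }
        return users;
    }

    // Users with some report containing `first` and some report, possibly a
    // different one, containing `second`: the intersection of the two user sets.
    static Set<String> usersWithBoth(List<Report> reports, String first,
                                     String second, LocalDate from) {
        Set<String> result = usersWith(reports, first, from);
        result.retainAll(usersWith(reports, second, from));
        return result;
    }

    public static void main(String[] args) {
        List<Report> reports = List.of(
            new Report("alice", LocalDate.of(2009, 6, 1), "telco market ABC"),
            new Report("alice", LocalDate.of(2009, 6, 2), "mobile phones XYZ"),
            new Report("bob", LocalDate.of(2009, 6, 1), "ABC only"),
            new Report("carol", LocalDate.of(2009, 6, 3), "ABC and XYZ together"));
        // alice matches through two different reports, carol through one; bob does not.
        System.out.println(usersWithBoth(reports, "ABC", "XYZ", LocalDate.of(2009, 6, 1)));
    }
}
```

Because each report stays a separate record, the date restriction is applied
per report before the user names are collected, which is the property that
would be lost by merging all of a user's reports into a single document.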