From: "Uwe Schindler"
Reply-To: java-user@lucene.apache.org
Subject: RE: fastest way to gather simple terms that match documents?
Date: Mon, 5 Apr 2010 20:49:56 +0200
Message-ID: <002601cad4f0$c95ceb70$5c16c250$@de>

Alternatively, index your documents with term vectors enabled for the field:
http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/document/Field.TermVector.html

Then use IndexReader.getTermFreqVector() with the matching doc ID:
http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/index/IndexReader.html#getTermFreqVector(int, java.lang.String)

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
> Sent: Monday, April 05, 2010 8:24 PM
> To: java-user@lucene.apache.org
> Subject: Re: fastest way to gather simple terms that match documents?
>
>
> : After I've run a query I need to know which terms matched each
> : result document (ie doc termfrequency>0).
> ...
> : I don't care how many were found or what position or anything else.
> : just which ones matched.
>
> if all you care about is simple "which terms does it have" you can take
> your list of terms, and your list of docids, sort both lists and then
> use termDocs to loop over the terms and over the docs. (the sorting is
> key for performance, because it allows you to always skip forward, w/o
> needing to restart the termDocs)
>
> something like...
>
> TermDocs iter = indexReader.termDocs();
> for (Term t : myTerms) {
>   iter.seek(t);
>   // skipTo() always advances at least one entry, so remember where the
>   // iterator stands and only skip while it is behind the wanted doc
>   int current = -1;
>   for (int docid : myDocs) {
>     if (current < docid) {
>       if (!iter.skipTo(docid)) break; // no more docs for this term
>       current = iter.doc();
>     }
>     if (current == docid) {
>       doSomethingWith(t, docid);
>     }
>   }
> }
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
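
As a rough illustration of the sorted walk described in the quoted message, here is a minimal plain-Java sketch. It does not use Lucene: the Postings class is a hypothetical stand-in that only mimics the documented TermDocs.skipTo() behaviour (advance at least one entry, then stop at the first doc id >= target), and SortedSkipDemo, matches() and the sample data are invented for the example. It keeps track of the iterator's current doc id so that skipTo() is only called while the iterator is behind the wanted doc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the sorted term/doc walk; no Lucene classes are used. */
class SortedSkipDemo {

    /** Hypothetical stand-in for one term's postings with a TermDocs-like skipTo(). */
    static final class Postings {
        private final int[] docs; // ascending doc ids that contain the term
        private int pos = -1;     // -1 means "before the first entry"

        Postings(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        /** Advances at least one entry, then on to the first doc id >= target. */
        boolean skipTo(int target) {
            do {
                if (pos + 1 >= docs.length) {
                    return false; // postings exhausted
                }
                pos++;
            } while (docs[pos] < target);
            return true;
        }

        int doc() {
            return docs[pos];
        }
    }

    /** Returns, for each term, the wanted doc ids that contain it. */
    static Map<String, List<Integer>> matches(
            Map<String, int[]> index, String[] terms, int[] wantedDocs) {
        // Sort copies of both lists so every lookup only moves forward.
        String[] sortedTerms = terms.clone();
        int[] sortedDocs = wantedDocs.clone();
        Arrays.sort(sortedTerms);
        Arrays.sort(sortedDocs);

        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String term : sortedTerms) {
            List<Integer> hits = new ArrayList<>();
            result.put(term, hits);
            int[] postings = index.get(term);
            if (postings == null) {
                continue; // term does not occur in the index at all
            }
            Postings iter = new Postings(postings);
            int current = -1; // doc id the iterator is positioned on
            for (int docId : sortedDocs) {
                if (current < docId) {
                    if (!iter.skipTo(docId)) {
                        break; // no further docs contain this term
                    }
                    current = iter.doc();
                }
                if (current == docId) {
                    hits.add(docId);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, int[]> index = new HashMap<>();
        index.put("apple", new int[] {1, 3, 9});
        index.put("zebra", new int[] {2, 9, 12});
        System.out.println(matches(
                index, new String[] {"zebra", "apple"}, new int[] {9, 3, 2, 10}));
    }
}
```

In the sample data, "zebra" has postings 2, 9, 12 and the wanted docs are 2, 3, 9, 10. The lookup for doc 3 already leaves the iterator on doc 9, so calling skipTo(9) again would step past it; the current < docId check is what avoids that.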