Subject: uncorrect results
From: Jan
To: java-user@lucene.apache.org
Date: Wed, 17 Nov 2010 17:47:49 +0100

Hi,

I have an assignment in my Text Analytics class: I am supposed to create an index and search it. The corpus is a PubMed-like XML file. The program can be queried with terms (programcall a few terms) and with phrases (programcall "a phrase").

When a phrase is queried, the program should report how often the phrase occurred. The problem is that, for certain queries, the IndexSearcher returns some documents that do not contain the queried phrase in any of their fields. I'd be delighted if someone could tell me what I am doing wrong.

The source code is in my GitHub repo:
https://github.com/jangingnicht/TextAnalytics2/tree/master/src/textanalytics2/

Thanks in advance,
jan

PS: I use Lucene 3.0.2 and the OpenJDK Runtime Environment (IcedTea6 1.8.2) on a 64-bit Linux machine.
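
To illustrate what "how often the phrase occurred" means for a single field, here is a minimal plain-Java sketch that counts phrase occurrences without Lucene, which might help cross-check a suspicious hit. The class `PhraseCount`, its methods, and the sample text are hypothetical and not taken from the repository. The tokenizer (lowercase, split on anything that is not a letter or digit) is an assumption and will not match every Lucene analyzer, so counts can differ from what the index sees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PhraseCount {

    // Assumed tokenization: lowercase, split on runs of non-letter/non-digit characters.
    // This is NOT necessarily what the analyzer used at index time does.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // skip the empty token a leading delimiter produces
            }
        }
        return tokens;
    }

    // Counts positions where the phrase tokens appear consecutively and in order.
    // Overlapping matches are each counted.
    static int countPhrase(String fieldText, String phrase) {
        List<String> doc = tokenize(fieldText);
        List<String> query = tokenize(phrase);
        if (query.isEmpty() || query.size() > doc.size()) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i + query.size() <= doc.size(); i++) {
            if (doc.subList(i, i + query.size()).equals(query)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String field = "Gene expression in mice. The gene expression profile changed.";
        System.out.println(countPhrase(field, "gene expression")); // prints 2
        System.out.println(countPhrase(field, "expression gene")); // prints 0
    }
}
```

A document that the searcher returns for a phrase but for which this count is 0 on every field would be a candidate for closer inspection, keeping in mind that the tokenization here is only an approximation.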