From: "mcarcelen"
Subject: RE: Queries in Lucene
Date: Wed, 13 Sep 2006 14:57:00 +0200

Thank you very much. Yes, I'm very new to Lucene, sorry.

With the help of Lucene we want to classify 724,827 legal files according to whether their first line contains the word "Auto" or "Providencia", so that we can separate them into two groups. That's why I had already indexed these files with Lucene, and we thought that we could reuse the index and apply a special query for that.

Thanks for your help.

Best regards
Teresa

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Wednesday, September 13, 2006 14:20
To: java-user@lucene.apache.org
Subject: Re: Queries in Lucene

I'm assuming that you're new to Lucene, so if you're an old pro you probably already know all this....

I think you'll have difficulty here. Lucene has no concept of lines, just tokens and offsets. So here are a couple of suggestions off the top of my head...

If the first line is the *only* way you want to restrict this, index the tokens in the first line in a separate field for each document, and search on that field (call it "firstline"). Obviously, this won't work for searching lines 2-n.

If you're going to want to ask whether terms are in line 2, 3, 4..., you could bump your term position at the start of each line by, say, 500 and then do some fancy dancing with TermPositions to get terms from a particular line.
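
The position-bump idea can be sketched in plain Java, outside Lucene, just to show the arithmetic: each token's position is its index within its line plus the line number times a gap (500, as suggested above), so integer division by the gap recovers the line. This is a minimal sketch, not Lucene's API: the class LinePositions and its methods are hypothetical, tokens are split on whitespace and matched case-insensitively as a stand-in for an analyzer, lines are numbered from 0, and it assumes no line ever holds 500 or more tokens. In a real index the positions would be produced at indexing time and read back through TermPositions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating "bump the position by 500 at each new line".
final class LinePositions {
    // Gap added to the position counter at the start of each line.
    // Assumption: no single line contains LINE_GAP or more tokens.
    static final int LINE_GAP = 500;

    // Positions of every occurrence of term in text; line n (0-based)
    // starts at position n * LINE_GAP. Matching ignores case.
    static List<Integer> positionsOf(String text, String term) {
        List<Integer> positions = new ArrayList<>();
        String[] lines = text.split("\n", -1);
        for (int line = 0; line < lines.length; line++) {
            String trimmed = lines[line].trim();
            if (trimmed.isEmpty()) continue; // an empty line has no tokens
            String[] tokens = trimmed.split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                if (tokens[i].equalsIgnoreCase(term)) {
                    positions.add(line * LINE_GAP + i);
                }
            }
        }
        return positions;
    }

    // Recovers the 0-based line number a position belongs to.
    static int lineOf(int position) {
        return position / LINE_GAP;
    }

    // True if term occurs anywhere on the given 0-based line.
    static boolean inLine(String text, String term, int line) {
        for (int p : positionsOf(text, term)) {
            if (lineOf(p) == line) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "Auto Word1 Word2\nProvidencia Word2\nAuto";
        System.out.println(positionsOf(doc, "Auto"));          // [0, 1000]
        System.out.println(inLine(doc, "Providencia", 0));     // false
        System.out.println(inLine(doc, "Providencia", 1));     // true
    }
}
```
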
Getting that TermPositions approach right is going to be complicated, though, especially when you want to do arbitrary boolean queries.

You could creatively index things. Index a document with fields line1, line2, line3, line4, ..., and when you want to search in a particular line, form your query with the field corresponding to that line. You could even index the full text of the document in a "fulltext" field if you wanted to search over an entire document.

There are space tradeoffs to all this, so be sure you understand Field.Store.YES and NO as they apply to your problem, and what effect analyzers have on your indexing AND search streams. Lots of people are confused by this issue.

If you haven't already, get a copy of Luke so you can poke around in your index. Google "luke lucene" and it'll pop right up.

Before diving into this as stated, is there a way to rethink the problem to make it easier? What question are you *really* trying to answer by asking whether certain tokens are in a particular line?

Best
Erick

On 9/13/06, mcarcelen wrote:
>
> Hi all,
> I've got an index and now I'm trying to create a query with lucene-2.0.0.
> I'd like to find files whose first line contains the following:
>
> AND Word2
>
> I've tried the demo class org.apache.lucene.demo.SearchFiles,
> but I get files where the word "Word2" is not in the first line.
>
> I don't know how to filter the query, or whether I have to use another file.
>
> Can anyone help me?
>
> Thanks
>
> Best Regards
> Teresa
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
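
Erick's first suggestion, putting the first line in its own "firstline" field (and, more generally, each line in fields line1, line2, ... plus a "fulltext" field), is what lets a query like Word1 AND Word2 apply to the first line only. The toy in-memory inverted index below is a minimal sketch of that idea in plain Java, not Lucene's API: the class LineFieldIndex and its methods are hypothetical, lower-casing and whitespace splitting stand in for an analyzer, and the AND query is a set intersection over per-field postings. With Lucene 2.0 the corresponding step would be adding a separate "firstline" Field to each Document at indexing time and querying that field.

```java
import java.util.*;

// Hypothetical toy index: one postings map per field, as in "firstline", "line1", "fulltext".
final class LineFieldIndex {
    // field name -> term -> ids of documents containing that term in that field
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Indexes the first line into "firstline", line n (1-based) into "line<n>",
    // and the whole text into "fulltext".
    void add(int docId, String text) {
        String[] lines = text.split("\n", -1);
        index("firstline", lines[0], docId);
        for (int n = 0; n < lines.length; n++) {
            index("line" + (n + 1), lines[n], docId);
        }
        index("fulltext", text, docId);
    }

    // Crude stand-in for an analyzer: lower-case, then split on whitespace.
    private void index(String field, String value, int docId) {
        for (String token : value.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(token, t -> new TreeSet<>())
                    .add(docId);
        }
    }

    // Read-only view of the ids whose given field contains term (empty if none).
    Set<Integer> search(String field, String term) {
        Map<String, Set<Integer>> terms =
                postings.getOrDefault(field, Collections.<String, Set<Integer>>emptyMap());
        Set<Integer> docs = terms.getOrDefault(term.toLowerCase(), Collections.<Integer>emptySet());
        return Collections.unmodifiableSet(docs);
    }

    // field:a AND field:b, computed as a set intersection.
    Set<Integer> searchAnd(String field, String a, String b) {
        Set<Integer> result = new TreeSet<>(search(field, a));
        result.retainAll(search(field, b));
        return result;
    }

    public static void main(String[] args) {
        LineFieldIndex idx = new LineFieldIndex();
        idx.add(1, "Auto Word1 Word2\nrest of the file");
        idx.add(2, "Providencia Word1\nWord2 only on line two");
        System.out.println(idx.searchAnd("firstline", "Word1", "Word2")); // [1]
        System.out.println(idx.searchAnd("fulltext", "Word1", "Word2"));  // [1, 2]
        System.out.println(idx.search("firstline", "Providencia"));       // [2]
    }
}
```

Both sample documents match Word1 AND Word2 on "fulltext", but only the first matches on "firstline", which mirrors the problem in the original question. For the classification Teresa describes, the two groups would correspond to search("firstline", "Auto") and search("firstline", "Providencia") in this sketch.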