lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "mcarcelen" <mcarce...@isoco.com>
Subject RE: Queries in Lucene
Date Wed, 13 Sep 2006 12:57:00 GMT
Thank you very much.

Yes, I´m very new to Lucene. I´m sorry

With the help of Lucene we want to classify 724.827 legal files that in the
first line contained the word "Auto" or "Providencia". We can to separate in
two groups. That´s why I´ve indexed these files with Lucene before, and we
thought that we could reused the index and apply a special query for that

Thanks for your help.
Best regards
Teresa


 


-----Mensaje original-----
De: Erick Erickson [mailto:erickerickson@gmail.com] 
Enviado el: miércoles, 13 de septiembre de 2006 14:20
Para: java-user@lucene.apache.org
Asunto: Re: Queries in Lucene

I'm assuming that you're new to Lucene, so if you're an old pro you probably
already know all this....

I think you'll have difficulty here. Lucene has no concept of lines, just
tokens and offsets. So here are a couple of suggestions off the top  of my
head...

If the first line is the *only* way you want to restrict this, index the
tokens in the first line in a separate field for each document, and search
on that field (call it "firstline" <G>). Obviously, this won't work for
searching lines 2-n.

If you're going to want to ask if terms are in line 2, 3, 4..., you could
bump your term position at the start of each line by, say, 500 and then do
some fancy dancing with TermPositions to get terms from a particular line.
This is going to be complicated though to get right, especially when you
want to do arbitrary boolean queries.

You could creatively index things. Index a document with fields line1,
line2, line3, line4...., and when you wanted to search in a particular line,
form your query with a field corresponding to the correct line. You could
even index the full text of the document in a "fulltext" field if you wanted
to search over an entire document.

There are space tradeoffs to all this, so be sure you understand
Field.Store.YES and NO as they apply to your problem, and what effect
analyzers have on your indexing AND search streams. Lots of people are
confused by this issue.

If you haven't already, get a copy of Luke so you can poke around at your
index. Google luke lucene and it'll pop right up.

Before diving into this as stated, is there a way to re-think the problem to
make it easier? What question are you *really* trying to answer by asking
whether certain tokens are in a particular line?

Best
Erick


On 9/13/06, mcarcelen <mcarcelen@isoco.com> wrote:
>
>
> Hi all,
> I´ve got a index and now I´m trying to create a query with lucene-2.0.0,
> I´d like to find files that in the first line get the following:
>
> <DIV class=My-Word1> AND Word2
>
> I´m tried with the package org.apache.lucene.demo.SearchFiles
> but I get files where the word "Word2" is not in the first line.
>
> I don´t know how to do the query filtered or if I have to use another file
>
> Can anyone help me?
>
> Thanks
>
> Best Regards
> Teresa
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message