lucene-java-user mailing list archives

From Rodrigo F Valverde <rodrigofvalve...@yahoo.com.br>
Subject Re: Re: How to search more than one word?
Date Thu, 24 May 2007 20:48:21 GMT
Yes, I already have Luke! :)

The words I used were "maria" and "amanda".
The first word is in one text file; the second is in that same file and in another one (so,
two files).

After replacing the TermQuery passed to "IndexSearcher.search()" with a query built by
"QueryParser.parse()", and keeping everything else the same, it all works fine.

With Luke and by testing, I saw that I could find both words separately, but not together!
Now I can!
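
A likely explanation for the difference, sketched in plain Java rather than with the Lucene API: a TermQuery looks up its text as one literal term and does not run it through an analyzer, so "maria amanda" is searched as a single token that was never indexed, whereas QueryParser analyzes the input into separate terms first. The class below is a toy inverted index written only for illustration. MultiWordSearchSketch, analyze, searchExactTerm, and searchAllTerms are hypothetical names, the file names and contents are made up, and the all-terms search models the "maria AND amanda" case (QueryParser's default operator is OR).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index for illustration only; NOT Lucene's implementation. */
class MultiWordSearchSketch {
    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Hypothetical analyzer: lowercase, then split on non-alphanumeric characters. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** Index time: every analyzed token points back to the document name. */
    void addDocument(String name, String content) {
        for (String token : analyze(content)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(name);
        }
    }

    /** Like a raw term lookup: the whole string is ONE term and is not analyzed. */
    Set<String> searchExactTerm(String term) {
        return new TreeSet<>(postings.getOrDefault(term, new TreeSet<>()));
    }

    /** Like parsing first: analyze the input, then require every resulting term. */
    Set<String> searchAllTerms(String input) {
        Set<String> result = null;
        for (String term : analyze(input)) {
            Set<String> docs = searchExactTerm(term);
            if (result == null) {
                result = docs;
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        MultiWordSearchSketch index = new MultiWordSearchSketch();
        index.addDocument("a.txt", "Maria and Amanda went out");
        index.addDocument("b.txt", "Amanda stayed home");

        System.out.println(index.searchExactTerm("maria amanda")); // prints []
        System.out.println(index.searchAllTerms("maria amanda"));  // prints [a.txt]
        System.out.println(index.searchAllTerms("amanda"));        // prints [a.txt, b.txt]
    }
}
```

Here the exact-term lookup for "maria amanda" returns nothing, because no single indexed token contains a space, while the analyzed search returns a.txt, the only file containing both words.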

So, in this case, the problem was not the analyzer, but that's still a very good reminder!
;)

I found something that I think discusses both approaches. So, when I really understand the
difference, I'll show you! :D

Thanks again! :D

----- Original message ----
From: Erick Erickson <erickerickson@gmail.com>
To: java-user@lucene.apache.org
Sent: Thursday, May 24, 2007 17:18:38
Subject: Re: Re: How to search more than one word?

If you haven't already, I *strongly* recommend you get a copy of Luke.
Google "lucene luke" to find it. It allows you to examine your
index and also to see how queries parse. It's invaluable.

I can't say exactly what the difference is, but there are
several possibilities. Note that in general it's best to use
the same analyzer during both index and query time.

One of the things an analyzer does is break up
the input stream into tokens that get passed
on. So, say you're using a WhitespaceAnalyzer and
indexing "this is a (silly) example". Your tokens would
be
this
is
a
(silly)
example

But StandardAnalyzer might produce (I'm not completely
sure about this, but this is the general idea):

this
is
a
silly
example

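Note that the parens () were removed. (StandardAnalyzer also
lowercases tokens and, depending on the version and its stop-word
settings, may drop common words such as "this", "is", and "a".)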

Now, if you construct your TermQuery with
"silly", it would match the terms indexed with
StandardAnalyzer, but NOT the terms indexed with
WhitespaceAnalyzer. You'd get the opposite matches
if your TermQuery were constructed with "(silly)".
So that's why I recommend you get a copy of Luke. I'm guessing
that when you indexed things with StandardAnalyzer, your
stream was broken up differently than you expected, and that when you
constructed your TermQuery, your term had some characters
that the StandardAnalyzer had stripped from your original input
stream.

But that's just a guess. Luke will tell you for sure.

Best
Erick

On 5/24/07, Rodrigo F Valverde <rodrigofvalverde@yahoo.com.br> wrote:
>
> Hi again!
>
> It's all different now!
>
> I'm no longer passing a TermQuery to "reader.search()"... now I'm using the QueryParser:
> - QueryParser qp = new QueryParser("content", new StandardAnalyzer());
> - query = qp.parse(keyWordToSearch);
> Now it works fine! :D
>
> But now I need to know the difference between them! :)
>
> Thanks! :D
>
> ----- Original message ----
> From: Rodrigo F Valverde <rodrigofvalverde@yahoo.com.br>
> To: java-user@lucene.apache.org
> Sent: Thursday, May 24, 2007 15:00:49
> Subject: Re: How to search more than one word?
>
> I will try to summarize the code:
>
> INDEX TIME
> - IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(),
> true);
> - writer.setUseCompoundFile(false);
> - while there are files in the given dir...
> - Document doc = new Document();
> - doc.add(new Field("content", new FileReader(file)));
> - doc.add(new Field("filename", file.getPath(), Field.Store.YES,
> Field.Index.UN_TOKENIZED));
> - writer.addDocument(doc);
> - end while.
> - writer.optimize();
> - writer.close();
> SEARCH TIME
> - Directory directory = FSDirectory.getDirectory(indexDir, false);
> - IndexSearcher reader = new IndexSearcher(directory);
> - Hits hits = reader.search(new TermQuery(new Term
> ("content",keyWordToSearch)));
> - Iterator<Hit> i = hits.iterator();
> - while (i.hasNext()){
> - Hit hit = i.next();
> - Document d =(Document) hit.getDocument();
> - d.get("filename")
>
> And so, I take the name of the file where the word was found in order to do
> what I need to do...
> This works when I use only one keyword, but with more than one, or if I combine
> words that I know are in the index using the "+" operator, nothing is
> found! :(
>
> So, to answer Erick's questions:
> 1- In particular, what analyzers you use at index and search time.
>     Answer: Standard, only at index time! Is that wrong?!
> 2- What the string was originally and how you indexed it.
>     Answer: I use html, htm and txt files! How I index them is shown above!
> 3- What query.toString() shows you.
>     Answer: I used no parsed query! Only the reader.search()...
>
> If I wrote something wrong, I'm sorry... :P
>
> Thanks in advance! ;)
>
>
> ----- Original message ----
> From: Erick Erickson <erickerickson@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Thursday, May 24, 2007 13:36:12
> Subject: Re: How to search more than one word?
>
> Not until you give us more information <G>.
>
> In particular, what analyzers you use at index and search time.
> What the string was originally and how you indexed it.
> What query.toString() shows you.
>
> Best
> Erick
>
> On 5/24/07, Rodrigo F Valverde <rodrigofvalverde@yahoo.com.br> wrote:
> >
> > Hi all!
> >
> > I implemented a searcher with Lucene and I'm trying to search for two words,
> > both in the same text file, but... I can't!
> >
> > When I search for the first word and the second separately, everything works
> > fine, but when I search for them together, with or without "AND" or "+"... nothing is found!
> > :(
> >
> > Can somebody help me?