lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From liat oren <oren.l...@gmail.com>
Subject Re: IndexSearcher
Date Sun, 08 Mar 2009 08:14:53 GMT
Ok, thanks.

I will have to edit the code of Luke in order to add another analyzer,
right?

If I need to query a long text - for example to search for articles that are
textually close in their content, I need to parse to  the query the text of
the article. Then I get the error that it is too long. Is there any way to
solve it?

Thanks a lot,
Liat

2009/3/5 Erick Erickson <erickerickson@gmail.com>

> I think your root problem is that you're indexing UN_TOKENIZED, which
> means that the tokens you're adding to your index are NOT run through
> the analyzer.
>
> So your terms are exactly "111", "222 333" and "111 222 333", none of which
> match "222". I expect you wanted your tokens to be "111", "222", and "333",
> each appearing twice in your index.
>
> Try indexing them tokenized. Although note that I don't remember what
> StandardAnalyzer does with numbers. WhitespaceAnalyzer does the
> more intuitive thing, but beware that it doesn't fold case. But it might be
> an easier place for you to start until you get more comfortable with what
> various analyzers do.
>
> Also, I *strongly* advise that you get a copy of Luke. It is a wonderful
> tool
> that allows you to examine your index, analyze queries, test queries, etc.
>
> But be aware that the site that maintains Luke was having problems
> yesterday,
> look over the user list messages from yesterday if you have problems.
>
> Best
> Erick
>
> On Thu, Mar 5, 2009 at 8:40 AM, liat oren <oren.liat@gmail.com> wrote:
>
> > Hi,
> >
> > I would like to do a search that will return documents that contain a
> given
> > word.
> > For example, I created the following index:
> >
> > IndexWriter writer = new IndexWriter("C:/TryIndex", new
> > StandardAnalyzer());
> > Document doc = new Document();
> >  doc.add(new Field(WordIndex.FIELD_WORLDS, "111 222 333",
> Field.Store.YES,
> > Field.Index.UN_TOKENIZED));
> > writer.addDocument(doc);
> > doc = new Document();
> > doc.add(new Field(WordIndex.FIELD_WORLDS, "111", Field.Store.YES,
> > Field.Index.UN_TOKENIZED));
> > writer.addDocument(doc);
> >  doc = new Document();
> >  doc.add(new Field(WordIndex.FIELD_WORLDS, "222 333", Field.Store.YES,
> > Field.Index.UN_TOKENIZED));
> >  writer.addDocument(doc);
> > writer.optimize();
> >  writer.close();
> >
> > now I want to get all the documents that contain the word "222".
> >
> > I tried to run  the following code but it doesn;t return any doc
> >
> >  IndexSearcher searcher = new IndexSearcher(indexPath);
> >
> > //  //  TermQuery mapQuery = new TermQuery(new Term(FIELD_WORLDS,
> > worldNum)); - this one also didn't word
> > Analyzer analyzer = new StandardAnalyzer();
> > QueryParser parser = new QueryParser(FIELD_WORLDS, analyzer);
> >  Query query = parser.parse(worldNum);
> >  Hits mapHits = searcher.search(query);
> >
> >
> > Thanks a lot,
> > Liat
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message