lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: IndexSearcher
Date Sun, 08 Mar 2009 16:30:19 GMT
What Shashi said.....
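To spell it out with code, here is a minimal sketch of the fix, assuming the Lucene 2.x API used elsewhere in this thread. WhitespaceAnalyzer is chosen because, as Erick notes further down, StandardAnalyzer's handling of bare numbers is less predictable; the literal "worlds" stands in for the WordIndex.FIELD_WORLDS constant from Liat's code.

```java
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class TokenizedSearch {
    public static void main(String[] args) throws Exception {
        // TOKENIZED runs the field text through the analyzer, so
        // "111 222 333" is indexed as three terms: "111", "222", "333".
        IndexWriter writer = new IndexWriter("C:/TryIndex",
                new WhitespaceAnalyzer());
        Document doc = new Document();
        doc.add(new Field("worlds", "111 222 333",
                Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        // A TermQuery for the single term "222" now finds the document.
        IndexSearcher searcher = new IndexSearcher("C:/TryIndex");
        Hits hits = searcher.search(new TermQuery(new Term("worlds", "222")));
        System.out.println(hits.length() + " hit(s)");
        searcher.close();
    }
}
```

With UN_TOKENIZED the same TermQuery finds nothing, because the only term in the index is the whole string "111 222 333".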

On Sun, Mar 8, 2009 at 10:22 AM, Shashi Kant <skant@sloan.mit.edu> wrote:

> Liat, I think what Erick suggested was to use the TOKENIZED setting instead
> of UN_TOKENIZED. For example, your code should read something like:
>
> Document doc = new Document();
> doc.add(new Field(WordIndex.FIELD_WORLDS, "111 222 333", Field.Store.YES,
> Field.Index.TOKENIZED));
>
> Unless I am missing something, you do not need to modify Luke's code base.
> You could, however, use it to open your index.
>
> Shashi
>
>
>
> On Sun, Mar 8, 2009 at 10:09 AM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
> > In Luke, you can select from among most of the analyzers provided by
> > Lucene; there's a drop-down on the Search tab. And I'm sure you can
> > add your own custom ones, but I confess I've never tried.
> >
> > Best
> > Erick
> >
> > On Sun, Mar 8, 2009 at 4:14 AM, liat oren <oren.liat@gmail.com> wrote:
> >
> > > Ok, thanks.
> > >
> > > I will have to edit the code of Luke in order to add another analyzer,
> > > right?
> > >
> > > If I need to query a long text - for example, to search for articles
> > > that are textually close in their content - I need to pass the text of
> > > the article to the query. Then I get an error that it is too long. Is
> > > there any way to solve it?
> > >
> > > Thanks a lot,
> > > Liat
> > >
> > > 2009/3/5 Erick Erickson <erickerickson@gmail.com>
> > >
> > > > I think your root problem is that you're indexing UN_TOKENIZED, which
> > > > means that the tokens you're adding to your index are NOT run through
> > > > the analyzer.
> > > >
> > > > So your terms are exactly "111", "222 333", and "111 222 333", none
> > > > of which match "222". I expect you wanted your tokens to be "111",
> > > > "222", and "333", each appearing twice in your index.
> > > >
> > > > Try indexing them tokenized, although note that I don't remember what
> > > > StandardAnalyzer does with numbers. WhitespaceAnalyzer does the more
> > > > intuitive thing, but beware that it doesn't fold case. It might still
> > > > be an easier place for you to start until you get more comfortable
> > > > with what various analyzers do.
> > > >
> > > > Also, I *strongly* advise that you get a copy of Luke. It is a
> > > > wonderful tool that allows you to examine your index, analyze
> > > > queries, test queries, etc.
> > > >
> > > > But be aware that the site that maintains Luke was having problems
> > > > yesterday; look over the user list messages from yesterday if you
> > > > have problems.
> > > >
> > > > Best
> > > > Erick
> > > >
> > > > On Thu, Mar 5, 2009 at 8:40 AM, liat oren <oren.liat@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I would like to do a search that will return documents that
> > > > > contain a given word.
> > > > > For example, I created the following index:
> > > > >
> > > > > IndexWriter writer = new IndexWriter("C:/TryIndex", new
> > > > > StandardAnalyzer());
> > > > > Document doc = new Document();
> > > > > doc.add(new Field(WordIndex.FIELD_WORLDS, "111 222 333",
> > > > > Field.Store.YES, Field.Index.UN_TOKENIZED));
> > > > > writer.addDocument(doc);
> > > > > doc = new Document();
> > > > > doc.add(new Field(WordIndex.FIELD_WORLDS, "111", Field.Store.YES,
> > > > > Field.Index.UN_TOKENIZED));
> > > > > writer.addDocument(doc);
> > > > > doc = new Document();
> > > > > doc.add(new Field(WordIndex.FIELD_WORLDS, "222 333",
> > > > > Field.Store.YES, Field.Index.UN_TOKENIZED));
> > > > > writer.addDocument(doc);
> > > > > writer.optimize();
> > > > > writer.close();
> > > > >
> > > > > Now I want to get all the documents that contain the word "222".
> > > > >
> > > > > I tried to run the following code, but it doesn't return any doc:
> > > > >
> > > > > IndexSearcher searcher = new IndexSearcher(indexPath);
> > > > >
> > > > > // TermQuery mapQuery = new TermQuery(new Term(FIELD_WORLDS,
> > > > > // worldNum)); - this one also didn't work
> > > > > Analyzer analyzer = new StandardAnalyzer();
> > > > > QueryParser parser = new QueryParser(FIELD_WORLDS, analyzer);
> > > > > Query query = parser.parse(worldNum);
> > > > > Hits mapHits = searcher.search(query);
> > > > >
> > > > > Thanks a lot,
> > > > > Liat
> > > > >
> > > >
> > >
> >
>
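As a footnote on the tokenization point above: you can see the UN_TOKENIZED vs. tokenized difference without Lucene at all. UN_TOKENIZED indexes the whole field value as a single term, while a whitespace tokenizer (roughly what WhitespaceAnalyzer does) produces one term per whitespace-separated chunk. A plain-Java sketch:

```java
import java.util.Arrays;
import java.util.List;

public class TokenDemo {
    public static void main(String[] args) {
        String fieldValue = "111 222 333";

        // UN_TOKENIZED: the entire string is one indexed term.
        List<String> unTokenized = Arrays.asList(fieldValue);

        // Whitespace tokenization: one term per chunk.
        List<String> tokenized = Arrays.asList(fieldValue.split("\\s+"));

        System.out.println(unTokenized.contains("222")); // false
        System.out.println(tokenized.contains("222"));   // true
    }
}
```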
