lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Lucene newbie question - Term Positions
Date Tue, 09 Oct 2007 13:02:27 GMT
I certainly applaud your effort to dig in and find out what's going on!

However, I suspect you'll get farther faster by trying one of several
tactics:
1> Post the indexing code and the searching code in snippet form. This
    kind of issue is usually a problem with analyzers. That is, perhaps
    you're using one analyzer for indexing and a different one for
    searching. Or you've made a typo in, say, the field name. Or....
    Phrases certainly work for many people <G>.
2> Just let Luke reconstruct the document in question for you and inspect
    the reconstructed document. You can cut-n-paste the contents of a
    field into an editor and just search for the phrase there.
3> Back out any complex analyzers you're using and just go with
    something like SimpleAnalyzer. Once that's working, work up
    from there. A unit test and/or a small self-contained program
    will work well for you here.

Best
Erick
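A minimal sketch of tactics 1> and 3> together, assuming Lucene 2.2 on the classpath: index one document into a RAMDirectory and run a phrase query through QueryParser, once with the same analyzer on both sides and once with a mismatched query-time analyzer. The class PhraseCheck, the field name "body" and the sample text are illustrative choices, not anything from the thread.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;

public class PhraseCheck {

    /** Indexes one document in memory and returns the hit count for a phrase query on it. */
    public static int phraseHits(Analyzer indexAnalyzer, Analyzer queryAnalyzer,
                                 String text, String phrase) throws IOException, ParseException {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, indexAnalyzer, true); // true = create a new index
        Document doc = new Document();
        doc.add(new Field("body", text, Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir);
        try {
            // The surrounding quotes make QueryParser build a PhraseQuery from the analyzed tokens
            Query query = new QueryParser("body", queryAnalyzer).parse("\"" + phrase + "\"");
            Hits hits = searcher.search(query);
            return hits.length();
        } finally {
            searcher.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String text = "The Quick Brown Fox";
        // Same analyzer at index and query time: expected to match
        System.out.println(phraseHits(new SimpleAnalyzer(), new SimpleAnalyzer(), text, "Quick Brown"));
        // SimpleAnalyzer lowercases at index time, WhitespaceAnalyzer keeps case at query time
        System.out.println(phraseHits(new SimpleAnalyzer(), new WhitespaceAnalyzer(), text, "Quick Brown"));
    }
}
```

With SimpleAnalyzer on both sides the phrase should be found; with WhitespaceAnalyzer at query time the query terms keep their capitals while the indexed terms were lowercased, so the same phrase should miss. That is the index/search analyzer mismatch described in 1>.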

On 10/7/07, Developer Developer <devquestions@gmail.com> wrote:
>
> Hi Erick,
>
> Thanks for the quick reply. My index does not return any hits when I
> search for certain phrases. I am very sure that the indexed documents
> do have those phrases in them.
>
> Therefore I want to list all the terms and their positions for a given
> document, just to make sure that the indexed document does have those
> terms indexed in the correct order.
>
> I did check with Luke and came up with the following code, which does not
> seem to be working: positions.next() returns false! Do you see anything
> wrong in this code?
>
> Directory dir = FSDirectory.getDirectory(args[0]);
> IndexReader reader = IndexReader.open(dir);
> TermPositions positions = reader.termPositions();
>
> while(positions.next())
>   {
>      positions.nextPosition();
>
>      positions.nextPosition();
>      byte b[] = positions.getPayload(null, 0);
>      System.out.println(b);
>   }
>
> On 10/7/07, Erick Erickson <erickerickson@gmail.com> wrote:
> >
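> > I suspect that this is more work than you think, not to mention
> > very slow. This is just due to the nature of an inverted
> > index: it maps each term to the documents containing it, so
> > rebuilding one document means walking every term in the index.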
> >
> > To see what I mean, get a copy of Luke and have it
> > reconstruct one of your documents and you'll see what the
> > performance is like.
> >
> > I think Luke has all the example code you could ask for; that's
> > the place I'd look first. See:
> > http://lucene.apache.org/java/docs/contributions.html
> >
> > Why do you want to do this, and is it really necessary? You
> > could store the entire document and then, when you need to
> > count terms, run it through one of the tokenizers and count
> > them yourself....
> >
> > Best
> > Erick
> >
> > On 10/7/07, Developer Developer <devquestions@gmail.com> wrote:
> > >
> > > Hello,
> > >
> > > I have created a simple Lucene 2.2 index. I want to list all the terms
> > > and their positions in a document. How can I do it?
> > >
> > > Can you please provide some sample code?
> > >
> > > Thanks !
> > >
> >
>
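On the code in the second message: in Lucene 2.2, IndexReader.termPositions() with no argument returns an unpositioned enumerator, so next() returns false until seek() points it at a term. TermPositions also never tells you which term it is on, nextPosition() should be called freq() times per document rather than a fixed twice, getPayload() only has data when the analyzer stored payloads (most don't), and println(b) prints the array reference rather than its bytes. A sketch of one way to list a single document's terms by position, assuming Lucene 2.2; DumpTermPositions and positionsInDoc are hypothetical names:

```java
import java.io.IOException;
import java.util.SortedMap;
import java.util.TreeMap;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.TermPositions;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class DumpTermPositions {

    /** Maps each position in one field of one document to the term(s) indexed there. */
    public static SortedMap<Integer, String> positionsInDoc(IndexReader reader, String field, int docId)
            throws IOException {
        SortedMap<Integer, String> byPosition = new TreeMap<Integer, String>();
        // terms(Term) is already positioned on the first term >= (field, ""), so term() is valid before next()
        TermEnum terms = reader.terms(new Term(field, ""));
        // termPositions() is unpositioned: it yields nothing until seek() is called
        TermPositions positions = reader.termPositions();
        try {
            do {
                Term term = terms.term();
                if (term == null || !term.field().equals(field)) {
                    break; // no more terms in this field
                }
                positions.seek(terms);
                // skipTo() moves to the first document >= docId, which may be a later one
                if (positions.skipTo(docId) && positions.doc() == docId) {
                    // freq() is how many times this term occurs in the document
                    for (int i = 0; i < positions.freq(); i++) {
                        int pos = positions.nextPosition();
                        String already = byPosition.get(pos);
                        // a position can hold several terms, e.g. tokens with a position increment of 0
                        byPosition.put(pos, already == null ? term.text() : already + "|" + term.text());
                    }
                }
            } while (terms.next());
        } finally {
            positions.close();
            terms.close();
        }
        return byPosition;
    }

    // Usage: java DumpTermPositions <indexDir> <field> <docId>
    public static void main(String[] args) throws IOException {
        Directory dir = FSDirectory.getDirectory(args[0]);
        IndexReader reader = IndexReader.open(dir);
        try {
            System.out.println(positionsInDoc(reader, args[1], Integer.parseInt(args[2])));
        } finally {
            reader.close();
        }
    }
}
```

This walks every term in the field and probes each one for the document, which is the cost Erick warns about, so it suits debugging a handful of documents rather than routine use.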

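Erick's alternative in the first reply, keeping the text and tokenizing it yourself, could look roughly like this under Lucene 2.2, where TokenStream.next() returns a Token and null at the end of the stream (later releases replaced this API). CountTerms and the sample text are illustrative. The stored text would come from something like reader.document(docId).get(field), which only works if the field was indexed with Field.Store.YES; using the same analyzer as at index time keeps the counts consistent with the index.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class CountTerms {

    /** Runs text through the analyzer and counts how often each resulting term occurs. */
    public static Map<String, Integer> countTerms(Analyzer analyzer, String field, String text)
            throws IOException {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
        try {
            Token token;
            while ((token = stream.next()) != null) { // Lucene 2.2: null marks the end of the stream
                String term = token.termText();
                Integer seen = counts.get(term);
                counts.put(term, seen == null ? 1 : seen + 1);
            }
        } finally {
            stream.close();
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countTerms(new SimpleAnalyzer(), "body", "The cat saw the other cat"));
    }
}
```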