lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Retrieving TermVectors from a Field over the full index?
Date Sun, 10 Jun 2007 20:00:07 GMT
Maybe I'm missing the boat, but I don't understand why TermEnum doesn't
work for you.
Try something like...

       TermEnum te = this.reader.terms();
        te.skipTo(new Term("keyword", ""));
        while (te.next()) {
            Term term = te.term();
            if (! term.field().equals("keyword")) {
                break;
            }
            System.out.println(term.text());
        }



On 6/10/07, Benjamin Pasero <bpasero@rssowl.org> wrote:
>
>
>
> Erick Erickson wrote:
> > Um, to return all counts of all terms in a field, what other option
> > *is* there except to walk the whole thing?
> >
> > Have you looked at TermEnum, TermDocs, and TermFreqVector?
> > For that matter, TermPositionVector might also be of some use.
> >
> > It would be easier to provide some help if you
> > 1> mentioned what you'd tried already
> > 2> mentioned what's inadequate about what you've tried.
> Sorry for not being clear what I am trying to achieve. I am storing
> documents in my index that are made of 5 Fields. One of the Fields
> contains keywords that describe the document. Now, I need a fast
> way of retrieving these keywords together with their frequency from
> the index.
>
> My current solution is to use IndexReader#terms() to walk over all
> terms and count the ones that appear in the keyword-Field.
>
> As you can assume, this is not scaling well. The content in the keywords
> field is usually quite small, however, the other fields may store
> up to thousands of terms.
>
> What I am asking for is a way to walk all the terms of just the
> keyword-field
> in order to avoid having to walk all terms in all fields.
>
> Of course, even better would be some API that would return a TermVector
> from
> the keyword-field. But I guess TermVectors are only supported on a per
> Document level and not index level?
>
> Regards,
> Ben
> >
> > Best
> > Erick
> >
> > On 6/9/07, Benjamin Pasero <bpasero@rssowl.org> wrote:
> >>
> >> Hi,
> >>
> >> I wonder if this is possible:
> >>
> >> Return all Terms of a Field in the Index together with the number of
> >> occurances
> >> in all documents.
> >>
> >> E.g. have 10 Documents with the Field "author" in the index, 5 of them
> >> having
> >> the value "foo" and 5 "bar" I would like to build a map with:
> >>
> >> [foo] -> 5
> >> [bar] -> 5
> >>
> >> I looked at what Luke is doing to show the top terms of a given field
> in
> >> the
> >> index and it seems to iterate over all terms (using
> >> IndexReader#terms()). Isnt
> >> that quite un-efficient? I would at least expect a method
> >> IndexReader#terms(String field)
> >> to limit the terms on the desired field.
> >>
> >> Thanks for helping,
> >> Ben
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message