lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Pasero <bpas...@rssowl.org>
Subject Re: Retrieving TermVectors from a Field over the full index?
Date Mon, 11 Jun 2007 07:19:55 GMT


This is what I coded up till now:

TermEnum terms = reader.terms();
while (terms.next()) {
  String field = terms.term().field();
  if ("keywords".equals(field))
    keywords.put(terms.term().text(), terms.docFreq());
}

Your solution is doing more or less the same right?

Ben
> Maybe I'm missing the boat, but I don't understand why TermEnum doesn't
> work for you.
> Try something like...
>
>       TermEnum te = this.reader.terms();
>        te.skipTo(new Term("keyword", ""));
>        while (te.next()) {
>            Term term = te.term();
>            if (! term.field().equals("keyword")) {
>                break;
>            }
>            System.out.println(term.text());
>        }
>
>
>
> On 6/10/07, Benjamin Pasero <bpasero@rssowl.org> wrote:
>>
>>
>>
>> Erick Erickson wrote:
>> > Um, to return all counts of all terms in a field, what other option
>> > *is* there except to walk the whole thing?
>> >
>> > Have you looked at TermEnum, TermDocs, and TermFreqVector?
>> > For that matter, TermPositionVector might also be of some use.
>> >
>> > It would be easier to provide some help if you
>> > 1> mentioned what you'd tried already
>> > 2> mentioned what's inadequate about what you've tried.
>> Sorry for not being clear what I am trying to achieve. I am storing
>> documents in my index that are made of 5 Fields. One of the Fields
>> contains keywords that describe the document. Now, I need a fast
>> way of retrieving these keywords together with their frequency from
>> the index.
>>
>> My current solution is to use IndexReader#terms() to walk over all
>> terms and count the ones that appear in the keyword-Field.
>>
>> As you can assume, this is not scaling well. The content in the keywords
>> field is usually quite small, however, the other fields may store
>> up to thousands of terms.
>>
>> What I am asking for is a way to walk all the terms of just the
>> keyword-field
>> in order to avoid having to walk all terms in all fields.
>>
>> Of course, even better would be some API that would return a TermVector
>> from
>> the keyword-field. But I guess TermVectors are only supported on a per
>> Document level and not index level?
>>
>> Regards,
>> Ben
>> >
>> > Best
>> > Erick
>> >
>> > On 6/9/07, Benjamin Pasero <bpasero@rssowl.org> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I wonder if this is possible:
>> >>
>> >> Return all Terms of a Field in the Index together with the number of
>> >> occurances
>> >> in all documents.
>> >>
>> >> E.g. have 10 Documents with the Field "author" in the index, 5 of 
>> them
>> >> having
>> >> the value "foo" and 5 "bar" I would like to build a map with:
>> >>
>> >> [foo] -> 5
>> >> [bar] -> 5
>> >>
>> >> I looked at what Luke is doing to show the top terms of a given field
>> in
>> >> the
>> >> index and it seems to iterate over all terms (using
>> >> IndexReader#terms()). Isnt
>> >> that quite un-efficient? I would at least expect a method
>> >> IndexReader#terms(String field)
>> >> to limit the terms on the desired field.
>> >>
>> >> Thanks for helping,
>> >> Ben
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message