lucene-dev mailing list archives

From "Tom Burton-West (JIRA)" <>
Subject [jira] Updated: (LUCENE-2393) Utility to output total term frequency and df from a lucene index
Date Thu, 15 Apr 2010 18:12:49 GMT


Tom Burton-West updated LUCENE-2393:

    Attachment: LUCENE-2393.patch

The new patch includes a (pre-flex) version of HighFreqTerms that finds the top N terms with
the highest docFreq, looks up the total term frequency of each, and outputs the list of terms
sorted by highest total term frequency (which approximates the largest entries in the *prx files).
I'm not sure how to combine the GetTermInfo program with either version of HighFreqTerms in
a way that leads to sane command-line arguments and argument processing. I suppose that
HighFreqTerms could have a flag that turns the inclusion of total term frequency on or off.
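
The two-step selection described above (top N by docFreq, then re-sort by total term frequency) can be sketched roughly as follows. This is not the code from the patch: it is a toy model in plain Java that assumes postings are available as an in-memory map from each term to its per-document tf values. The names HighFreqTermsSketch, TermStat, and topTerms are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HighFreqTermsSketch {

    /** Statistics for one term: document frequency and total term frequency. */
    public static final class TermStat {
        public final String term;
        public final int docFreq;
        public final long totalTermFreq;

        public TermStat(String term, int docFreq, long totalTermFreq) {
            this.term = term;
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /**
     * ASSUMPTION: postings maps a term to an array holding one tf value per
     * document that contains the term, so the array length is the docFreq.
     */
    public static List<TermStat> topTerms(Map<String, int[]> postings, int n) {
        // Step 1: rank all terms by docFreq (descending) and keep the top N.
        List<String> terms = new ArrayList<>(postings.keySet());
        terms.sort(Comparator.comparingInt((String t) -> postings.get(t).length).reversed());
        List<String> topByDocFreq = terms.subList(0, Math.min(n, terms.size()));

        // Step 2: only for those N terms, sum the per-document tf values.
        List<TermStat> stats = new ArrayList<>();
        for (String term : topByDocFreq) {
            int[] tfs = postings.get(term);
            long total = 0;
            for (int tf : tfs) {
                total += tf;
            }
            stats.add(new TermStat(term, tfs.length, total));
        }

        // Step 3: output order is by total term frequency, highest first.
        stats.sort(Comparator.comparingLong((TermStat s) -> s.totalTermFreq).reversed());
        return stats;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("the", new int[] {5, 3, 4});   // docFreq 3, total tf 12
        postings.put("lucene", new int[] {10, 12}); // docFreq 2, total tf 22
        postings.put("rare", new int[] {1});        // docFreq 1, total tf 1
        for (TermStat s : topTerms(postings, 2)) {
            System.out.println(s.term + " docFreq=" + s.docFreq + " totalTf=" + s.totalTermFreq);
        }
    }
}
```

In this example "rare" is dropped in step 1, and "lucene" ends up ahead of "the" even though its docFreq is lower, which is the reordering that makes the list a better guide to large *prx entries.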

> Utility to output total term frequency and df from a lucene index
> -----------------------------------------------------------------
>                 Key: LUCENE-2393
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Tom Burton-West
>            Priority: Trivial
>         Attachments: LUCENE-2393.patch, LUCENE-2393.patch
> This is a command-line utility that takes a field name, term, and index directory and
> outputs the document frequency for the term and the total number of occurrences of the term
> in the index (i.e. the sum of the tf of the term for each document). It is useful for estimating
> the size of the term's entry in the *prx files and the consequent disk I/O demands.
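
As a rough illustration of the two numbers the utility reports, the sketch below computes document frequency and total term frequency for one term over a handful of in-memory field values. It is a toy model, not the attached patch: it assumes whitespace tokenization and exact string matching, and the names TermInfoSketch and termInfo are hypothetical. Against a real pre-flex index, the equivalent would presumably be IndexReader.docFreq(Term) plus a loop over TermDocs summing freq().

```java
import java.util.Arrays;
import java.util.List;

public class TermInfoSketch {

    /**
     * Returns {docFreq, totalTermFreq} for the given term.
     * ASSUMPTION: each list element is the text of one document's field,
     * tokenized on whitespace with no further analysis.
     */
    public static long[] termInfo(List<String> fieldValues, String term) {
        long docFreq = 0;
        long totalTermFreq = 0;
        for (String text : fieldValues) {
            // tf: number of occurrences of the term in this one document.
            int tf = 0;
            for (String token : text.split("\\s+")) {
                if (token.equals(term)) {
                    tf++;
                }
            }
            if (tf > 0) {
                docFreq++;           // the document contains the term at least once
                totalTermFreq += tf; // sum of tf over all matching documents
            }
        }
        return new long[] {docFreq, totalTermFreq};
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("to be or not to be", "to do", "nothing here");
        long[] info = termInfo(docs, "to");
        System.out.println("docFreq=" + info[0] + " totalTermFreq=" + info[1]);
    }
}
```

Here "to" occurs twice in the first document and once in the second, so the document frequency is 2 and the total term frequency is 3.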

This message is automatically generated by JIRA.

