lucene-dev mailing list archives

From "Tom Burton-West (JIRA)" <>
Subject [jira] Updated: (LUCENE-2393) Utility to output total term frequency and df from a lucene index
Date Tue, 20 Apr 2010 18:21:49 GMT


Tom Burton-West updated LUCENE-2393:

    Attachment: LUCENE-2393.patch

Revised patch: updated everything to the flex API. Replaced all references to Term with BytesRef
and field.
GetTermInfo now requires a field instead of defaulting to "ocr".
Removed the unused String[] fields argument.
GetTermInfo now uses the shared code in HighFreqTermsWithTF.getTotalTF() to get the total tf.
Removed GetTermInfo's dependency on TermInfoWithTotalTF[] and inlined it into HighFreqTermsWithTF.
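For readers unfamiliar with the patch, here is a minimal sketch of what a shared getTotalTF() helper is meant to compute: it walks one term's postings and sums the per-document tf. The Postings interface, its NO_MORE_DOCS sentinel and the ArrayPostings class below are hypothetical stand-ins written for this illustration. They are not the flex-era Lucene classes, whose names and signatures may differ.

```java
public class TotalTfSketch {

    // Hypothetical postings enumerator; NOT the real Lucene API.
    interface Postings {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int nextDoc();   // advance to the next document id, or NO_MORE_DOCS when exhausted
        int freq();      // tf of the term in the current document
    }

    // Array-backed Postings used only to make the sketch runnable.
    static final class ArrayPostings implements Postings {
        private final int[] docs;
        private final int[] freqs;
        private int i = -1;

        ArrayPostings(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        public int nextDoc() {
            i++;
            return i < docs.length ? docs[i] : NO_MORE_DOCS;
        }

        public int freq() {
            return freqs[i];
        }
    }

    // Sums tf over every document that contains the term.
    // Uses long because a frequent term in a large index can overflow int.
    static long getTotalTF(Postings postings) {
        long total = 0;
        while (postings.nextDoc() != Postings.NO_MORE_DOCS) {
            total += postings.freq();
        }
        return total;
    }

    public static void main(String[] args) {
        // Term occurs in docs 0, 3 and 7 with tf 2, 5 and 1.
        Postings p = new ArrayPostings(new int[] {0, 3, 7}, new int[] {2, 5, 1});
        System.out.println(getTotalTF(p)); // prints 8
    }
}
```

Keeping the summation in one static helper is what lets both HighFreqTermsWithTF and GetTermInfo report the same number without duplicating the loop.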

I still don't understand the bulk read API, but given that I have indexes with *frq files of
60GB, I'd like to use it. Is there some documentation, code, or a test case I might look at?

> Utility to output total term frequency and df from a lucene index
> -----------------------------------------------------------------
>                 Key: LUCENE-2393
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Tom Burton-West
>            Priority: Trivial
>         Attachments: LUCENE-2393.patch, LUCENE-2393.patch, LUCENE-2393.patch, LUCENE-2393.patch
> This is a command-line utility that takes a field name, term, and index directory and
> outputs the document frequency for the term and the total number of occurrences of the term
> in the index (i.e. the sum of the tf of the term for each document). It is useful for estimating
> the size of the term's entry in the *prx files and the consequent disk I/O demands.
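As a rough illustration of the two numbers the utility reports, the sketch below computes df (number of documents containing the term) and total tf (sum of per-document tf) for a field:term pair. It assumes a toy in-memory "index" (field to term to per-document tf values) in place of an index directory, and the class and method names are hypothetical. It does not use any Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

public class GetTermInfoSketch {

    // Toy in-memory stand-in for an index: field -> term -> tf in each document containing it.
    static final Map<String, Map<String, int[]>> INDEX = new HashMap<>();

    static {
        Map<String, int[]> ocr = new HashMap<>();
        ocr.put("lucene", new int[] {3, 1, 5}); // term occurs in 3 documents
        ocr.put("index", new int[] {2});
        INDEX.put("ocr", ocr);
    }

    // Returns {df, totalTF} for field:term, or {0, 0} if the term is absent.
    static long[] termInfo(String field, String term) {
        int[] tfs = INDEX.getOrDefault(field, Map.of()).get(term);
        if (tfs == null) {
            return new long[] {0, 0};
        }
        long total = 0;
        for (int tf : tfs) {
            total += tf;
        }
        return new long[] {tfs.length, total};
    }

    public static void main(String[] args) {
        // Field and term are both required; there is no default field.
        if (args.length < 2) {
            System.err.println("usage: GetTermInfoSketch <field> <term>");
            return;
        }
        long[] info = termInfo(args[0], args[1]);
        System.out.println(args[0] + ":" + args[1] + " df=" + info[0] + " totalTF=" + info[1]);
    }
}
```

Running it with the arguments `ocr lucene` should print `ocr:lucene df=3 totalTF=9`. Since a term's *prx entry holds one position per occurrence, total tf rather than df is the figure that tracks how much position data must be read for that term.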

