lucene-general mailing list archives

From Grant Ingersoll
Subject Re: Weight for all Terms in all Documents
Date Mon, 11 Oct 2010 16:00:09 GMT
Have a look at the TermVectors.  If you are a Solr user, look at the TermVectorComponent.
In either case, you will have to reassemble some things to get the weights Lucene actually
uses for scoring.  You can, however, get a simple TF-IDF weight without too much work.
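
As a rough illustration of the "simple TF-IDF weight" route, here is a minimal sketch in plain
Java.  It makes no Lucene calls: it assumes you have already pulled each document's
term -> frequency map out of the index (for example from its term vector).  The class name
TfIdfSketch and its methods are hypothetical, and the tf/idf formulas are modeled on what I
understand the classic DefaultSimilarity to use (square root of the frequency, and
1 + ln(numDocs / (docFreq + 1))).  Length norms and boosts are left out, so these are not
the exact weights Lucene uses for scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: computes a simple TF-IDF weight
// for every term in every document from pre-extracted term frequencies.
public class TfIdfSketch {

    // Assumption: tf modeled on the classic formula, sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Assumption: idf modeled on the classic formula, 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // docs.get(i) maps each term of document i to its frequency in that document.
    // Returns, per document, a map from term to tf * idf.
    static List<Map<String, Double>> weights(List<Map<String, Integer>> docs) {
        // First pass: count in how many documents each term occurs.
        Map<String, Integer> docFreq = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (String term : doc.keySet()) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }

        // Second pass: combine per-document tf with collection-wide idf.
        int numDocs = docs.size();
        List<Map<String, Double>> result = new ArrayList<>();
        for (Map<String, Integer> doc : docs) {
            Map<String, Double> docWeights = new HashMap<>();
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                double w = tf(e.getValue()) * idf(docFreq.get(e.getKey()), numDocs);
                docWeights.put(e.getKey(), w);
            }
            result.add(docWeights);
        }
        return result;
    }
}
```

Calling TfIdfSketch.weights(docs) gives one term -> weight map per document, which is the
"doc1, termA_weight, termB_weight" layout asked for below and can be fed into a cosine
similarity or similar statistic.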

On Oct 5, 2010, at 2:01 PM, William Koscho wrote:

> How do I get the weights for all terms in all documents?
> For a given set of documents, what are the series of API calls I need to
> make to get the following type of information:
> doc1, termA_weight, termB_weight, etc..
> doc2, termC_weight, termD_weight, etc..
> doc3, termE_weight, termZ_weight, etc..
> It seems that I have to start with a Query object, that is typically
> provided by an end-user.  However, in my case, I don't have an end user or a
> specific query.  Instead I am trying to analyze the documents and interested
> in getting the weights of all terms so that I can compute some statistics
> about the similarity among documents.
> Thanks in advance,
> Bill

Grant Ingersoll
