lucene-general mailing list archives

From Ted Dunning <>
Subject Re: Weight for all Terms in all Documents
Date Tue, 05 Oct 2010 19:49:51 GMT
There is a utility in the Apache Mahout project that dumps documents as
weight vectors.
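
For illustration only, here is a minimal sketch of the kind of per-document
term weights being asked about. It is not the Mahout utility and does not use
the Lucene API or Lucene's scoring formula. It assumes plain TF-IDF (raw term
count times ln(N/df)) over documents that are already tokenized, and the class
name TermWeights and the method tfidf are hypothetical.

```java
import java.util.*;

// Hypothetical helper, not part of Lucene or Mahout: computes a TF-IDF weight
// for every term in every document, with no query involved.
class TermWeights {

    // docs maps a document id to its already-tokenized terms (an assumption;
    // a real index would run the text through an analyzer first).
    static Map<String, Map<String, Double>> tfidf(Map<String, List<String>> docs) {
        // Document frequency: the number of documents that contain each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> terms : docs.values()) {
            for (String term : new HashSet<>(terms)) {
                df.merge(term, 1, Integer::sum);
            }
        }

        int n = docs.size();
        Map<String, Map<String, Double>> weights = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            // Term frequency: raw count of each term within this document.
            Map<String, Integer> tf = new TreeMap<>();
            for (String term : doc.getValue()) {
                tf.merge(term, 1, Integer::sum);
            }

            // Weight = tf * ln(N / df); a term found in every document gets 0.
            Map<String, Double> row = new TreeMap<>();
            for (Map.Entry<String, Integer> count : tf.entrySet()) {
                double idf = Math.log((double) n / df.get(count.getKey()));
                row.put(count.getKey(), count.getValue() * idf);
            }
            weights.put(doc.getKey(), row);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("doc1", Arrays.asList("termA", "termA", "termB"));
        docs.put("doc2", Arrays.asList("termB", "termC"));

        // Prints one line per document: the id followed by its term weights.
        for (Map.Entry<String, Map<String, Double>> row : tfidf(docs).entrySet()) {
            System.out.println(row.getKey() + ", " + row.getValue());
        }
    }
}
```

Each row of the result is a sparse weight vector for one document, so
similarity statistics such as cosine similarity can be computed directly from
pairs of rows.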

On Tue, Oct 5, 2010 at 11:01 AM, William Koscho <> wrote:

> How do I get the weights for all terms in all documents?
> For a given set of documents, what are the series of API calls I need to
> make to get the following type of information:
> doc1, termA_weight, termB_weight, etc..
> doc2, termC_weight, termD_weight, etc..
> doc3, termE_weight, termZ_weight, etc..
> It seems that I have to start with a Query object, that is typically
> provided by an end-user. However, in my case, I don't have an end user or a
> specific query. Instead I am trying to analyze the documents and interested
> in getting the weights of all terms so that I can compute some statistics
> about the similarity among documents.
> Thanks in advance,
> Bill
