lucene-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Weight for all Terms in all Documents
Date Tue, 05 Oct 2010 19:49:51 GMT
There is a utility in the Apache Mahout project that dumps documents as
weight vectors.
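
A minimal sketch of what such per-document weight vectors can look like, written in plain Java with no Lucene or Mahout dependency. It assumes the documents are already tokenized and uses textbook tf-idf (raw term count times ln(N/df)). That weighting is an assumption for illustration only; it is not necessarily what the Mahout utility or Lucene's own scoring uses. The class and method names are hypothetical.

```java
import java.util.*;

public class TfIdfVectors {
    // Builds one sparse vector (term -> weight) per document.
    // Assumption: weight = tf * ln(N / df), where tf is the raw count of the
    // term in the document, N the number of documents, df the number of
    // documents containing the term.
    static List<Map<String, Double>> tfidf(List<List<String>> docs) {
        int n = docs.size();
        Map<String, Integer> df = new HashMap<>();
        List<Map<String, Integer>> tfs = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Integer> tf = new TreeMap<>();
            for (String term : doc) tf.merge(term, 1, Integer::sum);
            // Each distinct term counts once toward its document frequency.
            for (String term : tf.keySet()) df.merge(term, 1, Integer::sum);
            tfs.add(tf);
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> tf : tfs) {
            Map<String, Double> v = new TreeMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                v.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(v);
        }
        return vectors;
    }

    public static void main(String[] args) {
        // Hypothetical, pre-tokenized documents.
        List<List<String>> docs = List.of(
            List.of("lucene", "index", "term", "term"),
            List.of("mahout", "vector", "term"),
            List.of("lucene", "vector"));
        List<Map<String, Double>> vs = tfidf(docs);
        // Print in the "docN, term=weight, ..." shape asked about below.
        for (int i = 0; i < vs.size(); i++) {
            System.out.println("doc" + (i + 1) + ", " + vs.get(i));
        }
    }
}
```

With this variant, a term that occurs in every document gets idf ln(1) = 0 and therefore weight 0; smoothed idf formulas avoid that if it matters for the statistics.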

On Tue, Oct 5, 2010 at 11:01 AM, William Koscho <wkoscho@gmail.com> wrote:

> How do I get the weights for all terms in all documents?
>
> For a given set of documents, what are the series of API calls I need to
> make to get the following type of information:
>
> doc1, termA_weight, termB_weight, etc.
> doc2, termC_weight, termD_weight, etc.
> doc3, termE_weight, termZ_weight, etc.
>
> It seems that I have to start with a Query object, which is typically
> provided by an end user. However, in my case, I don't have an end user or
> a specific query. Instead, I am trying to analyze the documents and am
> interested in getting the weights of all terms so that I can compute some
> statistics about the similarity among documents.
>
> Thanks in advance,
> Bill
>
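
Once each document is a sparse term-to-weight map like the doc1/doc2/doc3 rows in the question, one common similarity statistic is cosine similarity. The following is a hedged sketch in plain Java; the vectors in main are made-up values and the class name is hypothetical. It is independent of how the weights were produced.

```java
import java.util.*;

public class CosineSimilarity {
    // Cosine similarity of two sparse vectors (term -> weight):
    // dot(a, b) / (|a| * |b|). Only terms present in both vectors add to the dot product.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
        }
        double na = norm(a), nb = norm(b);
        if (na == 0.0 || nb == 0.0) return 0.0; // empty or all-zero vector
        return dot / (na * nb);
    }

    // Euclidean (L2) length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double s = 0.0;
        for (double w : v.values()) s += w * w;
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        // Hypothetical weight vectors for three documents.
        Map<String, Double> doc1 = Map.of("termA", 1.0, "termB", 2.0);
        Map<String, Double> doc2 = Map.of("termA", 2.0, "termB", 4.0);
        Map<String, Double> doc3 = Map.of("termE", 3.0);
        System.out.println(cosine(doc1, doc2)); // ~1.0: same direction
        System.out.println(cosine(doc1, doc3)); // 0.0: no shared terms
    }
}
```

Cosine similarity ignores vector length, so a long document and a short one with the same term proportions score as near-identical; whether that is desirable depends on the statistics being computed.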
