mahout-user mailing list archives

From Andy Schlaikjer <andrew.schlaik...@gmail.com>
Subject Re: Order of documents in LDA results
Date Mon, 02 Jul 2012 15:36:14 GMT
Ivan,

Mahout LDA input:

1) a set of (document id, term vector) pairs in SequenceFile<IntWritable,
VectorWritable> format.
2) optionally, a dictionary of (term, term index) pairs in
SequenceFile<Text, IntWritable> format.

Output:

1) a "model": a set of (topic index, term vector) pairs
in SequenceFile<IntWritable, VectorWritable> format. Topic identifiers are
zero-based indices.
2) optionally, a set of (document id, topic vector) pairs
in SequenceFile<IntWritable, VectorWritable> format. This is inference
output of the trained model on input #1 above. Note that the topic vectors
have cardinality equal to the number of latent topics you trained with
(e.g. 50, 100) and are dense. An entry k in document d's topic vector
represents the model's estimate of p(topic = k | doc = d).
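
To illustrate how these vectors can be read, here is a minimal sketch in plain Java. It is not Mahout API code: a double[] stands in for a dense vector, a Map stands in for the dictionary, and the class name LdaOutputSketch, the helper names, and all data values are made up for the example. It assumes you have already loaded the vectors and dictionary from their SequenceFiles.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-ins (assumptions, not Mahout classes): double[] plays the
// role of a dense vector, Map<String, Integer> the role of the dictionary.
class LdaOutputSketch {

    // Index of the largest entry: the most probable topic for a document's
    // topic vector, or the highest-weighted term for a topic's term vector.
    static int argMax(double[] vector) {
        if (vector.length == 0) {
            throw new IllegalArgumentException("empty vector");
        }
        int best = 0;
        for (int i = 1; i < vector.length; i++) {
            if (vector[i] > vector[best]) {
                best = i;
            }
        }
        return best;
    }

    // Invert (term, term index) pairs so that a term index found in a
    // topic's term vector can be mapped back to the term itself.
    static Map<Integer, String> invert(Map<String, Integer> dictionary) {
        Map<Integer, String> byIndex = new HashMap<>();
        for (Map.Entry<String, Integer> e : dictionary.entrySet()) {
            byIndex.put(e.getValue(), e.getKey());
        }
        return byIndex;
    }

    public static void main(String[] args) {
        // Made-up dictionary and vectors, for illustration only.
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("apple", 0);
        dictionary.put("banana", 1);
        dictionary.put("cherry", 2);

        double[] topic0Terms = {0.1, 0.7, 0.2};          // entry i belongs to term index i
        double[] doc42Topics = {0.05, 0.15, 0.60, 0.20}; // entry k is p(topic = k | doc = 42)

        String topTerm = invert(dictionary).get(argMax(topic0Terms));
        System.out.println("top term of topic 0: " + topTerm);
        System.out.println("most probable topic for doc 42: " + argMax(doc42Topics));
    }
}
```

The point of the sketch is only that position i in a topic's term vector lines up with term index i in the dictionary, and position k in a document's topic vector lines up with topic index k in the model.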

Andy
@sagemintblue


On Mon, Jul 2, 2012 at 5:54 AM, ivan obeso <sendero.luminoso@gmail.com> wrote:

> Hi,
>
> I would like to know which order the documents follow in the LDA
> results. For example, I know that the topic/word file is a group of
> IntWritable keys with VectorWritable values, where the key corresponds to
> the topic id and the VectorWritable has in position 0 the word in position 0
> in the dictionary file....
>
> but in the document/topic file I am not sure about the order followed. The
> key is an IntWritable that represents the document ID, but I don't know
> where to read the filename/docID table.
>
> Thanks.
>
