mahout-user mailing list archives

From Jeff Eastman <jeast...@Narus.com>
Subject RE: how to transfer the sequence file into readable format
Date Thu, 07 Jul 2011 20:24:10 GMT
I think you want LDAPrintTopics?

-----Original Message-----
From: dhruv21@gmail.com [mailto:dhruv21@gmail.com] On Behalf Of Dhruv Kumar
Sent: Thursday, July 07, 2011 11:29 AM
To: user@mahout.apache.org
Subject: Re: how to transfer the sequence file into readable format

Sequence files store key/value pairs in a binary format (optionally
compressed). To read a sequence file and display its keys and values in
human-readable form, you can use SequenceFile.Reader:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Reader.html

I don't know the output types of LDA, but in general you can do the
following, assuming the key is an IntWritable and the value is a DoubleWritable:

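import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// Open the sequence file written by LDA (the path is a placeholder)
SequenceFile.Reader reader = new SequenceFile.Reader(fs,
    new Path("/path/to/output/of/LDA"), conf);
IntWritable key = new IntWritable();
DoubleWritable value = new DoubleWritable();

// next() fills key and value in place and returns false at end of file
while (reader.next(key, value)) {
  System.out.println(key + "\t" + value);
}
reader.close();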


There may also be a convenient command-line utility for LDA output that
someone else can point out. However, you can always write a simple class
like the one above to read any sequence file.
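The reader above has to guess the key and value types (IntWritable and DoubleWritable here). A sequence file records its key and value class names in its header, so you can check them first. With Hadoop on the classpath, SequenceFile.Reader's getKeyClassName() and getValueClassName() return them directly. The sketch below reads the same information with only the JDK, to show what is in the file. It assumes the version 5+ header layout (magic "SEQ", a version byte, then the key and value class names as Hadoop Text strings: a Hadoop vint length followed by UTF-8 bytes); the class and method names are hypothetical, and the rest of the header (compression flags, metadata, sync marker) is deliberately ignored.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (sketch only): reads the start of a Hadoop SequenceFile
 * header and reports the key/value class names. Assumes a version 5+ header.
 */
public class SeqHeaderPeek {

  /** Version plus key and value class names recorded in the header. */
  public static final class Header {
    public final int version;
    public final String keyClass;
    public final String valueClass;

    Header(int version, String keyClass, String valueClass) {
      this.version = version;
      this.keyClass = keyClass;
      this.valueClass = valueClass;
    }
  }

  /** Decodes Hadoop's zero-compressed variable-length integer (vint/vlong). */
  public static long readVLong(DataInputStream in) throws IOException {
    byte first = in.readByte();
    if (first >= -112) {
      return first; // values in [-112, 127] fit in the first byte
    }
    boolean negative = first < -120;
    int extraBytes = negative ? (-120 - first) : (-112 - first);
    long value = 0;
    for (int i = 0; i < extraBytes; i++) {
      value = (value << 8) | (in.readByte() & 0xFF); // big-endian payload
    }
    return negative ? ~value : value; // negatives are stored one's-complemented
  }

  /** Reads a Hadoop Text string: vint byte length, then UTF-8 bytes. */
  static String readText(DataInputStream in) throws IOException {
    int length = (int) readVLong(in);
    byte[] bytes = new byte[length];
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static Header parse(InputStream raw) throws IOException {
    DataInputStream in = new DataInputStream(raw);
    byte[] magic = new byte[3];
    in.readFully(magic);
    if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
      throw new IOException("not a SequenceFile (missing SEQ magic)");
    }
    int version = in.readByte();
    if (version < 5) {
      // Older headers are rejected rather than guessed at
      throw new IOException("sketch only handles version 5+, got " + version);
    }
    String keyClass = readText(in);
    String valueClass = readText(in);
    return new Header(version, keyClass, valueClass);
  }

  public static void main(String[] args) throws IOException {
    try (InputStream file = new FileInputStream(args[0])) {
      Header h = parse(file);
      System.out.println("version: " + h.version);
      System.out.println("key:     " + h.keyClass);
      System.out.println("value:   " + h.valueClass);
    }
  }
}
```

Run it against a local copy of one output part file, e.g. `java SeqHeaderPeek /path/to/local/part-file` (placeholder path), then use the printed class names in place of IntWritable/DoubleWritable in the reader loop.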

On Thu, Jul 7, 2011 at 1:53 PM, wine lover <winecoding@gmail.com> wrote:

> Dear All,
>
> After running LDA analysis, I got the docTopic file, which is a regular
> sequence file. How can I convert it into a readable format? I searched for
> vectordumper and vectordump but did not find anything useful, such as how
> to use them on the command line. Thanks.
>
