mahout-user mailing list archives

From Sean Owen
Subject Re: Don't Get the SequenceFile.Reader Path in SimpleKMeansCluster
Date Wed, 19 Oct 2011 07:27:44 GMT
You mean the Mahout javadoc?

Are you asking why this location was chosen, or what this path contains?
A Hadoop job writes its output as a set of files in a directory, not as
a single file. They are named like "part-m-00000" and so on. That's where
the last part of the path comes from. The rest is just where the job has
chosen to put the output by default.
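
To illustrate the naming convention only, here is a minimal sketch in plain
Java. It uses java.nio on a local temporary directory as a stand-in for the
Hadoop file system, so it is not the Mahout or Hadoop API. The class name
PartFiles, the helper names, and the "_SUCCESS" marker file are assumptions
made for the example. The 'm' in the file name is, as far as I know, used
for map output and 'r' for reduce output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PartFiles {

    // Builds a name in the "part-<type>-<5-digit index>" pattern,
    // e.g. partName('m', 0) gives "part-m-00000".
    static String partName(char type, int index) {
        return String.format("part-%c-%05d", type, index);
    }

    // Returns the sorted names of all "part-*" files directly inside dir,
    // skipping anything else the job may have left there.
    static List<String> listPartFiles(Path dir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    // Hypothetical helper: fakes a job output directory with two part
    // files and one non-part marker file.
    static Path makeDemoDir() {
        try {
            Path dir = Files.createTempDirectory("job-output");
            Files.createFile(dir.resolve(partName('m', 0)));
            Files.createFile(dir.resolve(partName('m', 1)));
            Files.createFile(dir.resolve("_SUCCESS"));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints [part-m-00000, part-m-00001]
        System.out.println(listPartFiles(makeDemoDir()));
    }
}
```

The point of listing the directory is that a job may write more than one
part file, so code that hardcodes "part-m-00000" only sees the first one.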

On Wed, Oct 19, 2011 at 1:49 AM, robpd wrote:
> Hi
> I'm a Hadoop / Mahout learner - slowly getting there but still at the
> questioning stage!
> I was looking at the example (in the Mahout in
> Action book) and was wondering about the code line....
> SequenceFile.Reader reader = new SequenceFile.Reader(fs,
> new Path("output/" + Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000"), conf);
> The Hadoop java docs don't give a description of the input parameters to
> this method so it's not very clear exactly what the path refers to. I guess
> that the method reads from the Hadoop FS at the location...
> "output/" + Cluster.CLUSTERED_POINTS_DIR + "/part-m-00000"
> to get the output clusters to report. Correct?
> How do you know that the cluster output would be at this path-location in
> the file system though? None of the preceding code lines give a clue as to
> this.  There's nothing in the example that makes it clear as to why the
> outputs are placed in this location as opposed to anywhere else
> (particularly the "/part-m-00000" is confusing). I'm hoping that I've missed
> something obvious here.  Having to know exactly where to look for the output
> in the general case would make things very difficult to use without delving
> into the Mahout / Hadoop source code itself.
> Sorry to be dim! Any help would be appreciated.
