mahout-user mailing list archives

From Paritosh Ranjan <pran...@xebia.com>
Subject Re: ClusteredPoints
Date Fri, 25 Nov 2011 07:05:57 GMT
Run this code after the k-means clustering has finished.

I have arranged the code so that you can simply call the process method,
passing it the path of the clusteredPoints directory inside the clustering
output path, the Hadoop FileSystem and the Configuration.

   //use clusterId and vector here to write to a local file.

At this line (inside the while loop) you will get the clusterId and the
vector. Use them to write to the file.
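For the write step, one option is a small text writer built only on java.nio. This is a minimal sketch, assuming a plain "clusterId<TAB>vector" line per point is what you want and that the toString() output of the key and value classes is an acceptable text form. LocalPointWriter is a hypothetical helper, not part of Mahout or Hadoop.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not a Mahout/Hadoop class): writes one
 * "clusterId<TAB>vector" line per clustered point to a local text file.
 */
class LocalPointWriter implements AutoCloseable {
  private final BufferedWriter out;

  // Takes a String so it does not clash with Hadoop's Path in the same file.
  // Creates the file, or truncates it if it already exists.
  public LocalPointWriter(String localFileName) throws IOException {
    this.out = Files.newBufferedWriter(Paths.get(localFileName), StandardCharsets.UTF_8);
  }

  // Appends a single point as one tab-separated line.
  public void write(String clusterId, String vector) throws IOException {
    out.write(clusterId);
    out.write('\t');
    out.write(vector);
    out.newLine();
  }

  @Override
  public void close() throws IOException {
    out.close(); // flushes buffered lines to disk
  }
}
```

You would open one writer before looping over the part files, call writer.write(clusterIdAsKey.toString(), vector.toString()) where the comment sits, and close it once every part file has been read.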


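import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.mahout.common.iterator.sequencefile.PathFilters;

  public void process(Path clusteredPoints, FileSystem fileSystem,
      Configuration conf) throws IOException {
    FileStatus[] partFiles = getAllClusteredPointPartFiles(clusteredPoints, fileSystem);
    for (FileStatus partFile : partFiles) {
      SequenceFile.Reader clusteredPointsReader =
          new SequenceFile.Reader(fileSystem, partFile.getPath(), conf);
      try {
        // Create empty key/value objects of the classes recorded in the file header
        WritableComparable<?> clusterIdAsKey = (WritableComparable<?>)
            ReflectionUtils.newInstance(clusteredPointsReader.getKeyClass(), conf);
        Writable vector = (Writable)
            ReflectionUtils.newInstance(clusteredPointsReader.getValueClass(), conf);
        // next() refills the same two objects for every record it reads
        while (clusteredPointsReader.next(clusterIdAsKey, vector)) {
          //use clusterId and vector here to write to a local file.
        }
      } finally {
        clusteredPointsReader.close(); // close even if reading fails
      }
    }
  }

  // Returns only the part-* files inside the clusteredPoints directory
  private FileStatus[] getAllClusteredPointPartFiles(Path clusteredPoints,
      FileSystem fileSystem) throws IOException {
    return fileSystem.listStatus(clusteredPoints, PathFilters.partFilter());
  }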
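To see what the part-file filter in getAllClusteredPointPartFiles keeps, here is a JDK-only analogue that applies the same rule to a local copy of the clusteredPoints directory. This is a sketch under the assumption that PathFilters.partFilter() accepts names starting with "part-" (check this against your Mahout version). LocalPartFiles is hypothetical, and it only lists the files: reading their contents still needs the SequenceFile.Reader loop in process().

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical local-disk analogue of getAllClusteredPointPartFiles. */
class LocalPartFiles {
  static List<Path> partFiles(String clusteredPointsDir) throws IOException {
    List<Path> result = new ArrayList<>();
    // The glob "part-*" is matched against each entry's file name, so
    // entries such as _SUCCESS or _logs are skipped.
    try (DirectoryStream<Path> entries =
             Files.newDirectoryStream(Paths.get(clusteredPointsDir), "part-*")) {
      for (Path entry : entries) {
        result.add(entry);
      }
    }
    Collections.sort(result); // directory listing order is not guaranteed
    return result;
  }
}
```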

Paritosh


On 25-11-2011 12:27, Rachana wrote:
> Hi Ranjan,
>
> Thank you for your response, but as I am a newbie I am a bit confused!
> Where should I include this code?
> Or should I run it as a separate program?
>
>
> Rachana.

