mahout-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject ItemSimilarityJob creates no output
Date Tue, 05 Jun 2012 02:36:48 GMT
My job setup is really simple.  It looks like this:

    public int run(String[] args) throws Exception {
        String datasetDate = args[0];
        String inputPath = args[1];
        String configFile = args[2];
        String outputLocation = args[3];

        Configuration config = getConf();
        config.addResource(new Path(configFile));
        logger.error("config: " + config.toString());

        File inputFile = new File(inputPath);
        File outputDir = new File(outputLocation);
        outputDir.delete();
        File tmpDir = new File("/tmp");

        ItemSimilarityJob similarityJob = new ItemSimilarityJob();

        Configuration conf = new Configuration();
        conf.set("mapred.input.dir", inputFile.getAbsolutePath());
        conf.set("mapred.output.dir", outputDir.getAbsolutePath());
        conf.setBoolean("mapred.output.compress", false);

        similarityJob.setConf(conf);

        similarityJob.run(new String[]{
                "--tempDir", tmpDir.getAbsolutePath(),
                "--similarityClassname", PearsonCorrelationSimilarity.class.getName()});

        return 0;
    }


The input file has three columns (UserId, ItemId, Preference) and is sorted
in that order.  Preference is always '1'.  A few lines from the file look
like this:

-1000000334008648908    1    1
-1000000334008648908    70    1
-1000000334008648908    2090    1
-1000000334008648908    12872    1
-1000000334008648908    32790    1
-1000000334008648908    32799    1
-1000000334008648908    32969    1
-1000000397028994738    1    1
-1000000397028994738    12872    1
-1000000397028994738    32790    1
-1000000397028994738    32796    1
-1000000397028994738    32939    1
-100000083781885705    1    1
-100000083781885705    12872    1
-100000083781885705    32790    1
-100000083781885705    32837    1
-100000083781885705    33723    1
-1000001014586220418    1    1
-1000001014586220418    12872    1
-1000001014586220418    32790    1
& so on...

(UserId is created using MemoryIDMigrator)
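
One property of this input that may be worth checking: because every
Preference is '1', each item's preference values have zero variance, and a
plain Pearson correlation is then 0/0, which is undefined.  The sketch below
is a standalone illustration of that arithmetic only.  The class name
PearsonSketch and the pearson() helper are hypothetical and are not Mahout
code; whether Mahout's own implementation behaves the same way on this data
is an assumption, not something the job output above confirms.

```java
// Hypothetical standalone sketch, not Mahout's implementation.
class PearsonSketch {

    // Textbook Pearson correlation over two equally long arrays of
    // co-rated preference values.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // With constant inputs, cov, varX and varY are all 0.0,
        // so this evaluates 0.0 / 0.0.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Two items rated by the same three users, every preference is 1.
        double[] itemA = {1, 1, 1};
        double[] itemB = {1, 1, 1};
        System.out.println(pearson(itemA, itemB));

        // For contrast: values that actually vary.
        double[] itemC = {1, 2, 3};
        double[] itemD = {2, 4, 6};
        System.out.println(pearson(itemC, itemD));
    }
}
```

If the similarity measure drops undefined values, an all-'1' data set could
plausibly leave no item pairs to write, so trying a measure intended for
boolean (preference-free) data might be a reasonable experiment.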


Internally, the job runs the following 7 Hadoop jobs, all of which complete
successfully:

PreparePreferenceMatrixJob-ItemIDIndexMapper-Reducer
PreparePreferenceMatrixJob-ToItemPrefsMapper-Reducer
PreparePreferenceMatrixJob-ToItemVectorsMapper-Reducer
RowSimilarityJob-VectorNormMapper-Reducer
RowSimilarityJob-CooccurrencesMapper-Reducer
RowSimilarityJob-UnsymmetrifyMapper-Reducer
ItemSimilarityJob-MostSimilarItemPairsMapper-Reducer


The problem is that the output file is empty!  What am I missing?  Please
help.  Thanks.
