mahout-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Re: ItemSimilarityJob creates no output
Date Thu, 07 Jun 2012 06:05:03 GMT
I tried with a bigger/denser dataset, but still no output.  Here's what I
noticed:

In the MergeVectorsReducer, I see the following:

    @Override
    protected void reduce(IntWritable row, Iterable<VectorWritable> partialVectors, Context ctx)
        throws IOException, InterruptedException {
      Vector partialVector = Vectors.merge(partialVectors);

      if (row.get() == NORM_VECTOR_MARKER) {
        Vectors.write(partialVector, normsPath, ctx.getConfiguration());
      } else if (row.get() == MAXVALUE_VECTOR_MARKER) {
        Vectors.write(partialVector, maxValuesPath, ctx.getConfiguration());
      } else if (row.get() == NUM_NON_ZERO_ENTRIES_VECTOR_MARKER) {
        Vectors.write(partialVector, numNonZeroEntriesPath, ctx.getConfiguration(), true);
      } else {
        ctx.write(row, new VectorWritable(partialVector));
      }
    }


There's nothing coming out of this method.  Where is the output supposed to
go?  In other words, what Path is this:

normsPath = new Path(ctx.getConfiguration().get(NORMS_PATH));


There are 150 rows going into this reducer & nothing is coming out.  Where
is it supposed to go under /tmp?  I see the following under HDFS:

-rw-r--r--   3 root supergroup          7 2012-06-06 21:57 /user/XXX/tmp/maxValues.bin
-rw-r--r--   3 root supergroup          7 2012-06-06 21:57 /user/XXX/tmp/norms.bin
-rw-r--r--   3 root supergroup          7 2012-06-06 21:57 /user/XXX/tmp/numNonZeroEntries.bin
drwxrwxrwx   - root supergroup          0 2012-06-06 21:57 /user/XXX/tmp/pairwiseSimilarity
drwxrwxrwx   - root supergroup          0 2012-06-06 21:55 /user/XXX/tmp/prepareRatingMatrix
drwxrwxrwx   - root supergroup          0 2012-06-06 21:58 /user/XXX/tmp/similarityMatrix
drwxrwxrwx   - root supergroup          0 2012-06-06 21:57 /user/XXX/tmp/weights
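
To make the question concrete, here is a minimal plain-Java sketch of how I read the branching in the reducer above: rows whose index equals one of the three markers are written to side files (the three .bin files in the listing), and only the remaining rows are written to the job's regular output. This is an illustration under assumptions, not Mahout code. It uses no Hadoop classes, the class name MarkerRoutingSketch and its fields are made up, and the marker values are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MarkerRoutingSketch {
    // Hypothetical marker values, chosen only for this sketch.
    static final int NORM_VECTOR_MARKER = Integer.MIN_VALUE;
    static final int MAXVALUE_VECTOR_MARKER = Integer.MIN_VALUE + 1;
    static final int NUM_NON_ZERO_ENTRIES_VECTOR_MARKER = Integer.MIN_VALUE + 2;

    // Stand-in for the side files written via Vectors.write(...).
    final Map<String, double[]> sideFiles = new LinkedHashMap<>();
    // Stand-in for the rows emitted via ctx.write(...).
    final List<Integer> jobOutputRows = new ArrayList<>();

    // Mirrors the if/else chain of the reducer: marker rows go to side
    // files, every other row goes to the regular reducer output.
    void reduce(int row, double[] mergedVector) {
        if (row == NORM_VECTOR_MARKER) {
            sideFiles.put("norms.bin", mergedVector);
        } else if (row == MAXVALUE_VECTOR_MARKER) {
            sideFiles.put("maxValues.bin", mergedVector);
        } else if (row == NUM_NON_ZERO_ENTRIES_VECTOR_MARKER) {
            sideFiles.put("numNonZeroEntries.bin", mergedVector);
        } else {
            jobOutputRows.add(row);
        }
    }

    public static void main(String[] args) {
        MarkerRoutingSketch sketch = new MarkerRoutingSketch();
        sketch.reduce(NORM_VECTOR_MARKER, new double[] {1.0, 2.0});
        sketch.reduce(MAXVALUE_VECTOR_MARKER, new double[] {5.0, 4.0});
        sketch.reduce(NUM_NON_ZERO_ENTRIES_VECTOR_MARKER, new double[] {3.0, 1.0});
        sketch.reduce(7, new double[] {0.5, 0.0});
        System.out.println("side files: " + sideFiles.keySet());
        System.out.println("regular output rows: " + jobOutputRows);
    }
}
```

In this reading, the three .bin files account only for the marker rows. Any ordinary row that reaches the reducer should show up in the job's regular output directory instead.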





On Wed, Jun 6, 2012 at 10:20 AM, Sean Owen <srowen@gmail.com> wrote:

> Just make, say, a completely dense fake data set over 1000 users and items.
> Something will come out.
> On Jun 6, 2012 6:11 PM, "Something Something" <mailinglists19@gmail.com>
> wrote:
>
> > Hmm... that's what I am thinking.. something is amiss!  A few lines from
> > the files are pasted above.  The pattern is fairly similar.  Is there a
> > place where I can upload part of my file for someone else to try?
> >
> > OR BETTER YET - Can someone provide a small file that always returns a few
> > similarities?  Is a file such as this included in the source?
> >
> > Thanks for the help.
> >
> > On Wed, Jun 6, 2012 at 9:01 AM, Sean Owen <srowen@gmail.com> wrote:
> >
> > > That sounds like plenty of data -- doubting that's any issue. Is it
> > > very sparse? Meaning many items exist just for one user? It's really
> > > sparseness that might produce few or no similarities.
> > >
> > > I think something else is at work here but don't know off the top of
> > > my head based on the info so far.
> > >
> > > Yes it is always the same hash function -- top 8 bytes of the MD5
> > > hash. Same input means same output.
> > >
> > > Sean
> > >
> > > On Wed, Jun 6, 2012 at 4:57 PM, Something Something
> > > <mailinglists19@gmail.com> wrote:
> > > > The input size was about 6 million so I was expecting to find some
> > > > similarities.  Anyway, I have started a test with the real dataset that
> > > > contains 700 million lines.  We shall see how that goes.  One quick
> > > > question, though:
> > > >
> > > > I am using MemoryIDMigrator to convert UserIds from String to Long as
> > > > follows:
> > > >
> > > >    static UpdatableIDMigrator migrator = new MemoryIDMigrator();
> > > > <some code omitted here...>
> > > >    migrator.toLongID(strUserID);
> > > >
> > > > Question:  If I pass the same userId multiple times to this method, I am
> > > > guaranteed to get the same 'Long' number back, correct?
> > >
> >
>
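
For reference, the ID mapping described in the quoted reply above (the top 8 bytes of the MD5 hash, so the same input gives the same output) can be sketched in plain Java. This is only an illustration under assumptions, not Mahout's actual code: the class name Md5IdSketch, the method name toLongId, and the choice of UTF-8 for turning the string into bytes are all mine.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5IdSketch {
    // Packs the first 8 bytes of the MD5 digest into a long, most
    // significant byte first. The result depends only on the input string.
    static long toLongId(String stringId) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            // Assumption: the string is encoded as UTF-8 before hashing.
            byte[] digest = md5.digest(stringId.getBytes(StandardCharsets.UTF_8));
            long id = 0L;
            for (int i = 0; i < 8; i++) {
                id = (id << 8) | (digest[i] & 0xFFL);
            }
            return id;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm on Java platforms.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        long first = toLongId("user-42");
        long second = toLongId("user-42");
        // Same input, same output: prints "true".
        System.out.println(first == second);
    }
}
```

Because the value is a pure function of the string, it stays the same across calls and across JVMs. Two different strings could in principle collide on the same 64-bit value, although that is very unlikely.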
