mahout-dev mailing list archives

From Shannon Quinn <>
Subject Re: Cycles in Eclipse build path
Date Thu, 29 Jul 2010 19:46:55 GMT
I more or less followed what DistributedRowMatrix.timesSquared() does when
writing Vectors to and reading them from the cache; calls to my utility
methods could basically replace the code that's currently there.

I'm more than willing to keep this code local to my own packages, or to the
mahout.math.hadoop package or mahout-math project. I'll keep it local until
decided otherwise.

Though here's another random question I just came across: in the
timesSquared() Reducer, the output is repeated pairs of (NullWritable,
VectorWritable) - does this create a single (merged) VectorWritable under
the same key, or a list of VectorWritables?

Thanks again!


On Thu, Jul 29, 2010 at 3:40 PM, Sean Owen <> wrote:

> core has to be Hadoop-free as it does not have Hadoop as a dependency,
> and that is important.
> It sounds like it belongs in utils. But then I wonder why you have
> code in core that also depends on Hadoop (indirectly)?
> math seems to be the home of Hadoop-based math stuff. I think that's
> the home of all your code.
> I might suggest not putting things into utils until it's clear
> something else can use them, and the code has been written to be
> generalizable. I fear utils and other "common" areas turn into a grab
> bag of code that something uses, and that something may use someday,
> but isn't reused yet. That creates problems.
> Sean
> On Thu, Jul 29, 2010 at 10:12 PM, Shannon Quinn <> wrote:
> > Hi all,
> >
> > A technical question regarding some utility methods I'm trying to
> > implement: a lot of my M/R tasks require ancillary vectors that I've
> > been saving to the cache and which are retrieved in the Mapper and/or
> > Reducer for performing the computations. Since this is done so often, I
> > wrote utility load() and save() methods for accomplishing this. Isabel
> > suggested I make them more generally available, e.g. in some common or
> > utils package in Mahout.
> >
> > 1) Where would be the best place to put this load() and save() -to-cache
> > functionality? I've tried it in the mahout-core o.a.m.common package, and
> > mahout-utils o.a.m.vectors package. I know someone had mentioned that the
> > core should be kept as Hadoop-free as possible, so since this explicitly
> > calls Hadoop functions (DistributedCache, FileSystem, HadoopUtil, etc) I
> > figured the core may not be the best place...
> >
> > 2) ...but as the Eclipse projects are currently set up, mahout-utils
> > depends on mahout-core and mahout-examples, and mahout-core depends only
> > on mahout-math. As such, with my utility functions in mahout-utils, and
> > the code in need of them in mahout-core, I can't specify a correct import
> > unless I modify the properties of mahout-core to "see" mahout-utils.
> > While this does indeed fix the import problem, it introduces a cyclical
> > dependency which is obviously not ideal, either.
> >
> > Thank you in advance for your help!
> >
> > Regards,
> > Shannon
> >
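
For readers who want a concrete picture of the load()/save() helpers discussed in point 1, here is a minimal sketch of the round trip. It is not the code from the thread: the class name VectorFileSketch, the double[] representation, and the local temp file are all assumptions made so the example runs with the JDK alone. The helpers described above go through Hadoop (DistributedCache, FileSystem, HadoopUtil) and Mahout's Vector types instead.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical stand-in for the load()/save() helpers: a plain double[] is
// written to and read back from a local file. No Hadoop or Mahout API is used.
final class VectorFileSketch {

    // Writes the vector length followed by each element.
    static void save(double[] vector, Path file) {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(vector.length);
            for (double v : vector) {
                out.writeDouble(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back a vector written by save().
    static double[] load(Path file) {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            double[] vector = new double[in.readInt()];
            for (int i = 0; i < vector.length; i++) {
                vector[i] = in.readDouble();
            }
            return vector;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("ancillary-vector", ".bin");
        try {
            save(new double[] {1.0, 2.5, -3.0}, file);
            System.out.println(Arrays.toString(load(file)));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Keeping both directions in one small class is what makes the placement question matter: whichever module hosts it has to be visible to every job that needs the ancillary vectors.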

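The cycle described in point 2 can be checked mechanically. The sketch below models the project dependencies exactly as stated in the message (mahout-utils depends on mahout-core and mahout-examples; mahout-core depends only on mahout-math) and runs a depth-first search for a cycle, with and without the extra mahout-core to mahout-utils edge. The class and method names are hypothetical, and the dependencies of mahout-examples and mahout-math are left empty because the thread does not state them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: depth-first search for a cycle in a module graph.
final class ModuleCycleCheck {

    // Project dependencies as described in the thread. When coreSeesUtils is
    // true, the extra edge mahout-core -> mahout-utils is added.
    static Map<String, List<String>> eclipseProjects(boolean coreSeesUtils) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("mahout-utils",
                new ArrayList<>(Arrays.asList("mahout-core", "mahout-examples")));
        deps.put("mahout-core", new ArrayList<>(Arrays.asList("mahout-math")));
        if (coreSeesUtils) {
            deps.get("mahout-core").add("mahout-utils");
        }
        return deps;
    }

    // Returns one cycle as a list of module names (first == last),
    // or an empty list if the graph is acyclic.
    static List<String> findCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, done, new ArrayList<>());
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return new ArrayList<>();
    }

    private static List<String> visit(String module, Map<String, List<String>> deps,
                                      Set<String> done, List<String> path) {
        int seenAt = path.indexOf(module);
        if (seenAt >= 0) {
            // Back edge: the cycle is the tail of the current path plus this module.
            List<String> cycle = new ArrayList<>(path.subList(seenAt, path.size()));
            cycle.add(module);
            return cycle;
        }
        if (done.contains(module)) {
            return new ArrayList<>(); // already fully explored, no cycle through it
        }
        path.add(module);
        for (String dep : deps.getOrDefault(module, Collections.<String>emptyList())) {
            List<String> cycle = visit(dep, deps, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        done.add(module);
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        System.out.println("as set up: " + findCycle(eclipseProjects(false)));
        System.out.println("core sees utils: " + findCycle(eclipseProjects(true)));
    }
}
```

With the extra edge the search reports a cycle through mahout-utils and mahout-core, which is why moving the helpers next to the code that uses them, as Sean suggests, avoids the problem rather than working around it.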