mahout-dev mailing list archives

From "Grant Ingersoll (JIRA)"
Subject [jira] Commented: (MAHOUT-178) Rationalize 'utils' and 'common' stuff
Date Thu, 17 Sep 2009 13:38:57 GMT


Grant Ingersoll commented on MAHOUT-178:

To me, the mahout-utils module is where we can put tools that help get data ready for Mahout.
It can bring in libraries like Lucene, Tika, etc. to prepare raw content for use by Mahout,
and it can also provide utilities that help in dealing with output. I don't think it belongs
in core, because not everyone will need it, and keeping it separate helps keep core focused
on providing algorithm implementations.

> Rationalize 'utils' and 'common' stuff
> --------------------------------------
>                 Key: MAHOUT-178
>                 URL:
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.1
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>         Attachments: MAHOUT-178.patch
> Every project needs a common area for code that is not obviously part of any specific piece of the project, typically because it's used in many places. This is good, as it promotes reuse. I would like to make an explicit effort to rationalize this project's approach to 'common', starting with some basic reshuffling, which will then pave the way to unify more of the code that is currently duplicated (thinking: caches, distance measures, Hadoop integration, etc.).
> Right now we have this common code in three places, when it seems like there should be basically one:
> - mahout-core: org.apache.mahout.utils
> - mahout-core: org.apache.mahout.common
> - mahout-utils
> Of the two packages named above, I suggest that 'common' is slightly preferable; one could easily just merge these packages. I would also like to ask: does it make sense to have a mahout-utils module? In my opinion, it's like having a mahout-core-core. It appears to serve exactly the same role as the other utils/common package. Would it ever be used as a standalone build product?
> Renaming may sound like a trivial change, but I think the above is merely symptomatic of several developers having independent ideas about where to stash common stuff. I want to force the issue and push everyone's stuff together to begin the hard but necessary work of refactoring the code base into something more unified.
> So far, I propose pushing all code together into org.apache.mahout.common. This is enough of a big-bang change, one that will break existing patches, that I want to propose it first and, if agreed, plan when to commit.
> (Also, shouldn't stuff like the distance measure classes be in a package?)
> Anyway, a partial patch will be attached shortly.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
