mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: H-Mine
Date Thu, 24 Mar 2011 17:27:52 GMT
Moderately good as a project.

I think that Mahout would benefit more from new capabilities than from a
reimplementation of old capabilities with a new algorithm.  Item set mining
is not particularly subject to radical improvement in terms of the results.
Speed improvements would be nice, but are not a major point of pain with
the current implementation.

What do you see about this project that would be good for Mahout?  Are there
related things that you need that Mahout doesn't do?

On Thu, Mar 24, 2011 at 10:22 AM, Yarco Hayduk <yarcoh@gmail.com> wrote:

> So does this look like a valid project idea?
>
> Thanks
>
> On Tue, Mar 22, 2011 at 11:12 PM, Yarco Hayduk <yarcoh@gmail.com> wrote:
>
> > I believe that Section 3.1 of the aforementioned paper talks about the
> > parallel version of H-Mine.
> >
> > You are right - H-Mine has a backtracking step, which adds nodes to the
> > next node and introduces a dependency, i.e. you cannot start working on
> > the (i+1)-th header element until the i-th element has been mined.
> >
> > The H-Mine algorithm works as follows:
> >
> > 1. prunes the initial DB such that all infrequent singleton items are
> > removed
> > 2. divides the pruned DB into equal chunks
> > 3. mines these chunks separately using the H-Mine(mem) algorithm
> > 4. joins the results and
> > 5. scans the pruned DB once again to remove false positives and obtain
> > the actual counts.
> >
> > I'm pretty sure that I can map these steps to MapReduce.
> >
> > Having said that, I am not sure that this approach would work better than
> > Parallel FP-Growth.
> >
> > On Tue, Mar 22, 2011 at 4:59 PM, Robin Anil <robin.anil@gmail.com> wrote:
> >
> >> We have a Parallel FPGrowth implementation. I have read that by itself
> >> HMine is faster in its sequential version, but it may not be of use to
> >> Mahout in that format. If you are able to propose a parallel
> >> implementation of H-Mine, using MapReduce, it will be of great interest
> >> to Mahout.
> >>
> >>
> >> On Wed, Mar 23, 2011 at 3:21 AM, Yarco Hayduk <yarcoh@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> For *Google Summer of Code 2011*, would you be interested if I
> >>> implemented the H-Mine algorithm?
> >>>
> >>> http://www.cs.sfu.ca/~jpei/publications/Hmine-jn.pdf
> >>>
> >>> Thank you,
> >>> yarco;)
> >>>
> >>
> >>
> >
>
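
The five steps quoted above (prune, split, mine each chunk, join, rescan) can be sketched in a few lines of Python. This is only an illustration of the partition-and-verify structure, not H-Mine itself and not Mahout code: the local miner `mine_chunk` is a brute-force stand-in for H-Mine(mem), all function names are hypothetical, and the chunk loop is sequential where a MapReduce job would run one mapper per chunk. The threshold is assumed to be an absolute count, scaled down in proportion to each chunk's size.

```python
from collections import Counter
from itertools import combinations


def prune_infrequent_items(db, min_count):
    """Step 1: drop items whose global support is below min_count."""
    counts = Counter(item for txn in db for item in set(txn))
    keep = {item for item, c in counts.items() if c >= min_count}
    return [sorted(set(txn) & keep) for txn in db]


def split_into_chunks(db, n_chunks):
    """Step 2: divide the pruned database into roughly equal chunks."""
    size = max(1, -(-len(db) // n_chunks))  # ceiling division
    return [db[i:i + size] for i in range(0, len(db), size)]


def mine_chunk(chunk, local_min):
    """Step 3 stand-in: brute-force enumeration, NOT H-Mine(mem)."""
    counts = Counter()
    for txn in chunk:
        for k in range(1, len(txn) + 1):
            counts.update(combinations(txn, k))
    return {itemset for itemset, c in counts.items() if c >= local_min}


def partitioned_mine(db, min_count, n_chunks):
    """Return {itemset: support} for itemsets with support >= min_count."""
    total = len(db)
    pruned = prune_infrequent_items(db, min_count)
    candidates = set()
    for chunk in split_into_chunks(pruned, n_chunks):
        # Scale the threshold to the chunk size. A globally frequent
        # itemset must reach this scaled threshold in at least one chunk.
        local_min = max(1, -(-(min_count * len(chunk)) // total))
        # Step 4: join the local results into one candidate set.
        candidates |= mine_chunk(chunk, local_min)
    # Step 5: rescan the pruned DB to get the actual counts and
    # discard candidates that were only locally frequent.
    result = {}
    for cand in candidates:
        needed = set(cand)
        support = sum(1 for txn in pruned if needed.issubset(txn))
        if support >= min_count:
            result[cand] = support
    return result
```

For example, `partitioned_mine([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]], min_count=2, n_chunks=2)` keeps `("a", "b")` and `("a", "c")` with support 2, while `("b", "c")` shows up as a candidate from the first chunk and is removed by the final rescan.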
