mahout-user mailing list archives

From Yarco Hayduk <yar...@gmail.com>
Subject Re: H-Mine
Date Thu, 24 Mar 2011 17:22:57 GMT
So does this look like a valid project idea?

Thanks

On Tue, Mar 22, 2011 at 11:12 PM, Yarco Hayduk <yarcoh@gmail.com> wrote:

> I believe that Section 3.1 of the aforementioned paper talks about the
> parallel version of H-Mine.
>
> You are right - H-Mine has a backtracking step, which adds nodes to the
> next node and introduces a dependency, i.e. you cannot start working on the
> (i+1)-th header element before the i-th element has been mined.
>
> The H-Mine algorithm works as follows:
>
> 1. prunes the initial DB so that all infrequent single items are
> removed
> 2. divides the pruned DB into equal chunks
> 3. mines these chunks separately using the H-Mine(mem) algorithm
> 4. joins the results, and
> 5. scans the pruned DB once again to remove false positives and obtain the
> actual counts.
>
> I'm pretty sure that I can map these steps to MapReduce.
>
> Having said that, I am not sure that this approach would work better than
> Parallel FP-Growth.
>
> On Tue, Mar 22, 2011 at 4:59 PM, Robin Anil <robin.anil@gmail.com> wrote:
>
>> We have a Parallel FP-Growth implementation. I have read that H-Mine by
>> itself is faster in its sequential version, but it may not be of use to
>> Mahout in that form. If you are able to propose a parallel implementation
>> of H-Mine using MapReduce, it will be of great interest to Mahout.
>>
>>
>> On Wed, Mar 23, 2011 at 3:21 AM, Yarco Hayduk <yarcoh@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> For the Google Summer of Code 2011, would you be interested if I
>>> implemented the H-Mine algorithm?
>>>
>>> http://www.cs.sfu.ca/~jpei/publications/Hmine-jn.pdf
>>>
>>> Thank you,
>>> yarco;)
>>>
>>
>>
>
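
To make the five steps quoted above concrete, here is a minimal single-process sketch in Python. It is only an illustration under assumptions, not the paper's method and not Mahout code: the function names are hypothetical, the local miner is a brute-force counter standing in for H-Mine(mem), and the proportional local threshold is my assumption (the steps above do not say which threshold each chunk uses). In a MapReduce setting, step 3 would roughly correspond to the map side of a first job and step 5 to a second counting pass.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions):
    """Brute-force stand-in for H-Mine(mem): count every itemset in a chunk."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            counts.update(combinations(items, k))
    return counts


def partitioned_frequent_itemsets(db, min_count, n_chunks):
    """Return {itemset (sorted tuple): global count} for counts >= min_count."""
    # Step 1: prune items that are infrequent on their own.
    item_counts = Counter(i for t in db for i in set(t))
    keep = {i for i, c in item_counts.items() if c >= min_count}
    pruned = [sorted(set(t) & keep) for t in db]
    total = len(pruned)

    # Step 2: split the pruned DB into roughly equal chunks.
    chunks = [pruned[i::n_chunks] for i in range(n_chunks)]

    # Steps 3 and 4: mine each chunk independently and union the candidates.
    # Assumed local threshold: an itemset is kept if its share of the chunk
    # is at least min_count / total. Any globally frequent itemset must pass
    # this test in at least one chunk, so no true result is lost.
    candidates = set()
    for chunk in chunks:
        for itemset, c in count_itemsets(chunk).items():
            if c * total >= min_count * len(chunk):
                candidates.add(itemset)

    # Step 5: rescan the pruned DB to get actual counts and drop
    # candidates that were only locally frequent (false positives).
    global_counts = Counter()
    for t in pruned:
        t_set = set(t)
        for itemset in candidates:
            if t_set.issuperset(itemset):
                global_counts[itemset] += 1
    return {s: c for s, c in global_counts.items() if c >= min_count}
```

Calling `partitioned_frequent_itemsets(db, min_count=3, n_chunks=2)` on a list of transactions (each a list of items) should give the same itemsets as mining the whole database at once. The number of chunks only changes how many false-positive candidates step 5 has to discard.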
