mahout-user mailing list archives

From gaurav singh <gauravonlin...@gmail.com>
Subject Re: mahout pfp : isSubPatternof() function
Date Mon, 27 Feb 2012 07:45:50 GMT
Thanks for the help Tom :-)

On Sun, Feb 26, 2012 at 11:56 PM, tom <tcp@apache.org> wrote:

> It's not well documented, but there are actually two distinct
> implementations of FPGrowth, each of which can be run sequentially or as
> a mapreduce job.
>
> The --method option lets you select sequential/mapreduce, and the
> --useFPG2/-2 flag selects the alternate implementation.
>
> Any way you run FPG, patterns will be collected in
> FrequentPatternMaxHeaps; all implementation/mode combinations will make use
> of this class.
>
> I do not recall the precise details right now, but something about the
> mining/aggregation strategy used in the original (default) implementation
> leads to redundant patterns appearing when running in mapreduce mode.  If
> your question is driven by finding unexpected redundancies in FPG output,
> I'd be interested to hear if this persists after trying --useFPG2.
>
> -tom
>
>
>
> On 02/26/2012 12:06 PM, gaurav singh wrote:
>
>> Hi Tom,
>>
>> I don't understand why you say I will get a lot of redundant patterns.
>> Each group-dependent shard generates patterns with respect to the
>> elements of that shard. As far as I know, fpg-2 is only a new sequential
>> implementation of fp-growth and not a map/reduce implementation.
>>
>> My question was specifically whether subpatterns are eliminated from the
>> output of Mahout parallel fp-growth (the map/reduce version). I know that
>> the function exists in FrequentPatternMaxHeap, but that's the sequential
>> algorithm; I am asking only about the map/reduce version.
>>
>> On Sun, Feb 26, 2012 at 9:39 PM, tom <tcp@apache.org> wrote:
>>
>>> Hi Gaurav,
>>>
>>> The patterns are accumulated in a heap (see FrequentPatternMaxHeap),
>>> which uses isSubPatternOf.
>>>
>>> That said, I do think the default implementation of PFPGrowth will get
>>> you many redundant patterns under certain circumstances, but the "-2"
>>> implementation will reduce (perhaps eliminate?) redundant patterns.
>>>
>>> -tom
>>>
>>>
>>> On 02/26/2012 09:39 AM, gaurav singh wrote:
>>>
>>>> Hi Guys,
>>>>
>>>> There is a function in the Mahout sequential fp-growth algorithm named
>>>> isSubPatternOf() which returns whether one pattern is a subpattern of
>>>> another; if both have equal support, only the larger of the two is
>>>> output. I can't find any such function being used in parallel fp-growth.
>>>> Does that mean that in parallel fp-growth we display all the possible
>>>> patterns without eliminating such subpatterns?
>>>>
>>>> Thanks for the help!
>>>>
>>>>
>>>>
>>
>


-- 
regards
Gaurav Singh
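
For readers following the thread, the pruning rule discussed above (when one pattern is a subpattern of another and both have equal support, only the larger one is kept) can be illustrated with a minimal Java sketch. This is not Mahout's actual code: the class SubPatternSketch, its nested Pattern type, and the addPruned helper are hypothetical names invented for this illustration, and a plain list stands in for the heap that FrequentPatternMaxHeap maintains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubPatternSketch {

    /** Hypothetical stand-in for a frequent pattern: sorted item ids plus a support count. */
    static final class Pattern {
        final int[] items;   // sorted ascending, assumed free of duplicates
        final long support;

        Pattern(long support, int... items) {
            this.items = items.clone();
            Arrays.sort(this.items);
            this.support = support;
        }

        /** True if both supports are equal and every item of this pattern occurs in other. */
        boolean isSubPatternOf(Pattern other) {
            if (support != other.support || items.length > other.items.length) {
                return false;
            }
            int i = 0;
            int j = 0;
            // Both arrays are sorted, so a single merge-style pass is enough.
            while (i < items.length && j < other.items.length) {
                if (items[i] == other.items[j]) {
                    i++;
                    j++;
                } else if (items[i] > other.items[j]) {
                    j++;
                } else {
                    return false; // items[i] does not occur in other
                }
            }
            return i == items.length;
        }
    }

    /** Adds candidate unless a kept pattern already covers it; drops kept patterns it covers. */
    static void addPruned(List<Pattern> kept, Pattern candidate) {
        for (Pattern p : kept) {
            if (candidate.isSubPatternOf(p)) {
                return; // candidate is redundant
            }
        }
        kept.removeIf(p -> p.isSubPatternOf(candidate));
        kept.add(candidate);
    }

    public static void main(String[] args) {
        List<Pattern> kept = new ArrayList<>();
        addPruned(kept, new Pattern(5, 1, 2));     // {1,2} with support 5
        addPruned(kept, new Pattern(5, 1, 2, 3));  // same support, superset: replaces {1,2}
        addPruned(kept, new Pattern(7, 1));        // different support: kept alongside
        for (Pattern p : kept) {
            System.out.println(Arrays.toString(p.items) + " support=" + p.support);
        }
    }
}
```

In this sketch, a pattern with a different support is never treated as redundant, which matches the "equal support" condition described in the original question. Whether the map/reduce aggregation applies the same check across groups is exactly what the thread leaves open.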
