mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Parallel FPGrowth driver - what is a good demo?
Date Mon, 01 Aug 2011 03:29:05 GMT
I've rewritten the FPGrowth wiki page. It is still a bit ragged. Please
critique it for content.

https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining
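Whether a run finds any patterns at all depends mostly on the minimum support threshold, which is the open question further down this thread. Every item in a frequent itemset must itself be frequent, so a per-item support count over the first thousand lines of accidents.dat or retail.dat shows quickly whether a given threshold can produce anything. Below is a minimal Java sketch. It assumes the FIMI `.dat` layout (one transaction per line, whitespace-separated integer item ids) and treats the threshold as an absolute transaction count. The class and method names (`SupportCount`, `countSupport`, `frequentItems`) are hypothetical and are not Mahout API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class SupportCount {

    /**
     * Counts, for each item id, how many of the first maxLines lines (transactions) contain it.
     * Assumption: FIMI .dat layout, one transaction per line, item ids separated by whitespace.
     */
    static Map<Integer, Integer> countSupport(Reader in, int maxLines) throws IOException {
        Map<Integer, Integer> support = new HashMap<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        int read = 0;
        while (read < maxLines && (line = reader.readLine()) != null) {
            read++;
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines count toward maxLines but hold no items
            }
            // Collect ids into a set so an item is counted at most once per transaction.
            Set<Integer> items = new HashSet<>();
            for (String token : trimmed.split("\\s+")) {
                items.add(Integer.parseInt(token));
            }
            for (int item : items) {
                support.merge(item, 1, Integer::sum);
            }
        }
        return support;
    }

    /** Keeps only items whose support is at least minSupport (an absolute count), sorted by id. */
    static Map<Integer, Integer> frequentItems(Map<Integer, Integer> support, int minSupport) {
        Map<Integer, Integer> frequent = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : support.entrySet()) {
            if (e.getValue() >= minSupport) {
                frequent.put(e.getKey(), e.getValue());
            }
        }
        return frequent;
    }

    public static void main(String[] args) throws IOException {
        String sample = "1 2 3\n2 3 4\n2 5\n";
        Map<Integer, Integer> support = countSupport(new StringReader(sample), 1000);
        System.out.println(frequentItems(support, 2)); // prints {2=3, 3=2}
    }
}
```

If `frequentItems` comes back empty for some threshold, no multi-item pattern can reach that threshold either, since an itemset's support never exceeds the support of any of its items.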

On Thu, Jul 28, 2011 at 12:59 AM, Lance Norskog <goksron@gmail.com> wrote:
> OK, now I've succeeded in running fpgrowth, both sequential and
> mapreduce, from the 'fpg' job, using the flag that chooses between
> 'sequential' and 'mapreduce'. I've done this with two different datasets,
> accidents.dat and retail.dat. I only ran the first thousand lines of
> both datasets for time reasons.
>
> Both sequential and mapreduce locate the same ids as being in
> patterns. On closer inspection the patterns themselves do not match, but
> the patterns involving a given id X are generally the same size. Successive
> runs of each variant give exactly the same results, so having sequential
> and mapreduce give different result sets is puzzling. Measuring the
> distance between the two result sets is a little difficult with text
> processing.
>
> What can account for the different outputs of map/reduce and
> sequential (pseudo-distributed) modes?
>
> On 7/27/11, Lance Norskog <goksron@gmail.com> wrote:
>> I'll prep a current version.
>>
>> On 7/27/11, Robin Anil <robin.anil@gmail.com> wrote:
>>> On Tue, Jul 26, 2011 at 11:06 PM, Lance Norskog <goksron@gmail.com>
>>> wrote:
>>>
>>>> The parameters and files mentioned on this page do not find any
>>>> frequent patterns:
>>>>
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining
>>>
>>> Let me run and correct this doc.
>>>
>>>>
>>>>
>>>> Given 'accidents.dat.gz' from the linked site, or 'retail.dat.gz' from
>>>> the same site, what parameters should find some frequent patterns?
>>>
>>>
>>>> Also, what is the magic to get maven to pass JDK options to an exec'd
>>>> class?
>>>
>>> Did you try using the bin/mahout script? The memory size is configurable
>>> inside it.
>>>
>>>
>>>> FPGrowth sequential needs the memory size bumped up.
>>>
>>>
>>>> Cheers,
>>>>
>>>> --
>>>> Lance Norskog
>>>> goksron@gmail.com
>>>>
>>>
>>
>>
>>
>
>
>
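The quoted report says both modes find the same ids but different patterns, and that measuring the difference is awkward with text processing. One way to put a number on it is a per-item Jaccard distance between the pattern sets each run reports: 0 means identical patterns for that item, 1 means none in common. The Java sketch below assumes both outputs have already been parsed into a map from item id to a set of patterns, each pattern being a set of item ids. That shape and the names `PatternDiff` and `perItemDistance` are hypothetical; parsing Mahout's actual output is not shown.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class PatternDiff {

    /** Jaccard distance 1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical (0). */
    static <T> double jaccardDistance(Set<T> a, Set<T> b) {
        Set<T> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return 1.0 - (double) intersection.size() / union.size();
    }

    /**
     * For every item id seen in either run, the distance between the pattern sets the two runs
     * reported for it. Hypothetical input shape: item id -> set of patterns (sets of item ids).
     */
    static Map<Integer, Double> perItemDistance(Map<Integer, Set<Set<Integer>>> sequential,
                                                Map<Integer, Set<Set<Integer>>> mapreduce) {
        Set<Integer> ids = new TreeSet<>(sequential.keySet());
        ids.addAll(mapreduce.keySet());
        Map<Integer, Double> result = new TreeMap<>();
        for (int id : ids) {
            // An id missing from one run counts as an empty pattern set for that run.
            Set<Set<Integer>> a = sequential.getOrDefault(id, Set.of());
            Set<Set<Integer>> b = mapreduce.getOrDefault(id, Set.of());
            result.put(id, jaccardDistance(a, b));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Set<Integer>>> seq = new HashMap<>();
        seq.put(7, Set.of(Set.of(7, 12), Set.of(7, 12, 31)));
        Map<Integer, Set<Set<Integer>>> mr = new HashMap<>();
        mr.put(7, Set.of(Set.of(7, 12), Set.of(7, 31, 40)));
        // One shared pattern out of three distinct ones, so the distance for item 7 is about 0.667.
        System.out.println(perItemDistance(seq, mr));
    }
}
```

Sorting the resulting map by distance points at the items worth inspecting by hand.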



-- 
Lance Norskog
goksron@gmail.com
