mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robin Anil <robin.a...@gmail.com>
Subject Re: PFP Growth : ParallelFPGrowth reduce taking a lo--ong time
Date Mon, 04 Apr 2011 20:35:07 GMT
Could you try with the Performance patch on the JIRA issue page 619


On Tue, Apr 5, 2011 at 1:14 AM, Vipul Pandey <vipandey@gmail.com> wrote:

> So I have a new problem now.
> I have about 3.5M baskets and about a 100K items/features. After
> successfully completing parallel-counting, grouping and transaction sorting
> in a reasonable amount of time - PFP Growth just gets stuck in the Reduce
> phase of ParallelFPGrowth.
> The reducer is running since about 6:30 PM PST Friday, 1st of April - (it's
> beyond noon of 4th April here - i.e. running for more than 65 hours now)
> and
> it's still going on.
>
> Anyone faced this issue before? (Console logs pasted below)
> Also,
> - We don't have anything else running on our cluster and it's a reasonably
> powerful box.
> - the jobtracker UI shows this status for the reducer : "Processing FPTree:
> FPGrowth Algorithm for a given feature: 49797 > reduce" and the feature
> keeps changing every minute or two- which shows that it's still doing it's
> job!
> - no other errors/messages in the logs
> - config :  -s 100 -k 50 -g 2  - Any Recommendations on these? my usual
> values are "-s 10 -g 10" but that doesn't seem to cut for me either.
>
>
>
> Any clue?
>
> Thanks
> Vipul
>
>
> Log :
>
> 11/04/01 18:20:07 INFO common.HadoopUtil: Deleting
>  /run/MR20/fim/parallelcounting
>
> 11/04/01 18:20:07 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
>
> 11/04/01 18:20:08 INFO input.FileInputFormat: Total input paths to process
> :
> 20
>
> 11/04/01 18:20:08 INFO mapred.JobClient: Running job: job_201103311244_0045
>
> 11/04/01 18:20:09 INFO mapred.JobClient:  map 0% reduce 0%
>
> 11/04/01 18:20:20 INFO mapred.JobClient:  map 2% reduce 0%
>
> 11/04/01 18:20:21 INFO mapred.JobClient:  map 8% reduce 0%
>
> 11/04/01 18:20:22 INFO mapred.JobClient:  map 11% reduce 0%
>
> 11/04/01 18:20:23 INFO mapred.JobClient:  map 13% reduce 0%
>
> 11/04/01 18:20:24 INFO mapred.JobClient:  map 16% reduce 0%
>
> 11/04/01 18:20:25 INFO mapred.JobClient:  map 17% reduce 0%
>
> 11/04/01 18:20:26 INFO mapred.JobClient:  map 18% reduce 0%
>
> 11/04/01 18:20:27 INFO mapred.JobClient:  map 21% reduce 0%
>
> 11/04/01 18:20:28 INFO mapred.JobClient:  map 22% reduce 0%
>
> 11/04/01 18:20:29 INFO mapred.JobClient:  map 24% reduce 0%
>
> 11/04/01 18:20:30 INFO mapred.JobClient:  map 27% reduce 0%
>
> 11/04/01 18:20:31 INFO mapred.JobClient:  map 28% reduce 0%
>
> 11/04/01 18:20:32 INFO mapred.JobClient:  map 31% reduce 0%
>
> 11/04/01 18:20:34 INFO mapred.JobClient:  map 33% reduce 0%
>
> 11/04/01 18:20:35 INFO mapred.JobClient:  map 35% reduce 0%
>
> 11/04/01 18:20:36 INFO mapred.JobClient:  map 38% reduce 0%
>
> 11/04/01 18:20:37 INFO mapred.JobClient:  map 40% reduce 0%
>
> 11/04/01 18:20:38 INFO mapred.JobClient:  map 42% reduce 0%
>
> 11/04/01 18:20:39 INFO mapred.JobClient:  map 43% reduce 0%
>
> 11/04/01 18:20:40 INFO mapred.JobClient:  map 45% reduce 0%
>
> 11/04/01 18:20:41 INFO mapred.JobClient:  map 47% reduce 0%
>
> 11/04/01 18:20:42 INFO mapred.JobClient:  map 50% reduce 0%
>
> 11/04/01 18:20:44 INFO mapred.JobClient:  map 53% reduce 0%
>
> 11/04/01 18:20:45 INFO mapred.JobClient:  map 55% reduce 0%
>
> 11/04/01 18:20:47 INFO mapred.JobClient:  map 57% reduce 0%
>
> 11/04/01 18:20:48 INFO mapred.JobClient:  map 60% reduce 0%
>
> 11/04/01 18:20:49 INFO mapred.JobClient:  map 61% reduce 0%
>
> 11/04/01 18:20:50 INFO mapred.JobClient:  map 63% reduce 0%
>
> 11/04/01 18:20:51 INFO mapred.JobClient:  map 64% reduce 0%
>
> 11/04/01 18:20:52 INFO mapred.JobClient:  map 66% reduce 0%
>
> 11/04/01 18:20:53 INFO mapred.JobClient:  map 68% reduce 0%
>
> 11/04/01 18:20:54 INFO mapred.JobClient:  map 69% reduce 0%
>
> 11/04/01 18:20:55 INFO mapred.JobClient:  map 71% reduce 0%
>
> 11/04/01 18:20:56 INFO mapred.JobClient:  map 72% reduce 0%
>
> 11/04/01 18:20:57 INFO mapred.JobClient:  map 74% reduce 0%
>
> 11/04/01 18:20:58 INFO mapred.JobClient:  map 75% reduce 0%
>
> 11/04/01 18:20:59 INFO mapred.JobClient:  map 76% reduce 0%
>
> 11/04/01 18:21:00 INFO mapred.JobClient:  map 78% reduce 0%
>
> 11/04/01 18:21:01 INFO mapred.JobClient:  map 79% reduce 0%
>
> 11/04/01 18:21:03 INFO mapred.JobClient:  map 81% reduce 0%
>
> 11/04/01 18:21:06 INFO mapred.JobClient:  map 83% reduce 0%
>
> 11/04/01 18:21:07 INFO mapred.JobClient:  map 85% reduce 0%
>
> 11/04/01 18:21:09 INFO mapred.JobClient:  map 87% reduce 0%
>
> 11/04/01 18:21:10 INFO mapred.JobClient:  map 88% reduce 0%
>
> 11/04/01 18:21:12 INFO mapred.JobClient:  map 89% reduce 0%
>
> 11/04/01 18:21:13 INFO mapred.JobClient:  map 90% reduce 0%
>
> 11/04/01 18:21:16 INFO mapred.JobClient:  map 91% reduce 0%
>
> 11/04/01 18:21:17 INFO mapred.JobClient:  map 92% reduce 0%
>
> 11/04/01 18:21:19 INFO mapred.JobClient:  map 93% reduce 0%
>
> 11/04/01 18:21:25 INFO mapred.JobClient:  map 94% reduce 0%
>
> 11/04/01 18:21:29 INFO mapred.JobClient:  map 94% reduce 5%
>
> 11/04/01 18:21:35 INFO mapred.JobClient:  map 95% reduce 6%
>
> 11/04/01 18:21:40 INFO mapred.JobClient:  map 96% reduce 6%
>
> 11/04/01 18:21:43 INFO mapred.JobClient:  map 97% reduce 6%
>
> 11/04/01 18:21:44 INFO mapred.JobClient:  map 97% reduce 10%
>
> 11/04/01 18:21:48 INFO mapred.JobClient:  map 98% reduce 10%
>
> 11/04/01 18:21:52 INFO mapred.JobClient:  map 99% reduce 10%
>
> 11/04/01 18:21:57 INFO mapred.JobClient:  map 100% reduce 10%
>
> 11/04/01 18:21:59 INFO mapred.JobClient:  map 100% reduce 13%
>
> 11/04/01 18:22:05 INFO mapred.JobClient:  map 100% reduce 18%
>
> 11/04/01 18:22:11 INFO mapred.JobClient:  map 100% reduce 25%
>
> 11/04/01 18:23:23 INFO mapred.JobClient:  map 100% reduce 26%
>
> 11/04/01 18:23:26 INFO mapred.JobClient:  map 100% reduce 33%
>
> 11/04/01 18:25:27 INFO mapred.JobClient:  map 100% reduce 74%
>
> 11/04/01 18:25:29 INFO mapred.JobClient:  map 100% reduce 76%
>
> 11/04/01 18:25:32 INFO mapred.JobClient:  map 100% reduce 78%
>
> 11/04/01 18:25:35 INFO mapred.JobClient:  map 100% reduce 80%
>
> 11/04/01 18:25:38 INFO mapred.JobClient:  map 100% reduce 82%
>
> 11/04/01 18:25:41 INFO mapred.JobClient:  map 100% reduce 84%
>
> 11/04/01 18:25:44 INFO mapred.JobClient:  map 100% reduce 86%
>
> 11/04/01 18:25:47 INFO mapred.JobClient:  map 100% reduce 88%
>
> 11/04/01 18:25:50 INFO mapred.JobClient:  map 100% reduce 90%
>
> 11/04/01 18:25:53 INFO mapred.JobClient:  map 100% reduce 92%
>
> 11/04/01 18:25:56 INFO mapred.JobClient:  map 100% reduce 94%
>
> 11/04/01 18:25:59 INFO mapred.JobClient:  map 100% reduce 96%
>
> 11/04/01 18:26:02 INFO mapred.JobClient:  map 100% reduce 98%
>
> 11/04/01 18:26:05 INFO mapred.JobClient:  map 100% reduce 100%
>
> 11/04/01 18:26:06 INFO mapred.JobClient: Job complete:
> job_201103311244_0045
>
> 11/04/01 18:26:06 INFO mapred.JobClient: Counters: 23
>
> 11/04/01 18:26:06 INFO mapred.JobClient:   Job Counters
>
> 11/04/01 18:26:06 INFO mapred.JobClient:     Launched reduce tasks=1
>
> 11/04/01 18:26:06 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=2045245
>
> 11/04/01 18:26:06 INFO mapred.JobClient:     Total time spent by all
> reduces
> waiting after reserving slots (ms)=0
>
> 11/04/01 18:26:06 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
>
> 11/04/01 18:26:06 INFO mapred.JobClient:     Rack-local map tasks=2
>
> 11/04/01 18:26:06 INFO mapred.JobClient:     Launched map tasks=23
>
> 11/04/01 18:26:06 INFO mapred.JobClient:     Data-local map tasks=21
>
> 11/04/01 18:26:06 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=287005
>
> 11/04/01 18:26:06 INFO mapred.JobClient:   FileSystemCounters
>
> 11/04/01 18:26:06 INFO mapred.JobClient:     FILE_BYTES_READ=1058213034
>
> 11/04/01 18:26:06 INFO mapred.JobClient:     HDFS_BYTES_READ=1565999068
>
> 11/04/01 18:26:06 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1397240087
>
> 11/04/01 18:26:06 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=638557669
>
> 11/04/01 18:26:06 INFO mapred.JobClient:   Map-Reduce Framework
>
> 11/04/01 18:26:07 INFO mapred.JobClient:     Reduce input groups=24671345
>
> 11/04/01 18:26:07 INFO mapred.JobClient:     Combine output
> records=275836241
>
> 11/04/01 18:26:07 INFO mapred.JobClient:     Map input records=3480795
>
> 11/04/01 18:26:07 INFO mapred.JobClient:     Reduce shuffle bytes=318629292
>
> 11/04/01 18:26:07 INFO mapred.JobClient:     Reduce output records=24671345
>
> 11/04/01 18:26:07 INFO mapred.JobClient:     Spilled Records=442523762
>
> 11/04/01 18:26:07 INFO mapred.JobClient:     Map output bytes=2873217818
>
> 11/04/01 18:26:07 INFO mapred.JobClient:     Combine input
> records=373343059
>
> 11/04/01 18:26:07 INFO mapred.JobClient:     Map output records=164272900
>
> 11/04/01 18:26:07 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2860
>
> 11/04/01 18:26:07 INFO mapred.JobClient:     Reduce input records=66766082
>
> 11/04/01 18:26:20 INFO pfpgrowth.PFPGrowth: No of Features: 103412
>
> 11/04/01 18:26:20 INFO common.HadoopUtil: Deleting
> hdfs://nnn001:54310/run/MR20/fim/fList
>
> 11/04/01 18:26:21 INFO common.HadoopUtil: Deleting
> hdfs://nnn001:54310/run/MR20/fim/gList
>
> 11/04/01 18:26:21 INFO common.HadoopUtil: Deleting
> /run/MR20/fim/sortedoutput
>
> 11/04/01 18:26:21 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
>
> 11/04/01 18:26:21 INFO input.FileInputFormat: Total input paths to process
> :
> 20
>
> 11/04/01 18:26:21 INFO mapred.JobClient: Running job: job_201103311244_0046
>
> 11/04/01 18:26:22 INFO mapred.JobClient:  map 0% reduce 0%
>
> 11/04/01 18:26:33 INFO mapred.JobClient:  map 27% reduce 0%
>
> 11/04/01 18:26:34 INFO mapred.JobClient:  map 58% reduce 0%
>
> 11/04/01 18:26:35 INFO mapred.JobClient:  map 63% reduce 0%
>
> 11/04/01 18:26:37 INFO mapred.JobClient:  map 81% reduce 0%
>
> 11/04/01 18:26:39 INFO mapred.JobClient:  map 86% reduce 0%
>
> 11/04/01 18:26:40 INFO mapred.JobClient:  map 95% reduce 0%
>
> 11/04/01 18:26:44 INFO mapred.JobClient:  map 95% reduce 21%
>
> 11/04/01 18:26:46 INFO mapred.JobClient:  map 98% reduce 21%
>
> 11/04/01 18:26:47 INFO mapred.JobClient:  map 98% reduce 31%
>
> 11/04/01 18:26:48 INFO mapred.JobClient:  map 100% reduce 31%
>
> 11/04/01 18:26:53 INFO mapred.JobClient:  map 100% reduce 33%
>
> 11/04/01 18:26:58 INFO mapred.JobClient:  map 100% reduce 90%
>
> 11/04/01 18:27:01 INFO mapred.JobClient:  map 100% reduce 100%
>
> 11/04/01 18:27:01 INFO mapred.JobClient: Job complete:
> job_201103311244_0046
>
> 11/04/01 18:27:01 INFO mapred.JobClient: Counters: 22
>
> 11/04/01 18:27:01 INFO mapred.JobClient:   Job Counters
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Launched reduce tasks=1
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=232517
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Total time spent by all
> reduces
> waiting after reserving slots (ms)=0
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Launched map tasks=20
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Data-local map tasks=20
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=24526
>
> 11/04/01 18:27:01 INFO mapred.JobClient:   FileSystemCounters
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     FILE_BYTES_READ=38487572
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     HDFS_BYTES_READ=1618968708
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=81300431
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=84985526
>
> 11/04/01 18:27:01 INFO mapred.JobClient:   Map-Reduce Framework
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Reduce input groups=48309
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Combine output records=0
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Map input records=3480795
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Reduce shuffle bytes=39454031
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Reduce output records=1539349
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Spilled Records=3078698
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Map output bytes=72092246
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Combine input records=0
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Map output records=1539349
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2860
>
> 11/04/01 18:27:01 INFO mapred.JobClient:     Reduce input records=1539349
>
> 11/04/01 18:27:01 INFO common.HadoopUtil: Deleting run/MR20/fim/fpgrowth
>
> 11/04/01 18:27:01 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
>
> 11/04/01 18:27:02 INFO input.FileInputFormat: Total input paths to process
> :
> 1
>
> 11/04/01 18:27:02 INFO mapred.JobClient: Running job: job_201103311244_0047
>
> 11/04/01 18:27:03 INFO mapred.JobClient:  map 0% reduce 0%
>
> 11/04/01 18:27:14 INFO mapred.JobClient:  map 39% reduce 0%
>
> 11/04/01 18:27:17 INFO mapred.JobClient:  map 44% reduce 0%
>
> 11/04/01 18:28:15 INFO mapred.JobClient:  map 64% reduce 0%
>
> 11/04/01 18:28:38 INFO mapred.JobClient:  map 74% reduce 0%
>
> 11/04/01 18:28:41 INFO mapred.JobClient:  map 78% reduce 0%
>
> 11/04/01 18:28:44 INFO mapred.JobClient:  map 88% reduce 0%
>
> 11/04/01 18:28:47 INFO mapred.JobClient:  map 95% reduce 0%
>
> 11/04/01 18:28:50 INFO mapred.JobClient:  map 100% reduce 0%
>
> 11/04/01 18:32:27 INFO mapred.JobClient:  map 100% reduce 33%
>
> 11/04/01 18:34:45 INFO mapred.JobClient:  map 100% reduce 100%
>
>
> ## AND IT's been here for all this while
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message