mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From bing wang <wangbing7...@gmail.com>
Subject PfgGrowth job got stuck when run into fpGrowth.generateTopKFrequentPatterns
Date Thu, 22 Sep 2011 07:36:02 GMT
Hi all,

I run PfpGrowth in a big hadoop cluster , but the ParallelFPgrowth MR job
got stuck when some ParallelFPGrowthReducer tasks run into
fpGrowth.generateTopKFrequentPatterns().

I feed 6,468,284 record for PfpGrowth, and 110825 features greater than
minSupport(5), numGroups(20000), treeCacheSize(10) and with other paramters
as default.
I also set the hadoop mapred.child.java.opts option to 3g.

The cluster started 20 ParallelFPGrowthReducer tasks, but about 10 tasks was
stuck at the same reason.

See the reducer task log(Log 1):

2011-09-20 21:11:49,860 INFO
org.apache.mahout.fpm.pfpgrowth.ParallelCountingReducer: localFList size 871
for key 8120
2011-09-20 21:11:49,860 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique items
871
2011-09-20 21:11:49,860 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48722=>3
2011-09-20 21:11:49,860 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48723=>4
2011-09-20 21:11:49,860 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48720=>5
2011-09-20 21:11:49,860 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48721=>0
2011-09-20 21:11:49,860 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48724=>2
2011-09-20 21:11:49,860 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48725=>1
2011-09-20 21:11:49,860 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique pruned
items 27
2011-09-20 21:11:49,861 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of Nodes in the FP
Tree: 93
2011-09-20 21:11:49,861 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 5
2011-09-20 21:11:49,861 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with
Least Support 10
2011-09-20 21:11:49,861 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 4
2011-09-20 21:11:49,861 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with
Least Support 13
2011-09-20 21:11:49,861 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 3
2011-09-20 21:11:49,862 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with
Least Support 13
2011-09-20 21:11:49,862 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 2
2011-09-20 21:11:49,862 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with
Least Support 14
2011-09-20 21:11:49,862 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 1
2011-09-20 21:11:49,863 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with
Least Support 15
2011-09-20 21:11:49,863 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 0
2011-09-20 21:11:49,863 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with
Least Support 17
2011-09-20 21:11:49,863 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Tree Cache: First Level:
Cache hits=20 Cache Misses=127
2011-09-20 21:11:49,864 INFO
org.apache.mahout.fpm.pfpgrowth.ParallelCountingReducer: localFList size
1410 for key 8140
2011-09-20 21:11:49,865 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique items
1410
2011-09-20 21:11:49,865 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48842=>22
2011-09-20 21:11:49,865 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48843=>23
2011-09-20 21:11:49,865 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48840=>21
2011-09-20 21:11:49,865 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48841=>27
2011-09-20 21:11:49,865 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48844=>266
2011-09-20 21:11:49,865 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 48845=>28
2011-09-20 21:11:49,865 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique pruned
items 353
2011-09-20 21:11:49,866 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of Nodes in the FP
Tree: 2512
2011-09-20 21:11:49,866 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 266


>From last line of Log 1, we can see the reducer task which get stuck is
mining FTree Tree for all patterns with the attribute 266.

However I think the task is still running because some reducer tasks will
keep moving after long time mining(see bellow log, Log 2)

2011-09-20 21:24:08,577 INFO
org.apache.mahout.fpm.pfpgrowth.ParallelCountingReducer: localFList size
1183 for key 13596
2011-09-20 21:24:08,578 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique items
1183
2011-09-20 21:24:08,578 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 81581=>250
2011-09-20 21:24:08,578 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 81580=>132
2011-09-20 21:24:08,578 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 81577=>129
2011-09-20 21:24:08,578 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 81576=>128
2011-09-20 21:24:08,578 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 81579=>131
2011-09-20 21:24:08,578 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Adding Pattern 81578=>130
2011-09-20 21:24:08,578 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of unique pruned
items 251
2011-09-20 21:24:08,579 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Number of Nodes in the FP
Tree: 1376
2011-09-20 21:24:08,579 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 250
2011-09-20 21:24:08,579 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with
Least Support 5
2011-09-20 21:24:08,579 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 132   (take 2.5 hours )
2011-09-21 00:08:56,616 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 40 Patterns with
Least Support 7
2011-09-21 00:08:56,665 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 131
2011-09-21 00:08:56,703 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 3 Patterns with
Least Support 7
2011-09-21 00:08:56,703 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 130
2011-09-21 00:08:56,736 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 2 Patterns with
Least Support 7
2011-09-21 00:08:56,736 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 129
2011-09-21 00:08:56,769 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with
Least Support 7
2011-09-21 00:08:56,769 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Mining FTree Tree for all
patterns with 128
2011-09-21 00:08:56,809 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Found 1 Patterns with
Least Support 7
2011-09-21 00:08:56,809 INFO
org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth: Tree Cache: First Level:
Cache hits=13 Cache Misses=475

And, the task status showed *"Processing FPTree: Bottom Up FP Growth >
reduce"*, which indicates that the job run into FpGrowth.growthBottomUp()
and be stuck within FpGrowth.growthBottomUp().

Some reduers will hold at this status about 25 hours till I kill the whole
PfpGrowth job manually.(see job history as follows)

*JobName: * PFP Parallel FPGrowth running over input (ignore)/sortedoutput
*JobConf: *
hdfs://hdpnn:9000/home/hadoop/cluster-data/mapred/system/job_201108301259_1089136/job.xml<http://10.249.54.103/jobconf_history.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&jobLogDir=file:/home/hadoop/cluster-data/logs/history&jobUniqueString=hdpjt_1314680340456_job_201108301259_1089136>
*Submitted At: * 20-Sep-2011 20:31:57
*Launched At: * 20-Sep-2011 20:32:04 (6sec)
*Finished At: * 21-Sep-2011 22:31:51 (25hrs, 59mins, 47sec)
*Status: * KILLED
*Analyse This Job<http://10.249.54.103/analysejobhistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp>
*
------------------------------

Counter Map Reduce Total   KindTotal Tasks(successful+failed+killed)Successful
tasksFailed tasksKilled tasksStart TimeFinish Time  Setup
1<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=SETUP&status=all>
1<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=SETUP&status=SUCCESS>
0<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=SETUP&status=FAILED>
0<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=SETUP&status=KILLED>
20-Sep-2011
20:32:06 20-Sep-2011 20:32:07 (0sec)  Map
28<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=MAP&status=all>
22<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=MAP&status=SUCCESS>
0<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=MAP&status=FAILED>
6<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=MAP&status=KILLED>
20-Sep-2011
20:32:13 20-Sep-2011 20:56:01 (23mins, 48sec)  Reduce
24<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=REDUCE&status=all>
18<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=REDUCE&status=SUCCESS>
0<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=REDUCE&status=FAILED>
6<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=REDUCE&status=KILLED>
20-Sep-2011
20:32:33 21-Sep-2011 22:31:51 (25hrs, 59mins, 18sec)  Cleanup
1<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=CLEANUP&status=all>
1<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=CLEANUP&status=SUCCESS>
0<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=CLEANUP&status=FAILED>
0<http://10.249.54.103/jobtaskshistory.jsp?jobid=job_201108301259_1089136&logFile=file:/home/hadoop/cluster-data/logs/history/hdpjt_1314680340456_job_201108301259_1089136_lingfeng.wb_PFP%2BParallel%2BFPGrowth%2Brunning%2Bover%2Binput%252Fgroup%252Ftbp&taskType=CLEANUP&status=KILLED>
21-Sep-2011
22:31:50 21-Sep-2011 22:31:51 (0sec)


In Log 2, we see FP Tree for key 13596 only includes 1376 nodes,
however it take 2.5 hours to ming patterns for 132.

So why it was so slowly when mining frequent patterns for some attributes?
What's the reason behinds that or how can I resolve?

Thanks in advance!



-- 
Gmail/talk: wangbing7928@gmail.com

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message