mahout-user mailing list archives

From Ryan Compton <compton.r...@gmail.com>
Subject trainclassifier -type cbayes dumps text
Date Thu, 11 Apr 2013 22:58:32 GMT
I'm trying to train a simple text classifier using cbayes. I have
<Text,Text> sequence files created with
com.twitter.elephantbird.pig.store.SequenceFileStorage(), e.g.:

JOY      actually turning decent new year ☺
JOY      best New Years tonight! ready 2013. <U+1F609> <U+1F38A><U+1F389>
JOY      playing Dream League Soccer iPad 2 earned 13 coins!
JOY      Great way start new ear
JOY      good sober New Years Eve
ANGER_RAGE       Last night frank hasn't done revision prelims
ANGER_RAGE       hell cut forehead such ball ache! Cheers pleb chucks glass bottles around!
ANGER_RAGE       shops open today customer services shut apparently being paid "come back tomorrow".

These are stored in a directory as:
/emotion-training-labeled/part-m-0000*

I pass the labeled data into cbayes:

mahout trainclassifier -i /emotion-training-labeled/ -o emotion-model/ \
    -type cbayes -ng 1 -source hdfs

Both map and reduce reach 100%, then I see a message about Tf-Idf,
followed by what looks like a complete dump of my training data printed
to the screen for the next few minutes, and then a stack trace:

rything life teach lesson, willing observe learn.” YUP!GJOYB Halbrecht
DAN CASTAIC CA found local Videographer. Register FREE:"JOY Palm Read
Easy Created WorldJOY=1.0, ANGER_RAGE people fisty latelyK=1.0,
ANGER_RAGE ew gon lot em ��=1.0, ANGER_RAGE ain't gonna love =1.0}
13/04/11 15:46:51 INFO common.BayesTfIdfDriver: {dataSource=hdfs,
alpha_i=1.0, minDf=1, gramSize=1}
13/04/11 15:46:51 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
13/04/11 15:46:57 INFO mapred.FileInputFormat: Total input paths to process : 3
13/04/11 15:46:58 INFO mapred.JobClient: Cleaning up the staging area
hdfs://master/user/rfcompton/.staging/job_201303271312_2786
13/04/11 15:46:58 ERROR security.UserGroupInformation:
PriviledgedActionException as:rfcompton (auth:SIMPLE)
cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException:
java.io.IOException: Exceeded max jobconf size: 10706309 limit:
5242880
        at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3766)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
Caused by: java.io.IOException: Exceeded max jobconf size: 10706309
limit: 5242880
        at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:406)
        at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3764)
        ... 10 more

Exception in thread "main" org.apache.hadoop.ipc.RemoteException:
java.io.IOException: java.io.IOException: Exceeded max jobconf size:
10706309 limit: 5242880
        at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3766)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
Caused by: java.io.IOException: Exceeded max jobconf size: 10706309
limit: 5242880
        at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:406)
        at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3764)
        ... 10 more

        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
        at org.apache.hadoop.mapred.$Proxy1.submitJob(Unknown Source)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:904)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1242)
        at org.apache.mahout.classifier.bayes.mapreduce.common.BayesTfIdfDriver.runJob(BayesTfIdfDriver.java:97)
        at org.apache.mahout.classifier.bayes.mapreduce.cbayes.CBayesDriver.runJob(CBayesDriver.java:51)
        at org.apache.mahout.classifier.bayes.TrainClassifier.trainCNaiveBayes(TrainClassifier.java:58)
        at org.apache.mahout.classifier.bayes.TrainClassifier.main(TrainClassifier.java:151)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
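
To put the two numbers in the error message in perspective, here is a
minimal sketch in Python. It is not part of Mahout or Hadoop;
parse_jobconf_error is a hypothetical helper written only for this
illustration, and the message text is copied from the trace above. It
extracts the reported jobconf size and the limit and compares them:

```python
import re

# Message text copied from the stack trace above (joined onto one line).
ERROR_LINE = "java.io.IOException: Exceeded max jobconf size: 10706309 limit: 5242880"


def parse_jobconf_error(line):
    """Hypothetical helper: return (size, limit) in bytes, or None if no match."""
    match = re.search(r"Exceeded max jobconf size: (\d+)\s+limit: (\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


size, limit = parse_jobconf_error(ERROR_LINE)
mib = 1024 * 1024

print(f"jobconf size: {size} bytes ({size / mib:.2f} MiB)")
print(f"limit:        {limit} bytes ({limit / mib:.2f} MiB)")
print(f"ratio:        {size / limit:.2f}x the limit")
```

The limit works out to exactly 5 MiB, and the submitted job configuration
is roughly twice that. This only quantifies the message; it does not by
itself show what is inflating the jobconf.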
