mahout-user mailing list archives

From Sam Cunningham <sam_cun...@yahoo.com>
Subject Re: trainclassifier as a command vs. TrainClassifier.java
Date Tue, 15 Nov 2011 04:22:12 GMT
Well, the worst part is that when I run TrainClassifier.java directly, it
doesn't distribute the job to the datanodes, even though the source is set to
hdfs. Below are the outputs from running the trainclassifier command and from
running TrainClassifier.java.

Here is the output of running trainclassifier command:

sayhan@A4915037:~$ $MAHOUT_HOME/bin/mahout trainclassifier -i articles-train
-o articles-model -type cbayes -ng 1 -source hdfs
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop-0.20.2
HADOOP_CONF_DIR=/usr/local/hadoop-0.20.2/conf
11/11/14 22:14:25 WARN driver.MahoutDriver: No trainclassifier.props found
on classpath, will use command-line arguments only
11/11/14 22:14:25 INFO bayes.TrainClassifier: Training Complementary Bayes
Classifier
11/11/14 22:14:26 INFO common.HadoopUtil: Deleting articles-model
11/11/14 22:14:26 INFO cbayes.CBayesDriver: Reading features...
11/11/14 22:14:26 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/11/14 22:14:26 INFO mapred.FileInputFormat: Total input paths to process
: 7
11/11/14 22:14:27 INFO mapred.JobClient: Running job: job_201111142210_0001
11/11/14 22:14:28 INFO mapred.JobClient:  map 0% reduce 0%
11/11/14 22:14:39 INFO mapred.JobClient:  map 28% reduce 0%
11/11/14 22:14:45 INFO mapred.JobClient:  map 57% reduce 0%
11/11/14 22:14:48 INFO mapred.JobClient:  map 71% reduce 9%
11/11/14 22:14:51 INFO mapred.JobClient:  map 100% reduce 9%
11/11/14 22:14:57 INFO mapred.JobClient:  map 100% reduce 23%
11/11/14 22:15:03 INFO mapred.JobClient:  map 100% reduce 100%
11/11/14 22:15:05 INFO mapred.JobClient: Job complete: job_201111142210_0001
11/11/14 22:15:05 INFO mapred.JobClient: Counters: 18
11/11/14 22:15:05 INFO mapred.JobClient:   Job Counters 
11/11/14 22:15:05 INFO mapred.JobClient:     Launched reduce tasks=1
11/11/14 22:15:05 INFO mapred.JobClient:     Launched map tasks=7
11/11/14 22:15:05 INFO mapred.JobClient:     Data-local map tasks=7
11/11/14 22:15:05 INFO mapred.JobClient:   FileSystemCounters
11/11/14 22:15:05 INFO mapred.JobClient:     FILE_BYTES_READ=2473171
11/11/14 22:15:05 INFO mapred.JobClient:     HDFS_BYTES_READ=404467
11/11/14 22:15:05 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=4946602
11/11/14 22:15:05 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2078615
11/11/14 22:15:05 INFO mapred.JobClient:   Map-Reduce Framework
11/11/14 22:15:05 INFO mapred.JobClient:     Reduce input groups=64653
11/11/14 22:15:05 INFO mapred.JobClient:     Combine output records=79675
11/11/14 22:15:05 INFO mapred.JobClient:     Map input records=97
11/11/14 22:15:05 INFO mapred.JobClient:     Reduce shuffle bytes=2288236
11/11/14 22:15:05 INFO mapred.JobClient:     Reduce output records=52247
11/11/14 22:15:05 INFO mapred.JobClient:     Spilled Records=159350
11/11/14 22:15:05 INFO mapred.JobClient:     Map output bytes=4039212
11/11/14 22:15:05 INFO mapred.JobClient:     Map input bytes=404467
11/11/14 22:15:05 INFO mapred.JobClient:     Combine input records=142673
11/11/14 22:15:05 INFO mapred.JobClient:     Map output records=142673
11/11/14 22:15:05 INFO mapred.JobClient:     Reduce input records=79675
11/11/14 22:15:05 INFO cbayes.CBayesDriver: Calculating Tf-Idf...
11/11/14 22:15:05 INFO common.BayesTfIdfDriver: Counts of documents in Each
Label
11/11/14 22:15:05 INFO common.BayesTfIdfDriver: {General=29.0,
Business=21.0, Politics=10.0, SciTech=9.0, Entertainment=8.0, Health=9.0,
Sports=11.0}
11/11/14 22:15:05 INFO common.BayesTfIdfDriver: {dataSource=hdfs,
alpha_i=1.0, minDf=1, gramSize=1}
11/11/14 22:15:05 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/11/14 22:15:05 INFO mapred.FileInputFormat: Total input paths to process
: 3
11/11/14 22:15:06 INFO mapred.JobClient: Running job: job_201111142210_0002
11/11/14 22:15:07 INFO mapred.JobClient:  map 0% reduce 0%
11/11/14 22:15:18 INFO mapred.JobClient:  map 66% reduce 0%
11/11/14 22:15:21 INFO mapred.JobClient:  map 100% reduce 0%
11/11/14 22:15:30 INFO mapred.JobClient:  map 100% reduce 100%
11/11/14 22:15:32 INFO mapred.JobClient: Job complete: job_201111142210_0002
11/11/14 22:15:32 INFO mapred.JobClient: Counters: 18
11/11/14 22:15:32 INFO mapred.JobClient:   Job Counters 
11/11/14 22:15:32 INFO mapred.JobClient:     Launched reduce tasks=1
11/11/14 22:15:32 INFO mapred.JobClient:     Launched map tasks=3
11/11/14 22:15:32 INFO mapred.JobClient:     Data-local map tasks=3
11/11/14 22:15:32 INFO mapred.JobClient:   FileSystemCounters
11/11/14 22:15:32 INFO mapred.JobClient:     FILE_BYTES_READ=1405021
11/11/14 22:15:32 INFO mapred.JobClient:     HDFS_BYTES_READ=2078279
11/11/14 22:15:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2810150
11/11/14 22:15:32 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=830343
11/11/14 22:15:32 INFO mapred.JobClient:   Map-Reduce Framework
11/11/14 22:15:32 INFO mapred.JobClient:     Reduce input groups=19918
11/11/14 22:15:32 INFO mapred.JobClient:     Combine output records=39835
11/11/14 22:15:32 INFO mapred.JobClient:     Map input records=52240
11/11/14 22:15:32 INFO mapred.JobClient:     Reduce shuffle bytes=1405033
11/11/14 22:15:32 INFO mapred.JobClient:     Reduce output records=19918
11/11/14 22:15:32 INFO mapred.JobClient:     Spilled Records=79670
11/11/14 22:15:32 INFO mapred.JobClient:     Map output bytes=1536230
11/11/14 22:15:32 INFO mapred.JobClient:     Map input bytes=2077982
11/11/14 22:15:32 INFO mapred.JobClient:     Combine input records=52240
11/11/14 22:15:32 INFO mapred.JobClient:     Map output records=52240
11/11/14 22:15:32 INFO mapred.JobClient:     Reduce input records=39835
11/11/14 22:15:32 INFO cbayes.CBayesDriver: Calculating weight sums for
labels and features...
11/11/14 22:15:32 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/11/14 22:15:32 INFO mapred.FileInputFormat: Total input paths to process
: 1
11/11/14 22:15:32 INFO mapred.JobClient: Running job: job_201111142210_0003
11/11/14 22:15:33 INFO mapred.JobClient:  map 0% reduce 0%
11/11/14 22:15:44 INFO mapred.JobClient:  map 100% reduce 0%
11/11/14 22:15:56 INFO mapred.JobClient:  map 100% reduce 100%
11/11/14 22:15:58 INFO mapred.JobClient: Job complete: job_201111142210_0003
11/11/14 22:15:58 INFO mapred.JobClient: Counters: 18
11/11/14 22:15:58 INFO mapred.JobClient:   Job Counters 
11/11/14 22:15:58 INFO mapred.JobClient:     Launched reduce tasks=1
11/11/14 22:15:58 INFO mapred.JobClient:     Launched map tasks=2
11/11/14 22:15:58 INFO mapred.JobClient:     Data-local map tasks=2
11/11/14 22:15:58 INFO mapred.JobClient:   FileSystemCounters
11/11/14 22:15:58 INFO mapred.JobClient:     FILE_BYTES_READ=411173
11/11/14 22:15:58 INFO mapred.JobClient:     HDFS_BYTES_READ=831759
11/11/14 22:15:58 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=822416
11/11/14 22:15:58 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=418302
11/11/14 22:15:58 INFO mapred.JobClient:   Map-Reduce Framework
11/11/14 22:15:58 INFO mapred.JobClient:     Reduce input groups=12414
11/11/14 22:15:58 INFO mapred.JobClient:     Combine output records=15147
11/11/14 22:15:58 INFO mapred.JobClient:     Map input records=19917
11/11/14 22:15:58 INFO mapred.JobClient:     Reduce shuffle bytes=193042
11/11/14 22:15:58 INFO mapred.JobClient:     Reduce output records=12414
11/11/14 22:15:58 INFO mapred.JobClient:     Spilled Records=30294
11/11/14 22:15:58 INFO mapred.JobClient:     Map output bytes=1359759
11/11/14 22:15:58 INFO mapred.JobClient:     Map input bytes=830120
11/11/14 22:15:58 INFO mapred.JobClient:     Combine input records=59751
11/11/14 22:15:58 INFO mapred.JobClient:     Map output records=59751
11/11/14 22:15:58 INFO mapred.JobClient:     Reduce input records=15147
11/11/14 22:15:58 INFO cbayes.CBayesDriver: Calculating the weight
Normalisation factor for each complement class...
11/11/14 22:15:58 INFO cbayes.CBayesThetaNormalizerDriver: Sigma_k for Each
Label
11/11/14 22:15:58 INFO cbayes.CBayesThetaNormalizerDriver:
{General=425.5688048473669, Business=310.4814167759874,
Politics=122.77282039891006, SciTech=106.6295819954554,
Entertainment=96.49866484620279, Health=83.86073970940757,
Sports=113.63460527845832}
11/11/14 22:15:58 INFO cbayes.CBayesThetaNormalizerDriver: Sigma_kSigma_j
for each Label and for each Features
11/11/14 22:15:58 INFO cbayes.CBayesThetaNormalizerDriver:
1259.4466338517686
11/11/14 22:15:58 INFO cbayes.CBayesThetaNormalizerDriver: Vocabulary Count
11/11/14 22:15:58 INFO cbayes.CBayesThetaNormalizerDriver: 12406.0
11/11/14 22:15:58 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/11/14 22:15:59 INFO mapred.FileInputFormat: Total input paths to process
: 2
11/11/14 22:15:59 INFO mapred.JobClient: Running job: job_201111142210_0004
11/11/14 22:16:00 INFO mapred.JobClient:  map 0% reduce 0%
11/11/14 22:16:12 INFO mapred.JobClient:  map 66% reduce 0%
11/11/14 22:16:15 INFO mapred.JobClient:  map 100% reduce 0%
11/11/14 22:16:24 INFO mapred.JobClient:  map 100% reduce 100%
11/11/14 22:16:26 INFO mapred.JobClient: Job complete: job_201111142210_0004
11/11/14 22:16:26 INFO mapred.JobClient: Counters: 18
11/11/14 22:16:26 INFO mapred.JobClient:   Job Counters 
11/11/14 22:16:26 INFO mapred.JobClient:     Launched reduce tasks=1
11/11/14 22:16:26 INFO mapred.JobClient:     Launched map tasks=3
11/11/14 22:16:26 INFO mapred.JobClient:     Data-local map tasks=3
11/11/14 22:16:26 INFO mapred.JobClient:   FileSystemCounters
11/11/14 22:16:26 INFO mapred.JobClient:     FILE_BYTES_READ=424
11/11/14 22:16:26 INFO mapred.JobClient:     HDFS_BYTES_READ=1248704
11/11/14 22:16:26 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=956
11/11/14 22:16:26 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=336
11/11/14 22:16:26 INFO mapred.JobClient:   Map-Reduce Framework
11/11/14 22:16:26 INFO mapred.JobClient:     Reduce input groups=7
11/11/14 22:16:26 INFO mapred.JobClient:     Combine output records=15
11/11/14 22:16:26 INFO mapred.JobClient:     Map input records=32323
11/11/14 22:16:26 INFO mapred.JobClient:     Reduce shuffle bytes=436
11/11/14 22:16:26 INFO mapred.JobClient:     Reduce output records=7
11/11/14 22:16:26 INFO mapred.JobClient:     Spilled Records=30
11/11/14 22:16:26 INFO mapred.JobClient:     Map output bytes=2752586
11/11/14 22:16:26 INFO mapred.JobClient:     Map input bytes=1247862
11/11/14 22:16:26 INFO mapred.JobClient:     Combine input records=106759
11/11/14 22:16:26 INFO mapred.JobClient:     Map output records=106759
11/11/14 22:16:26 INFO mapred.JobClient:     Reduce input records=15
11/11/14 22:16:26 INFO common.HadoopUtil: Deleting
articles-model/trainer-docCount
11/11/14 22:16:26 INFO common.HadoopUtil: Deleting
articles-model/trainer-termDocCount
11/11/14 22:16:26 INFO common.HadoopUtil: Deleting
articles-model/trainer-featureCount
11/11/14 22:16:26 INFO common.HadoopUtil: Deleting
articles-model/trainer-wordFreq
11/11/14 22:16:26 INFO common.HadoopUtil: Deleting
articles-model/trainer-tfIdf/trainer-vocabCount
11/11/14 22:16:26 INFO driver.MahoutDriver: Program took 121146 ms

And here is the output of running TrainClassifier.java with the same
options:

11/11/14 23:06:23 INFO trainer.TrainClassifier: Training Bayes Classifier
11/11/14 23:06:23 INFO common.HadoopUtil: Deleting
hdfs://localhost:9000/user/sayhan/articles-model
11/11/14 23:06:23 INFO bayes.BayesDriver: Reading features...
11/11/14 23:06:23 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
11/11/14 23:06:23 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/11/14 23:06:23 INFO mapred.FileInputFormat: Total input paths to process
: 7
11/11/14 23:06:23 INFO mapred.JobClient: Running job: job_local_0001
11/11/14 23:06:23 INFO mapred.FileInputFormat: Total input paths to process
: 7
11/11/14 23:06:23 INFO mapred.MapTask: numReduceTasks: 1
11/11/14 23:06:23 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 23:06:24 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 23:06:24 INFO mapred.MapTask: record buffer = 262144/327680
11/11/14 23:06:24 INFO common.BayesFeatureMapper: Bayes Parameter
{alpha_i=1.0, dataSource=hdfs, gramSize=1}
11/11/14 23:06:24 INFO mapred.MapTask: Starting flush of map output
11/11/14 23:06:24 INFO mapred.JobClient:  map 0% reduce 0%
11/11/14 23:06:25 INFO mapred.MapTask: Finished spill 0
11/11/14 23:06:25 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0
is done. And is in the process of commiting
11/11/14 23:06:25 INFO mapred.LocalJobRunner: Bayes Feature Mapper: Document
Label: Business
11/11/14 23:06:25 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_000000_0' done.
11/11/14 23:06:25 INFO mapred.MapTask: numReduceTasks: 1
11/11/14 23:06:25 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 23:06:25 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 23:06:25 INFO mapred.MapTask: record buffer = 262144/327680
11/11/14 23:06:25 INFO common.BayesFeatureMapper: Bayes Parameter
{alpha_i=1.0, dataSource=hdfs, gramSize=1}
11/11/14 23:06:25 INFO mapred.MapTask: Starting flush of map output
11/11/14 23:06:25 INFO mapred.JobClient:  map 100% reduce 0%
11/11/14 23:06:26 INFO mapred.MapTask: Finished spill 0
11/11/14 23:06:26 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0
is done. And is in the process of commiting
11/11/14 23:06:26 INFO mapred.LocalJobRunner: Bayes Feature Mapper: Document
Label: Entertainment
11/11/14 23:06:26 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_000001_0' done.
11/11/14 23:06:26 INFO mapred.MapTask: numReduceTasks: 1
11/11/14 23:06:26 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 23:06:26 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 23:06:26 INFO mapred.MapTask: record buffer = 262144/327680
11/11/14 23:06:26 INFO common.BayesFeatureMapper: Bayes Parameter
{alpha_i=1.0, dataSource=hdfs, gramSize=1}
11/11/14 23:06:26 INFO mapred.MapTask: Starting flush of map output
11/11/14 23:06:28 INFO mapred.MapTask: Finished spill 0
11/11/14 23:06:28 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000002_0
is done. And is in the process of commiting
11/11/14 23:06:28 INFO mapred.LocalJobRunner: Bayes Feature Mapper: Document
Label: General
11/11/14 23:06:28 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_000002_0' done.
11/11/14 23:06:28 INFO mapred.MapTask: numReduceTasks: 1
11/11/14 23:06:28 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 23:06:28 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 23:06:28 INFO mapred.MapTask: record buffer = 262144/327680
11/11/14 23:06:28 INFO common.BayesFeatureMapper: Bayes Parameter
{alpha_i=1.0, dataSource=hdfs, gramSize=1}
11/11/14 23:06:28 INFO mapred.MapTask: Starting flush of map output
11/11/14 23:06:28 INFO mapred.MapTask: Finished spill 0
11/11/14 23:06:28 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000003_0
is done. And is in the process of commiting
11/11/14 23:06:28 INFO mapred.LocalJobRunner: Bayes Feature Mapper: Document
Label: Health
11/11/14 23:06:28 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_000003_0' done.
11/11/14 23:06:28 INFO mapred.MapTask: numReduceTasks: 1
11/11/14 23:06:28 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 23:06:28 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 23:06:28 INFO mapred.MapTask: record buffer = 262144/327680
11/11/14 23:06:28 INFO common.BayesFeatureMapper: Bayes Parameter
{alpha_i=1.0, dataSource=hdfs, gramSize=1}
11/11/14 23:06:28 INFO mapred.MapTask: Starting flush of map output
11/11/14 23:06:28 INFO mapred.MapTask: Finished spill 0
11/11/14 23:06:28 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000004_0
is done. And is in the process of commiting
11/11/14 23:06:28 INFO mapred.LocalJobRunner: Bayes Feature Mapper: Document
Label: Politics
11/11/14 23:06:28 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_000004_0' done.
11/11/14 23:06:29 INFO mapred.MapTask: numReduceTasks: 1
11/11/14 23:06:29 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 23:06:29 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 23:06:29 INFO mapred.MapTask: record buffer = 262144/327680
11/11/14 23:06:29 INFO common.BayesFeatureMapper: Bayes Parameter
{alpha_i=1.0, dataSource=hdfs, gramSize=1}
11/11/14 23:06:29 INFO mapred.MapTask: Starting flush of map output
11/11/14 23:06:29 INFO mapred.MapTask: Finished spill 0
11/11/14 23:06:29 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000005_0
is done. And is in the process of commiting
11/11/14 23:06:29 INFO mapred.LocalJobRunner: Bayes Feature Mapper: Document
Label: SciTech
11/11/14 23:06:29 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_000005_0' done.
11/11/14 23:06:29 INFO mapred.MapTask: numReduceTasks: 1
11/11/14 23:06:29 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 23:06:29 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 23:06:29 INFO mapred.MapTask: record buffer = 262144/327680
11/11/14 23:06:29 INFO common.BayesFeatureMapper: Bayes Parameter
{alpha_i=1.0, dataSource=hdfs, gramSize=1}
11/11/14 23:06:29 INFO mapred.MapTask: Starting flush of map output
11/11/14 23:06:29 INFO mapred.MapTask: Finished spill 0
11/11/14 23:06:29 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000006_0
is done. And is in the process of commiting
11/11/14 23:06:29 INFO mapred.LocalJobRunner: Bayes Feature Mapper: Document
Label: Sports
11/11/14 23:06:29 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_000006_0' done.
11/11/14 23:06:29 INFO mapred.LocalJobRunner: 
11/11/14 23:06:29 INFO mapred.Merger: Merging 7 sorted segments
11/11/14 23:06:29 INFO mapred.Merger: Down to the last merge-pass, with 7
segments left of total size: 2473179 bytes
11/11/14 23:06:29 INFO mapred.LocalJobRunner: 
11/11/14 23:06:29 INFO common.BayesFeatureReducer: Bayes Parameter
{alpha_i=1.0, dataSource=hdfs, gramSize=1}
11/11/14 23:06:31 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0
is done. And is in the process of commiting
11/11/14 23:06:31 INFO mapred.LocalJobRunner: 
11/11/14 23:06:31 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0
is allowed to commit now
11/11/14 23:06:31 INFO mapred.FileOutputCommitter: Saved output of task
'attempt_local_0001_r_000000_0' to
hdfs://localhost:9000/user/sayhan/articles-model
11/11/14 23:06:31 INFO mapred.LocalJobRunner: Bayes Feature Reducer: [__WT,
Entertainment, ˌmelənˈkōlēə] => 0.0193288037570819 > reduce
11/11/14 23:06:31 INFO mapred.TaskRunner: Task
'attempt_local_0001_r_000000_0' done.
11/11/14 23:06:31 INFO mapred.JobClient:  map 100% reduce 100%
11/11/14 23:06:31 INFO mapred.JobClient: Job complete: job_local_0001
11/11/14 23:06:31 INFO mapred.JobClient: Counters: 15
11/11/14 23:06:31 INFO mapred.JobClient:   FileSystemCounters
11/11/14 23:06:31 INFO mapred.JobClient:     FILE_BYTES_READ=14275591
11/11/14 23:06:31 INFO mapred.JobClient:     HDFS_BYTES_READ=2268546
11/11/14 23:06:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=25483162
11/11/14 23:06:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2078615
11/11/14 23:06:31 INFO mapred.JobClient:   Map-Reduce Framework
11/11/14 23:06:31 INFO mapred.JobClient:     Reduce input groups=64653
11/11/14 23:06:31 INFO mapred.JobClient:     Combine output records=79675
11/11/14 23:06:31 INFO mapred.JobClient:     Map input records=97
11/11/14 23:06:31 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/11/14 23:06:31 INFO mapred.JobClient:     Reduce output records=52247
11/11/14 23:06:31 INFO mapred.JobClient:     Spilled Records=159350
11/11/14 23:06:31 INFO mapred.JobClient:     Map output bytes=4039212
11/11/14 23:06:31 INFO mapred.JobClient:     Map input bytes=404467
11/11/14 23:06:31 INFO mapred.JobClient:     Combine input records=142673
11/11/14 23:06:31 INFO mapred.JobClient:     Map output records=142673
11/11/14 23:06:31 INFO mapred.JobClient:     Reduce input records=79675
11/11/14 23:06:31 INFO bayes.BayesDriver: Calculating Tf-Idf...
11/11/14 23:06:31 INFO common.BayesTfIdfDriver: Counts of documents in Each
Label
11/11/14 23:06:31 INFO common.BayesTfIdfDriver: {General=29.0,
Business=21.0, Politics=10.0, SciTech=9.0, Entertainment=8.0, Health=9.0,
Sports=11.0}
11/11/14 23:06:31 INFO common.BayesTfIdfDriver: {dataSource=hdfs,
alpha_i=1.0, gramSize=1}
11/11/14 23:06:31 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
processName=JobTracker, sessionId= - already initialized
11/11/14 23:06:31 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/11/14 23:06:32 INFO mapred.FileInputFormat: Total input paths to process
: 3
11/11/14 23:06:32 INFO mapred.JobClient: Running job: job_local_0002
11/11/14 23:06:32 INFO mapred.FileInputFormat: Total input paths to process
: 3
11/11/14 23:06:32 INFO mapred.MapTask: numReduceTasks: 1
11/11/14 23:06:32 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 23:06:32 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 23:06:32 INFO mapred.MapTask: record buffer = 262144/327680
11/11/14 23:06:32 INFO mapred.MapTask: Starting flush of map output
11/11/14 23:06:33 INFO mapred.JobClient:  map 0% reduce 0%
11/11/14 23:06:33 INFO mapred.MapTask: Finished spill 0
11/11/14 23:06:33 INFO mapred.TaskRunner: Task:attempt_local_0002_m_000000_0
is done. And is in the process of commiting
11/11/14 23:06:33 INFO mapred.LocalJobRunner: Bayes TfIdf Mapper: log(Idf):
[__WT, Entertainment, ˌmelənˈkōlēə]
11/11/14 23:06:33 INFO mapred.TaskRunner: Task
'attempt_local_0002_m_000000_0' done.
11/11/14 23:06:33 INFO mapred.MapTask: numReduceTasks: 1
11/11/14 23:06:33 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 23:06:33 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 23:06:33 INFO mapred.MapTask: record buffer = 262144/327680
11/11/14 23:06:33 INFO mapred.MapTask: Starting flush of map output
11/11/14 23:06:34 INFO mapred.MapTask: Finished spill 0
11/11/14 23:06:34 INFO mapred.TaskRunner: Task:attempt_local_0002_m_000001_0
is done. And is in the process of commiting
11/11/14 23:06:34 INFO mapred.LocalJobRunner: Bayes TfIdf Mapper: Tf: [__WT,
Entertainment, ˌmelənˈkōlēə]
11/11/14 23:06:34 INFO mapred.TaskRunner: Task
'attempt_local_0002_m_000001_0' done.
11/11/14 23:06:34 INFO mapred.MapTask: numReduceTasks: 1
11/11/14 23:06:34 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 23:06:34 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 23:06:34 INFO mapred.MapTask: record buffer = 262144/327680
11/11/14 23:06:34 INFO mapred.JobClient:  map 100% reduce 0%
11/11/14 23:06:34 INFO mapred.MapTask: Starting flush of map output
11/11/14 23:06:34 INFO common.BayesTfIdfReducer: [__FS]	12406.0
11/11/14 23:06:34 INFO mapred.MapTask: Finished spill 0
11/11/14 23:06:34 INFO mapred.TaskRunner: Task:attempt_local_0002_m_000002_0
is done. And is in the process of commiting
11/11/14 23:06:34 INFO mapred.LocalJobRunner: Bayes TfIdf Mapper: vocabCount
11/11/14 23:06:34 INFO mapred.TaskRunner: Task
'attempt_local_0002_m_000002_0' done.
11/11/14 23:06:34 INFO mapred.LocalJobRunner: 
11/11/14 23:06:34 INFO mapred.Merger: Merging 3 sorted segments
11/11/14 23:06:34 INFO mapred.Merger: Down to the last merge-pass, with 3
segments left of total size: 1405021 bytes
11/11/14 23:06:34 INFO mapred.LocalJobRunner: 
11/11/14 23:06:34 INFO common.BayesTfIdfReducer: [__FS]	12406.0
11/11/14 23:06:34 INFO mapred.TaskRunner: Task:attempt_local_0002_r_000000_0
is done. And is in the process of commiting
11/11/14 23:06:34 INFO mapred.LocalJobRunner: 
11/11/14 23:06:34 INFO mapred.TaskRunner: Task attempt_local_0002_r_000000_0
is allowed to commit now
11/11/14 23:06:34 INFO mapred.FileOutputCommitter: Saved output of task
'attempt_local_0002_r_000000_0' to
hdfs://localhost:9000/user/sayhan/articles-model/trainer-tfIdf
11/11/14 23:06:34 INFO mapred.LocalJobRunner: Bayes TfIdf Reducer: [__WT,
Sports, zone] => 0.2868487930622821 > reduce
11/11/14 23:06:34 INFO mapred.TaskRunner: Task
'attempt_local_0002_r_000000_0' done.
11/11/14 23:06:35 INFO mapred.JobClient:  map 100% reduce 100%
11/11/14 23:06:35 INFO mapred.JobClient: Job complete: job_local_0002
11/11/14 23:06:35 INFO mapred.JobClient: Counters: 15
11/11/14 23:06:35 INFO mapred.JobClient:   FileSystemCounters
11/11/14 23:06:35 INFO mapred.JobClient:     FILE_BYTES_READ=23101705
11/11/14 23:06:35 INFO mapred.JobClient:     HDFS_BYTES_READ=8266427
11/11/14 23:06:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=26860827
11/11/14 23:06:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=9144803
11/11/14 23:06:35 INFO mapred.JobClient:   Map-Reduce Framework
11/11/14 23:06:35 INFO mapred.JobClient:     Reduce input groups=19918
11/11/14 23:06:35 INFO mapred.JobClient:     Combine output records=39835
11/11/14 23:06:35 INFO mapred.JobClient:     Map input records=52240
11/11/14 23:06:35 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/11/14 23:06:35 INFO mapred.JobClient:     Reduce output records=19918
11/11/14 23:06:35 INFO mapred.JobClient:     Spilled Records=79670
11/11/14 23:06:35 INFO mapred.JobClient:     Map output bytes=1536230
11/11/14 23:06:35 INFO mapred.JobClient:     Map input bytes=2077982
11/11/14 23:06:35 INFO mapred.JobClient:     Combine input records=52240
11/11/14 23:06:35 INFO mapred.JobClient:     Map output records=52240
11/11/14 23:06:35 INFO mapred.JobClient:     Reduce input records=39835
11/11/14 23:06:35 INFO bayes.BayesDriver: Calculating weight sums for labels
and features...
11/11/14 23:06:35 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
processName=JobTracker, sessionId= - already initialized
11/11/14 23:06:35 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/11/14 23:06:35 INFO mapred.FileInputFormat: Total input paths to process
: 1
11/11/14 23:06:35 INFO mapred.JobClient: Running job: job_local_0003
11/11/14 23:06:35 INFO mapred.FileInputFormat: Total input paths to process
: 1
11/11/14 23:06:35 INFO mapred.MapTask: numReduceTasks: 1
11/11/14 23:06:35 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 23:06:35 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 23:06:35 INFO mapred.MapTask: record buffer = 262144/327680
11/11/14 23:06:35 INFO mapred.MapTask: Starting flush of map output
11/11/14 23:06:36 INFO mapred.MapTask: Finished spill 0
11/11/14 23:06:36 INFO mapred.TaskRunner: Task:attempt_local_0003_m_000000_0
is done. And is in the process of commiting
11/11/14 23:06:36 INFO mapred.LocalJobRunner: Bayes Weight Summer Mapper:
[__WT, Sports, zone]
11/11/14 23:06:36 INFO mapred.TaskRunner: Task
'attempt_local_0003_m_000000_0' done.
11/11/14 23:06:36 INFO mapred.LocalJobRunner: 
11/11/14 23:06:36 INFO mapred.Merger: Merging 1 sorted segments
11/11/14 23:06:36 INFO mapred.Merger: Down to the last merge-pass, with 1
segments left of total size: 339423 bytes
11/11/14 23:06:36 INFO mapred.LocalJobRunner: 
11/11/14 23:06:36 INFO mapred.JobClient:  map 100% reduce 0%
11/11/14 23:06:36 INFO mapred.TaskRunner: Task:attempt_local_0003_r_000000_0
is done. And is in the process of commiting
11/11/14 23:06:36 INFO mapred.LocalJobRunner: 
11/11/14 23:06:36 INFO mapred.TaskRunner: Task attempt_local_0003_r_000000_0
is allowed to commit now
11/11/14 23:06:36 INFO mapred.FileOutputCommitter: Saved output of task
'attempt_local_0003_r_000000_0' to
hdfs://localhost:9000/user/sayhan/articles-model/trainer-weights
11/11/14 23:06:36 INFO mapred.LocalJobRunner: Bayes Weight Summer Reducer:
[__SK, Sports] => 113.63460527845817 > reduce
11/11/14 23:06:36 INFO mapred.TaskRunner: Task
'attempt_local_0003_r_000000_0' done.
11/11/14 23:06:37 INFO mapred.JobClient:  map 100% reduce 100%
11/11/14 23:06:37 INFO mapred.JobClient: Job complete: job_local_0003
11/11/14 23:06:37 INFO mapred.JobClient: Counters: 15
11/11/14 23:06:37 INFO mapred.JobClient:   FileSystemCounters
11/11/14 23:06:37 INFO mapred.JobClient:     FILE_BYTES_READ=16948079
11/11/14 23:06:37 INFO mapred.JobClient:     HDFS_BYTES_READ=6626602
11/11/14 23:06:37 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=17470068
11/11/14 23:06:37 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=6236218
11/11/14 23:06:37 INFO mapred.JobClient:   Map-Reduce Framework
11/11/14 23:06:37 INFO mapred.JobClient:     Reduce input groups=12414
11/11/14 23:06:37 INFO mapred.JobClient:     Combine output records=12414
11/11/14 23:06:37 INFO mapred.JobClient:     Map input records=19917
11/11/14 23:06:37 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/11/14 23:06:37 INFO mapred.JobClient:     Reduce output records=12414
11/11/14 23:06:37 INFO mapred.JobClient:     Spilled Records=24828
11/11/14 23:06:37 INFO mapred.JobClient:     Map output bytes=1359759
11/11/14 23:06:37 INFO mapred.JobClient:     Map input bytes=830120
11/11/14 23:06:37 INFO mapred.JobClient:     Combine input records=59751
11/11/14 23:06:37 INFO mapred.JobClient:     Map output records=59751
11/11/14 23:06:37 INFO mapred.JobClient:     Reduce input records=12414
11/11/14 23:06:37 INFO bayes.BayesDriver: Calculating the weight
Normalisation factor for each class...
11/11/14 23:06:37 INFO bayes.BayesThetaNormalizerDriver: Sigma_k for Each
Label
11/11/14 23:06:37 INFO bayes.BayesThetaNormalizerDriver:
{General=425.5688048473677, Business=310.48141677598676,
Politics=122.77282039891027, SciTech=106.62958199545514,
Entertainment=96.4986648462028, Health=83.86073970940774,
Sports=113.63460527845817}
11/11/14 23:06:37 INFO bayes.BayesThetaNormalizerDriver: Sigma_kSigma_j for
each Label and for each Features
11/11/14 23:06:37 INFO bayes.BayesThetaNormalizerDriver: 1259.4466338518014
11/11/14 23:06:37 INFO bayes.BayesThetaNormalizerDriver: Vocabulary Count
11/11/14 23:06:37 INFO bayes.BayesThetaNormalizerDriver: 12406.0
11/11/14 23:06:37 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with
processName=JobTracker, sessionId= - already initialized
11/11/14 23:06:37 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/11/14 23:06:37 INFO mapred.FileInputFormat: Total input paths to process
: 1
11/11/14 23:06:37 INFO mapred.JobClient: Running job: job_local_0004
11/11/14 23:06:37 INFO mapred.FileInputFormat: Total input paths to process
: 1
11/11/14 23:06:37 INFO mapred.MapTask: numReduceTasks: 1
11/11/14 23:06:37 INFO mapred.MapTask: io.sort.mb = 100
11/11/14 23:06:37 INFO mapred.MapTask: data buffer = 79691776/99614720
11/11/14 23:06:37 INFO mapred.MapTask: record buffer = 262144/327680
11/11/14 23:06:37 INFO mapred.MapTask: Starting flush of map output
11/11/14 23:06:37 INFO mapred.MapTask: Finished spill 0
11/11/14 23:06:37 INFO mapred.TaskRunner: Task:attempt_local_0004_m_000000_0
is done. And is in the process of commiting
11/11/14 23:06:37 INFO mapred.LocalJobRunner: Bayes Theta Normalizer Mapper:
Sports
11/11/14 23:06:37 INFO mapred.TaskRunner: Task
'attempt_local_0004_m_000000_0' done.
11/11/14 23:06:37 INFO mapred.LocalJobRunner: 
11/11/14 23:06:37 INFO mapred.Merger: Merging 1 sorted segments
11/11/14 23:06:37 INFO mapred.Merger: Down to the last merge-pass, with 1
segments left of total size: 197 bytes
11/11/14 23:06:37 INFO mapred.LocalJobRunner: 
11/11/14 23:06:37 INFO mapred.TaskRunner: Task:attempt_local_0004_r_000000_0
is done. And is in the process of commiting
11/11/14 23:06:37 INFO mapred.LocalJobRunner: 
11/11/14 23:06:37 INFO mapred.TaskRunner: Task attempt_local_0004_r_000000_0
is allowed to commit now
11/11/14 23:06:37 INFO mapred.FileOutputCommitter: Saved output of task
'attempt_local_0004_r_000000_0' to
hdfs://localhost:9000/user/sayhan/articles-model/trainer-thetaNormalizer
11/11/14 23:06:37 INFO mapred.LocalJobRunner: Bayes Theta Normalizer
Reducer: [_LTN, Sports] => -17695.13224964217 > reduce
11/11/14 23:06:37 INFO mapred.TaskRunner: Task
'attempt_local_0004_r_000000_0' done.
11/11/14 23:06:38 INFO mapred.JobClient:  map 100% reduce 100%
11/11/14 23:06:38 INFO mapred.JobClient: Job complete: job_local_0004
11/11/14 23:06:38 INFO mapred.JobClient: Counters: 15
11/11/14 23:06:38 INFO mapred.JobClient:   FileSystemCounters
11/11/14 23:06:38 INFO mapred.JobClient:     FILE_BYTES_READ=20239757
11/11/14 23:06:38 INFO mapred.JobClient:     HDFS_BYTES_READ=8288210
11/11/14 23:06:38 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=20483324
11/11/14 23:06:38 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=6654856
11/11/14 23:06:38 INFO mapred.JobClient:   Map-Reduce Framework
11/11/14 23:06:38 INFO mapred.JobClient:     Reduce input groups=7
11/11/14 23:06:38 INFO mapred.JobClient:     Combine output records=7
11/11/14 23:06:38 INFO mapred.JobClient:     Map input records=19917
11/11/14 23:06:38 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/11/14 23:06:38 INFO mapred.JobClient:     Reduce output records=7
11/11/14 23:06:38 INFO mapred.JobClient:     Spilled Records=14
11/11/14 23:06:38 INFO mapred.JobClient:     Map output bytes=507100
11/11/14 23:06:38 INFO mapred.JobClient:     Map input bytes=830120
11/11/14 23:06:38 INFO mapred.JobClient:     Combine input records=19917
11/11/14 23:06:38 INFO mapred.JobClient:     Map output records=19917
11/11/14 23:06:38 INFO mapred.JobClient:     Reduce input records=7
11/11/14 23:06:38 INFO common.HadoopUtil: Deleting
hdfs://localhost:9000/user/cunnin/articles-model/trainer-docCount
11/11/14 23:06:38 INFO common.HadoopUtil: Deleting
hdfs://localhost:9000/user/cunnin/articles-model/trainer-termDocCount
11/11/14 23:06:38 INFO common.HadoopUtil: Deleting
hdfs://localhost:9000/user/cunnin/articles-model/trainer-featureCount
11/11/14 23:06:38 INFO common.HadoopUtil: Deleting
hdfs://localhost:9000/user/cunnin/articles-model/trainer-wordFreq
11/11/14 23:06:38 INFO common.HadoopUtil: Deleting
hdfs://localhost:9000/user/cunnin/articles-model/trainer-tfIdf/trainer-vocabCount
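
One quick way to compare the two runs above is by job ID. The command-line run reports IDs such as job_201111142210_0001, while the TrainClassifier.java run reports job_local_0001 and logs from mapred.LocalJobRunner, which in Hadoop 0.20 generally means the job ran inside the client JVM rather than being submitted to a JobTracker. Below is a minimal sketch that pulls the job IDs out of such log text and flags the local ones. The class name JobIdCheck and its methods are hypothetical, for illustration only; they are not part of Mahout or Hadoop, and the pattern only assumes the two ID shapes seen in these logs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Mahout or Hadoop.
class JobIdCheck {
    // Matches the two ID shapes seen in the logs above:
    // "job_201111142210_0001" (cluster) and "job_local_0001" (local runner).
    private static final Pattern JOB_ID =
            Pattern.compile("\\bjob_(local|\\d+)_(\\d+)\\b");

    /** Returns true if the ID carries the "job_local_" prefix. */
    static boolean isLocalJob(String jobId) {
        return jobId.startsWith("job_local_");
    }

    /** Collects the distinct job IDs found in the log text, in first-seen order. */
    static List<String> extractJobIds(String log) {
        List<String> ids = new ArrayList<>();
        Matcher m = JOB_ID.matcher(log);
        while (m.find()) {
            String id = m.group();
            if (!ids.contains(id)) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two lines copied from the logs above, one from each run.
        String log =
                "11/11/14 22:14:27 INFO mapred.JobClient: Running job: job_201111142210_0001\n"
              + "11/11/14 23:06:23 INFO mapred.JobClient: Running job: job_local_0001\n";
        for (String id : extractJobIds(log)) {
            System.out.println(id + " -> " + (isLocalJob(id) ? "local" : "cluster"));
        }
    }
}
```

If every ID from the Java run comes back as local, one thing worth checking (an assumption about this setup, not something the logs confirm) is whether the cluster configuration under HADOOP_CONF_DIR is on the classpath of the process that runs TrainClassifier.java.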

--
View this message in context: http://lucene.472066.n3.nabble.com/trainclassifier-as-a-command-vs-TrainClassifier-java-tp3508652p3508691.html
Sent from the Mahout User List mailing list archive at Nabble.com.
