mahout-user mailing list archives

From Ranjitha Chandrashekar <Ranjitha...@hcl.com>
Subject RE: Issue with Partial Implementation Problem
Date Fri, 18 Jan 2013 09:29:56 GMT
Hi Deneche,

Thanks. As suggested, I replaced the label value with "normal" in the KDDTest dataset and tested
the forest without the -a option.
It generates a binary file (.out file) with the values 0 and 1.

To interpret this I went through the code. As I understand it, the MR job (Classifier.CMapper)
generates a file with Key -> Correct Label and Value -> Prediction. A new file with the .out
extension is then created, which contains only the values, i.e. the predictions (0 or 1 in my
case), and the file generated by the MR job is deleted. Hence I do not have access to the file
that contains the correct label and the prediction for each input test record.

After looking at these predictions I am not sure what 0 and 1 actually mean. Does 1 mean
the record is classified correctly ("normal" in this case), and 0 mean the classification is
wrong and should be "anomaly"?
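
One possible reading, not confirmed anywhere in this thread, is that the numbers are indices
into the dataset's label list rather than correct/incorrect flags. The error log quoted further
down prints "dataset.labels: [normal, anomaly]", so under that assumption 0 would stand for
"normal" and 1 for "anomaly". A minimal Java sketch of that mapping follows; PredictionDecoder
and decode are hypothetical names, not Mahout classes, and the prediction values are made up.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Mahout: maps numeric predictions back to label names. */
final class PredictionDecoder {

    /**
     * Returns the label name for a numeric prediction, ASSUMING the prediction
     * is a zero-based index into the label list in the order the dataset reports it.
     */
    static String decode(int prediction, List<String> labels) {
        if (prediction < 0 || prediction >= labels.size()) {
            throw new IllegalArgumentException(
                    "Prediction " + prediction + " is outside the label list " + labels);
        }
        return labels.get(prediction);
    }

    public static void main(String[] args) {
        // Label order copied from the "dataset.labels: [normal, anomaly]" log line.
        List<String> labels = Arrays.asList("normal", "anomaly");
        // Made-up predictions, for illustration only.
        int[] predictions = {0, 1, 1, 0};
        for (int p : predictions) {
            System.out.println(p + " -> " + decode(p, labels));
        }
    }
}
```

If this reading is right, the .out file says nothing about whether a prediction is correct; it
only names the predicted class for each test record.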

Please suggest.

Regards
Ranjitha

-----Original Message-----
From: deneche abdelhakim [mailto:adeneche@gmail.com] 
Sent: 18 January 2013 12:21
To: user@mahout.apache.org
Subject: Re: Issue with Partial Implementation Problem

My mistake. You should put any label value available in the training set.
In the previous example, putting "normal" in every test record should be fine.
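
The substitution described above can be sketched in Java as follows. This is only an
illustration under two assumptions: the label is the last comma-separated field of each record
(as in the KDDTest record quoted below), and "normal" is a label present in the training set.
LabelPlaceholder and withLabel are hypothetical names, not Mahout classes, and the record in
main is a shortened stand-in for a real one.

```java
/** Hypothetical helper, not part of Mahout: overwrites the label column of a test record. */
final class LabelPlaceholder {

    /**
     * Replaces the last comma-separated field of a record with the given label.
     * ASSUMES the label attribute is the last field of the record.
     */
    static String withLabel(String record, String knownLabel) {
        int lastComma = record.lastIndexOf(',');
        if (lastComma < 0) {
            throw new IllegalArgumentException("Record has no comma-separated fields: " + record);
        }
        // Keep everything up to and including the last comma, then append the placeholder label.
        return record.substring(0, lastComma + 1) + knownLabel;
    }

    public static void main(String[] args) {
        // Shortened stand-in for a KDDTest record whose label column holds an unknown value.
        String record = "13,tcp,telnet,SF,118,2425,1";
        System.out.println(withLabel(record, "normal"));
    }
}
```

Because the placeholder is not the true class, any accuracy or confusion matrix computed from
such a file would be meaningless; only the predictions themselves are of interest.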


On Fri, Jan 18, 2013 at 7:26 AM, Ranjitha Chandrashekar <Ranjitha.Ch@hcl.com> wrote:

> Hi Deneche
>
> Thank you for your quick response.
>
> I tried using the numerical value in the label attribute in the test data.
>
> Original Record in KDDTest :
> 13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,0.00,0.00,0.12,0.30,normal
>
> Replaced Record :
>
> 13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,0.00,0.00,0.12,0.30,1
>
> (normal class replaced with numerical value 1)
>
> I ran TestForest on the KDDTest dataset and got the following error.
> Sequential and MapReduce classification give the same error.
>
> Command:
>
> hadoop jar /usr/lib/mahout-0.5/mahout-examples-0.5-cdh3u5-job.jar \
>     org.apache.mahout.df.mapreduce.TestForest \
>     -i /user/ranjitha/input/KDDTest+.arff.txt_withnum \
>     -ds /user/ranjitha/input/KDDTrain+.info \
>     -m /user/ranjitha/KDDForest \
>     -o /user/ranjitha/KDDResult
>
> 13/01/18 11:29:24 INFO mapreduce.TestForest: Loading the forest...
> 13/01/18 11:29:24 INFO mapreduce.TestForest: Sequential classification...
> 13/01/18 11:29:24 ERROR data.DataConverter: label token: 1 dataset.labels: [normal, anomaly]
> Exception in thread "main" java.lang.IllegalStateException: Label value (1) not known
>         at org.apache.mahout.df.data.DataConverter.convert(DataConverter.java:71)
>         at org.apache.mahout.df.mapreduce.TestForest.testFile(TestForest.java:256)
>         at org.apache.mahout.df.mapreduce.TestForest.sequential(TestForest.java:216)
>         at org.apache.mahout.df.mapreduce.TestForest.testForest(TestForest.java:172)
>         at org.apache.mahout.df.mapreduce.TestForest.run(TestForest.java:142)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.mahout.df.mapreduce.TestForest.main(TestForest.java:275)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:616)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Looking forward to your reply
>
> Thanks
> Ranjitha.
>
> -----Original Message-----
> From: deneche abdelhakim [mailto:adeneche@gmail.com]
> Sent: 17 January 2013 18:20
> To: user@mahout.apache.org
> Subject: Re: Issue with Partial Implementation Problem
>
> Hi Ranjitha,
>
> Just put any numerical value in the label attribute. You should be able to
> classify the data, but you won't be able to compute the confusion matrix or
> the accuracy.
>
>
> On Thu, Jan 17, 2013 at 12:15 PM, Ranjitha Chandrashekar <Ranjitha.Ch@hcl.com> wrote:
>
> > Hi
> >
> > I am using the Partial Implementation for Random Forest classification.
> >
> > I have a training dataset with labels class 0, class 1 and class 2. The
> > decision forest is built on this training dataset. The classification for
> > the test dataset is computed using the same data descriptor generated for
> > the training dataset. I am able to generate the confusion matrix and
> > accuracy details for a test dataset that includes the class variable.
> >
> > However, I also need to classify data in a scenario where the test data
> > may not have the class variable, or the class values are not known. For
> > example, assume the test data consists of future data points whose class
> > values will only be known in the future.
> >
> >
> > * How is it possible to classify a test dataset where the class label
> >   is not defined or not known? I have tried using default labels like
> >   "unknown" and "NO_LABEL". It doesn't seem to work.
> >
> > * How can the class label be set to "unknown" in the testing dataset?
> >
> > Looking forward to your reply,
> >
> > Thanks
> > Ranjitha.
