From: unmesha sreeveni
To: User Hadoop
Date: Mon, 2 Dec 2013 10:22:17 +0530
Subject: Re: Desicion Tree Implementation in Hadoop MapReduce
Reply-To: user@hadoop.apache.org

1. I just thought of building the model in a project named, say, DT, and when a huge input comes, running another MR job (test.java) within DT.
If we are not chaining jobs, we need to create separate projects, right? That is, DT_build and DT_test projects.
Or is there no need for a separate project?

2. M1_train - dataset for training.
M1_test - test data for prediction.
1. Will the prediction input be a single record, or a set of records given at once?
2. We also need to ensure in our program that M1_test is used only with M1_train. We should check that too, right? If M1_test is given to M2_train, it should show an error, shouldn't it?
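
A check like the one asked about in point 2 is possible if the stored model records which attributes it was trained on. The following is a minimal sketch under that assumption (the thread does not say how the model is stored). `SchemaCheckSketch`, `parseHeader`, and `requireCompatible` are hypothetical names, and the data is assumed to begin with a comma-separated header line.

```java
import java.util.Arrays;
import java.util.List;

class SchemaCheckSketch {
    /** Parses a comma-separated header line into trimmed attribute names. */
    static List<String> parseHeader(String headerLine) {
        String[] names = headerLine.split(",");
        for (int i = 0; i < names.length; i++) {
            names[i] = names[i].trim();
        }
        return Arrays.asList(names);
    }

    /**
     * Throws if the test data does not list exactly the attributes, in the
     * same order, that the model was trained on (assumed to be saved with it).
     */
    static void requireCompatible(List<String> trainAttributes, List<String> testAttributes) {
        if (!trainAttributes.equals(testAttributes)) {
            throw new IllegalArgumentException(
                "Test attributes " + testAttributes
                + " do not match training attributes " + trainAttributes);
        }
    }
}
```

This only compares attribute names and order. It cannot prove that a test file was produced for a particular training set, so it catches gross mismatches and nothing more.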

Is anything wrong in my reasoning?
Are you able to guess what I am trying to accomplish?
I am confused about whether I need to create only one project that includes both train and test, or two projects.


On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
What is your motivation for chaining jobs?


2013/12/1 unmesha sreeveni <unmeshabiju@gmail.com>
Thanks Yexi, a very nice explanation.
It is explained in a very simple way that is really understandable for beginners. Thanks a lot.
I can go for chaining jobs, right?





On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <yexijiang@gmail.com> wrote:
In my opinion:

1. Build the decision tree model with the training data.
2. Store it somewhere.
3. When the unlabeled data is available:
   3.1 If the unlabeled data is huge, write another MR job to process it: load the model at the setup stage, then use the model to label the records one by one in the map stage. There is no need for a reducer.
   3.2 If the unlabeled data is small, it is trivial.
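
The steps above can be sketched in plain Java without any Hadoop dependency. This is a minimal sketch under stated assumptions, not the implementation discussed in the thread: the `TreeNode` class, the whitespace-separated text format used by `serialize` and `deserialize`, and the comma-separated record layout are all hypothetical. In a real job, `deserialize` would be called once in the mapper's `setup()` method and `label` once per record in `map()`, with the number of reduce tasks set to zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical tree node: a leaf carrying a label, or a split on one categorical attribute. */
class TreeNode {
    final String label;      // non-null only for leaves
    final int attrIndex;     // column tested at this node (-1 for leaves)
    final List<String> values = new ArrayList<String>();      // branch values, parallel to children
    final List<TreeNode> children = new ArrayList<TreeNode>();

    TreeNode(String label) { this.label = label; this.attrIndex = -1; }
    TreeNode(int attrIndex) { this.label = null; this.attrIndex = attrIndex; }

    TreeNode branch(String value, TreeNode child) {
        values.add(value);
        children.add(child);
        return this;
    }

    boolean isLeaf() { return label != null; }
}

class MapOnlyPredictionSketch {
    // Step 2: store the model as pre-order tokens, e.g. "N 0 2 sunny L no rainy L yes".
    // Assumes attribute values and labels contain no whitespace.
    static String serialize(TreeNode n) {
        if (n.isLeaf()) {
            return "L " + n.label;
        }
        StringBuilder sb = new StringBuilder("N " + n.attrIndex + " " + n.children.size());
        for (int i = 0; i < n.children.size(); i++) {
            sb.append(' ').append(n.values.get(i)).append(' ').append(serialize(n.children.get(i)));
        }
        return sb.toString();
    }

    // Step 3.1, setup stage: rebuild the model once from the stored text.
    static TreeNode deserialize(String text) {
        return parse(Arrays.asList(text.trim().split("\\s+")).iterator());
    }

    private static TreeNode parse(Iterator<String> tokens) {
        String kind = tokens.next();
        if (kind.equals("L")) {
            return new TreeNode(tokens.next());
        }
        TreeNode node = new TreeNode(Integer.parseInt(tokens.next()));
        int count = Integer.parseInt(tokens.next());
        for (int i = 0; i < count; i++) {
            String value = tokens.next();
            node.branch(value, parse(tokens));
        }
        return node;
    }

    // Step 3.1, map stage: label one unlabeled record; no reducer is involved.
    // Assumes the record has a column for every attribute the tree tests.
    static String label(TreeNode root, String record) {
        String[] attrs = record.split(",");
        TreeNode node = root;
        while (!node.isLeaf()) {
            int i = node.values.indexOf(attrs[node.attrIndex].trim());
            if (i < 0) {
                return "unknown";   // attribute value not covered by the tree
            }
            node = node.children.get(i);
        }
        return node.label;
    }
}
```

Because each record is labeled independently of the others, the map output is already the final result, which is why a reducer adds nothing here.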




2013/12/1 unmesha sreeveni <unmeshabiju@gmail.com>
Thanks Yexi,

But how can it be accomplished?
The input to the decision tree MR job will be a set of records. But for
prediction, the input will be a single record without a class label, right?
So what changes will be needed in the MR job? Should we design it like this?
1. When a set of records comes in, build the decision tree.
2. Otherwise, if a single record comes in, check the output of the decision
tree (the tree generated by the MR job) and predict the class label.
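
One way to read the two cases above is as two explicit modes of a single program, rather than something inferred from the shape of the input. A minimal sketch follows, assuming hypothetical mode names `train` and `predict`; the methods `runTraining` and `runPrediction` are placeholders that stand in for submitting the actual jobs.

```java
class TrainOrPredictSketch {
    /** Chooses which job to run from the first command-line argument. */
    static String dispatch(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("usage: <train|predict> <inputPath> [modelPath]");
        }
        String mode = args[0];
        String input = args[1];
        if (mode.equals("train")) {
            return runTraining(input);
        }
        if (mode.equals("predict")) {
            if (args.length < 3) {
                throw new IllegalArgumentException("predict needs a model path");
            }
            return runPrediction(input, args[2]);
        }
        throw new IllegalArgumentException("unknown mode: " + mode);
    }

    // Placeholder: a real implementation would submit the tree-building job here.
    static String runTraining(String input) {
        return "train:" + input;
    }

    // Placeholder: a real implementation would submit the labeling job here.
    static String runPrediction(String input, String model) {
        return "predict:" + input + ":" + model;
    }
}
```

With this layout, both modes can live in one project and share the model code, while each still runs as its own job.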

-------

M1_train - dataset for training.
M1_test - test data for prediction.
1. Will the prediction input be a single record, or a set of records given
at once?
2. We also need to ensure in our program that M1_test is used only with
M1_train. We should check that too, right? If M1_test is given to
M2_train, it should show an error, shouldn't it?

Please tell me if my thinking is wrong.

On 11/30/13, Yexi Jiang <yexijiang@gmail.com> wrote:
> I watched the video in it, but I cannot access its source code due to a
> permission issue.
> In my opinion, once the decision tree model is built, the model is small
> enough to be loaded into memory and can be used directly without another
> MR job for prediction. The prediction can be conducted in a streaming way.
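
The streaming idea described above can be sketched as a loop that holds the model in memory and reads one record at a time. This is a minimal sketch, not code from the thread: `StreamingPredictionSketch` and `predictStream` are hypothetical names, the model is represented as a plain `Function` from attribute values to a label, and records are assumed to be comma-separated lines.

```java
import java.io.BufferedReader;
import java.io.PrintWriter;
import java.util.Iterator;
import java.util.function.Function;

class StreamingPredictionSketch {
    /**
     * Reads unlabeled records line by line and writes "record,label" lines.
     * Only the current line is held in memory, so the input may be large
     * even though the model itself is small. Returns the number of records labeled.
     */
    static int predictStream(BufferedReader in, PrintWriter out,
                             Function<String[], String> model) {
        int count = 0;
        Iterator<String> lines = in.lines().iterator();   // pulls lines lazily
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.trim().isEmpty()) {
                continue;                                  // skip blank lines
            }
            String[] attrs = line.split(",");
            out.println(line + "," + model.apply(attrs));
            count++;
        }
        out.flush();
        return count;
    }
}
```

Passing `new BufferedReader(new InputStreamReader(System.in))` as the reader would let the same method label records as they arrive on standard input.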
>
>
> 2013/11/30 unmesha sreeveni <unmeshabiju@gmail.com>
>
>> I have gone through a MapReduce implementation of C4.5 at
>> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>
>> Here a decision tree is built. So my question is:
>> Can we also include the prediction along with that?
>>
>>
>> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>
>>> You are welcome :)
>>>
>>>
>>> 2013/11/25 unmesha sreeveni <unmeshabiju@gmail.com>
>>>
>>>> OK. Thanks, Yexi.
>>>>
>>>>
>>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>
>>>>> As far as I know, there is no ID3 implementation in Mahout currently,
>>>>> but you can use the decision forest instead.
>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>>
>>>>>
>>>>> 2013/11/25 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>>
>>>>>> Is that ID3 classification?
>>>>>> Does it also include prediction?
>>>>>>
>>>>>>
>>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>>>
>>>>>>> You can find it directly at https://github.com/apache/mahout, or you
>>>>>>> can check it out from svn by following
>>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>>>
>>>>>>>
>>>>>>> 2013/11/23 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>>>>
>>>>>>>> I want to go through the decision tree implementation in Mahout.
>>>>>>>> I referred to Apache Mahout <http://mahout.apache.org/>:
>>>>>>>>
>>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged
>>>>>>>> to begin using version 0.6. Highlights include:
>>>>>>>> Improved Decision Tree performance and added support for regression
>>>>>>>> problems
>>>>>>>>
>>>>>>>> Where can I find its source code and documentation?
>>>>>>>>
>>>>>>>> Should I download Mahout?
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Thanks & Regards*
>>>>>>>>
>>>>>>>> Unmesha Sreeveni U.B
>>>>>>>>
>>>>>>>> *Junior Developer*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ------
>>>>>>> Yexi Jiang,
>>>>>>> ECS 251, yjian004@cs.fiu.edu
>>>>>>> School of Computer and Information Science,
>>>>>>> Florida International University
>>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

















--
Thanks & Regards

Unmesha Sreeveni U.B
Junior Developer
