From: unmesha sreeveni <unmeshabiju@gmail.com>
To: User Hadoop <user@hadoop.apache.org>
Date: Mon, 2 Dec 2013 09:26:46 +0530
Subject: Re: Decision Tree Implementation in Hadoop MapReduce

Thanks, Yexi. A very nice explanation. Thanks a lot.
It is explained in a very simple way that is really understandable for beginners.
I can go for chaining jobs, right?





On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <yexijiang@gmail.com> wrote:
In my opinion:

1. Build the decision tree model with the training data.
2. Store it somewhere.
3. When the unlabeled data is available:
   3.1 If the unlabeled data is huge, write another MapReduce job to process it: load the model in the setup stage and use it to label the records one by one in the map stage. There is no need for a reducer.
   3.2 If the unlabeled data is small, it is trivial.
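
The steps above can be sketched roughly as follows. This is only a local simulation in plain Python, not Hadoop code and not anything from Mahout: the JSON tree layout, the PredictMapper class, and the run_map_only helper are all hypothetical names chosen for illustration. It shows the shape of a map-only prediction pass: the stored model is loaded once in a setup step, and each record is labeled independently in the map step, so no reducer is involved.

```python
import json

# Hypothetical stored model (step 2): internal nodes compare one numeric
# feature with a threshold; leaves carry a class label.
MODEL_JSON = json.dumps({
    "feature": 0, "threshold": 30.0,
    "left": {"label": "low"},
    "right": {
        "feature": 1, "threshold": 5.0,
        "left": {"label": "medium"},
        "right": {"label": "high"},
    },
})


def predict(node, features):
    """Walk from the root to a leaf and return the leaf's label."""
    while "label" not in node:
        go_left = features[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


class PredictMapper:
    """Stand-in for a map-only task: setup() once, map() once per record."""

    def setup(self, model_text):
        # Step 3.1: load the model once per task, not once per record.
        self.tree = json.loads(model_text)

    def map(self, line):
        # Each input line is one unlabeled record of comma-separated numbers.
        features = [float(x) for x in line.split(",")]
        yield line, predict(self.tree, features)


def run_map_only(lines, model_text):
    """Simulate the job locally: no shuffle phase and no reducer."""
    mapper = PredictMapper()
    mapper.setup(model_text)
    output = []
    for line in lines:
        output.extend(mapper.map(line))
    return output


if __name__ == "__main__":
    for record, label in run_map_only(["25,1", "40,3", "40,9"], MODEL_JSON):
        print(record, label)
```

In a real Hadoop job the same split would presumably map onto the mapper's setup and map methods with the number of reduce tasks set to zero, but the details depend on how and where the model is stored.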




2013/12/1 unmesha sreeveni <unmeshabiju@gmail.com>
Thanks Yexi,

But how can it be accomplished?
The input to the decision tree MR job will be a set of data. But when
predicting, the input will be a single line of data without a class label, right?
So what changes will there be in the MR job? Should we design it like this?
1. When a set of data comes in, build the decision tree.
2. Otherwise, if a single line of data comes in, check the output of the decision
tree (the decision tree generated by the MR job) and predict the class label.

-------

M1_train - dataset for training.
M1_test - test data or prediction data.
1. Will a single record be given as input for prediction, or a set of data
all at once?
2. We also need to ensure in our program that M1_test belongs to M1_train
only. We should check that as well, right? If M1_test is given to
M2_train, it should show an error, shouldn't it?

Please tell me if my thinking is wrong.

On 11/30/13, Yexi Jiang <yexijiang@gmail.com> wrote:
> I watched the video in it but I cannot access its source code due to
> a permission issue.
> In my opinion, once the decision tree model is built, the model is small
> enough to be loaded into memory and can be used directly for prediction,
> without another MapReduce job. The prediction can be done in a streaming way.
>
>
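
As a rough illustration of the streaming idea in the reply above, the sketch below keeps a small tree in memory and labels records lazily as they arrive on any line-oriented stream, with no MapReduce job involved. Everything here is an assumption made for the example: the nested-dict tree with categorical splits, the column names, and the classify and label_stream functions are hypothetical and are not taken from the linked C4.5 implementation.

```python
import io

# Hypothetical in-memory tree with categorical splits (C4.5/ID3 style).
# Internal nodes name an attribute; leaves carry a class label.
TREE = {
    "attribute": "outlook",
    "children": {
        "overcast": {"label": "yes"},
        "sunny": {
            "attribute": "humidity",
            "children": {"high": {"label": "no"}, "normal": {"label": "yes"}},
        },
        "rain": {
            "attribute": "windy",
            "children": {"true": {"label": "no"}, "false": {"label": "yes"}},
        },
    },
}

# Assumed column order of the unlabeled records.
COLUMNS = ["outlook", "humidity", "windy"]


def classify(tree, record):
    """Follow the branch matching each attribute value until a leaf."""
    node = tree
    while "label" not in node:
        node = node["children"][record[node["attribute"]]]
    return node["label"]


def label_stream(stream, tree, columns):
    """Lazily yield (line, label) pairs; only one record is held at a time."""
    for raw in stream:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        record = dict(zip(columns, line.split(",")))
        yield line, classify(tree, record)


if __name__ == "__main__":
    data = io.StringIO("sunny,high,false\n\novercast,high,true\nrain,normal,false\n")
    for line, label in label_stream(data, TREE, COLUMNS):
        print(line, "->", label)
```

Because label_stream is a generator, it could equally be fed from an open file or standard input. An attribute value the tree has never seen raises a KeyError in this sketch, which a real program would need to handle.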
> 2013/11/30 unmesha sreeveni <unmeshabiju@gmail.com>
>
>> I have gone through a MapReduce implementation of C4.5 at
>> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>
>> Here a decision tree is built. So my question is:
>> can we also include the prediction along with that?
>>
>>
>> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>
>>> You are welcome :)
>>>
>>>
>>> 2013/11/25 unmesha sreeveni <unmeshabiju@gmail.com>
>>>
>>>> OK. Thanks, Yexi.
>>>>
>>>>
>>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>
>>>>> As far as I know, there is currently no ID3 implementation in Mahout,
>>>>> but you can use the decision forest instead:
>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example
>>>>>
>>>>>
>>>>> 2013/11/25 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>>
>>>>>> Is that ID3 classification?
>>>>>> Does it include prediction as well?
>>>>>>
>>>>>>
>>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>>>
>>>>>>> You can find it directly at https://github.com/apache/mahout, or you
>>>>>>> can check it out from svn by following
>>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>>>
>>>>>>>
>>>>>>> 2013/11/23 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>>>>
>>>>>>>> I want to go through the decision tree implementation in Mahout.
>>>>>>>> I referred to Apache Mahout <http://mahout.apache.org/>:
>>>>>>>>
>>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged
>>>>>>>> to begin using version 0.6. Highlights include:
>>>>>>>> Improved Decision Tree performance and added support for regression
>>>>>>>> problems
>>>>>>>>
>>>>>>>> Where can I find its source code and documentation?
>>>>>>>>
>>>>>>>> Should I download Mahout?
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Thanks & Regards*
>>>>>>>>
>>>>>>>> Unmesha Sreeveni U.B
>>>>>>>>
>>>>>>>> *Junior Developer*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ------
>>>>>>> Yexi Jiang,
>>>>>>> ECS 251, yjian004@cs.fiu.edu
>>>>>>> School of Computer and Information Science,
>>>>>>> Florida International University
>>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>


--
Thanks & Regards

Unmesha Sreeveni U.B
Junior Developer
