hadoop-hdfs-user mailing list archives

From Yexi Jiang <yexiji...@gmail.com>
Subject Re: Decision Tree Implementation in Hadoop MapReduce
Date Mon, 02 Dec 2013 04:59:43 GMT
Actually, training and testing (or prediction) do not need to be done in one
shot. If your particular scenario requires running them consecutively, you
can do it the way you described.

To make it more general, it is better to separate them, since there might
be multiple batches of test (to-be-labeled) data, while you only need to
train the model once (if your data is stable).
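Keeping the two steps separate can be sketched as a train-once, predict-many pattern: training runs only when no stored model exists, and every later batch reuses the stored model. This is a minimal JDK-only sketch, assuming the model can be serialized to a string and kept in a local file; `TrainOnce`, `loadOrTrain`, `labelBatches` and the `trainer`/`predictor` callbacks are hypothetical names standing in for the real training and prediction jobs, whose model would normally live on HDFS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Supplier;

final class TrainOnce {
    private TrainOnce() {}

    /**
     * Returns the stored model if it already exists; otherwise runs the
     * (hypothetical, expensive) training step once and stores its output.
     */
    static String loadOrTrain(Path modelPath, Supplier<String> trainer) throws IOException {
        if (Files.exists(modelPath)) {
            return new String(Files.readAllBytes(modelPath), StandardCharsets.UTF_8);
        }
        String model = trainer.get();
        Files.write(modelPath, model.getBytes(StandardCharsets.UTF_8));
        return model;
    }

    /**
     * Labels several independent batches. Each batch loads the stored model
     * instead of retraining; the predictor maps (model, record) to a label.
     */
    static List<List<String>> labelBatches(Path modelPath, Supplier<String> trainer,
            List<List<String>> batches, BiFunction<String, String, String> predictor)
            throws IOException {
        List<List<String>> results = new ArrayList<>();
        for (List<String> batch : batches) {
            String model = loadOrTrain(modelPath, trainer);
            List<String> labels = new ArrayList<>();
            for (String record : batch) {
                labels.add(predictor.apply(model, record));
            }
            results.add(labels);
        }
        return results;
    }
}
```

In practice each batch would be a separate prediction run, possibly days apart, which is why persisting the model matters more than chaining the two jobs.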


2013/12/1 unmesha sreeveni <unmeshabiju@gmail.com>

> 1. I just thought of building the model in a project named, say, DT, and
> when a huge input comes, running another MR job (Test.java) within DT.
> If we do not chain the jobs, we need to create separate projects, right
> (DT_build and DT_test)? Or is there no need for a separate project?
>
> 2. M1_train - dataset for training.
>
> M1_test - test data or prediction.
> 1. Will the input for prediction be a single record, or a set of records
> given all at once?
> 2. We also need to ensure in our program that M1_test belongs to M1_train
> only. We should check that too, right? If M1_test is given to M2_train, it
> should show an error, shouldn't it?
>
> Is anything wrong in my inference?
> Are you able to guess what I am trying to accomplish?
> I am confused whether I need to create only one project that includes both
> training and testing, or two projects.
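On the question of rejecting M1_test when it is fed to the wrong model, one possible safeguard (an assumption, not something proposed in the thread) is to store the training data's attribute names next to the model and have the prediction step refuse input whose attributes do not match. A minimal sketch, assuming comma-separated files whose first line is a header and whose last training column is the class label; `ModelSchema` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Attribute names a model was trained on, stored alongside the model itself. */
final class ModelSchema {
    private final String modelName;
    private final List<String> attributes;

    private ModelSchema(String modelName, List<String> attributes) {
        this.modelName = modelName;
        this.attributes = Collections.unmodifiableList(attributes);
    }

    /** Builds the schema from a training header; the last column is assumed to be the class label. */
    static ModelSchema fromTrainingHeader(String modelName, String header) {
        List<String> columns = splitHeader(header);
        if (columns.size() < 2) {
            throw new IllegalArgumentException("Training header needs at least one attribute and a label");
        }
        return new ModelSchema(modelName, new ArrayList<>(columns.subList(0, columns.size() - 1)));
    }

    /** Throws if the unlabeled data's header does not list exactly the training attributes, in order. */
    void checkCompatible(String testHeader) {
        List<String> testAttributes = splitHeader(testHeader);
        if (!testAttributes.equals(attributes)) {
            throw new IllegalArgumentException("Input attributes " + testAttributes
                    + " do not match model '" + modelName + "', trained on " + attributes);
        }
    }

    List<String> attributes() {
        return attributes;
    }

    /** Splits a comma-separated header and trims surrounding whitespace from each name. */
    private static List<String> splitHeader(String header) {
        List<String> names = new ArrayList<>();
        for (String name : header.split(",")) {
            names.add(name.trim());
        }
        return names;
    }
}
```

A check like this only catches mismatches when the two datasets have different attributes; data with the right columns but drawn from a different population would still pass.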
>
>
> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>
>> What is your motivation for using chained jobs?
>>
>>
>> 2013/12/1 unmesha sreeveni <unmeshabiju@gmail.com>
>>
>>> Thanks, Yexi. A very nice explanation, explained in a very simple way
>>> that is really understandable for beginners. Thanks a lot.
>>> I can go for chaining jobs, right?
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>
>>>> In my opinion:
>>>>
>>>> 1. Build the decision tree model with the training data.
>>>> 2. Store it somewhere.
>>>> 3. When the unlabeled data is available:
>>>>    3.1 If the unlabeled data is huge, write another MR job to process
>>>>        it: load the model in the setup stage, then use the model to
>>>>        label the records one by one in the map stage. There is no need
>>>>        for a reducer.
>>>>    3.2 If the unlabeled data is small, it is trivial: load the model
>>>>        and label the data directly, without a MapReduce job.
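Step 3.1 follows the usual map-only pattern: in Hadoop's `org.apache.hadoop.mapreduce.Mapper`, `setup(Context)` runs once per map task before any `map(...)` call, and `job.setNumReduceTasks(0)` makes the map output the final job output (the model file itself could be shipped to tasks, for example, with `Job.addCacheFile`). To keep the sketch runnable without Hadoop on the classpath, the code below imitates that lifecycle with plain JDK classes; `SimpleMapper`, `LabelingMapper`, the configuration key `dt.model` and the pre-order text format of the tree are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** JDK-only stand-in for the Mapper lifecycle: setup() once per task, map() once per record. */
abstract class SimpleMapper {
    protected void setup(Map<String, String> conf) {}

    protected abstract void map(long offset, String line, BiConsumer<String, String> emit);

    /** Runs one "map task" over an input split; with no reducer, emitted pairs are the final output. */
    final List<String> runMapOnly(Map<String, String> conf, List<String> split) {
        List<String> output = new ArrayList<>();
        setup(conf);
        long offset = 0;
        for (String line : split) {
            map(offset, line, (key, value) -> output.add(key + "\t" + value));
            offset += line.length() + 1; // approximates TextInputFormat's byte-offset key
        }
        return output;
    }
}

/** Loads the decision tree once in setup(), then labels each unlabeled record in map(). */
final class LabelingMapper extends SimpleMapper {
    /** Tree node: a leaf carries a label; an internal node sends value <= threshold left. */
    static final class Node {
        String label;
        int attribute;
        double threshold;
        Node left;
        Node right;
    }

    private Node tree;

    @Override
    protected void setup(Map<String, String> conf) {
        // Assumed pre-order format: "S <attr> <threshold> <left> <right>" or "L <label>".
        Iterator<String> tokens = Arrays.asList(conf.get("dt.model").trim().split("\\s+")).iterator();
        tree = parse(tokens);
    }

    private static Node parse(Iterator<String> tokens) {
        Node node = new Node();
        if (tokens.next().equals("L")) {
            node.label = tokens.next();
        } else {
            node.attribute = Integer.parseInt(tokens.next());
            node.threshold = Double.parseDouble(tokens.next());
            node.left = parse(tokens);
            node.right = parse(tokens);
        }
        return node;
    }

    @Override
    protected void map(long offset, String line, BiConsumer<String, String> emit) {
        String[] fields = line.split(",");
        Node node = tree;
        // Walk from the root to a leaf using the record's attribute values.
        while (node.label == null) {
            double value = Double.parseDouble(fields[node.attribute].trim());
            node = value <= node.threshold ? node.left : node.right;
        }
        emit.accept(line, node.label);
    }
}
```

Because the tree is parsed once per task rather than once per record, the per-record cost is just one root-to-leaf walk.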
>>>>
>>>>
>>>>
>>>>
>>>> 2013/12/1 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>
>>>>> Thanks, Yexi,
>>>>>
>>>>> But how can it be accomplished?
>>>>> The input to the decision tree MR job will be a set of data, but when
>>>>> predicting, the input will be a single line of data without a class
>>>>> label, right? So what changes will there be in the MR job? Should we
>>>>> design it like this:
>>>>> 1. When a set of data comes in, build the decision tree.
>>>>> 2. Else, if a single line of data comes in, check the output of the
>>>>> decision tree (the tree generated by MR) and predict the class label.
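Predicting a single unlabeled line, as in step 2 of this proposal, only needs the finished tree: the line is routed from the root to a leaf, and that leaf's class label is the prediction. A minimal sketch, assuming ID3-style categorical attributes in a comma-separated line; `CategoricalNode` is a hypothetical class, not part of Mahout, and the tree in the usage example is a toy.

```java
import java.util.HashMap;
import java.util.Map;

/** ID3-style node: a leaf holds a class label; an internal node branches on each value of one categorical attribute. */
final class CategoricalNode {
    private final String label;      // non-null only for leaves
    private final int attribute;     // column index tested at an internal node
    private final Map<String, CategoricalNode> children = new HashMap<>();

    private CategoricalNode(String label, int attribute) {
        this.label = label;
        this.attribute = attribute;
    }

    static CategoricalNode leaf(String label) {
        return new CategoricalNode(label, -1);
    }

    static CategoricalNode split(int attribute) {
        return new CategoricalNode(null, attribute);
    }

    /** Adds a branch for one attribute value and returns this node for chaining. */
    CategoricalNode branch(String value, CategoricalNode child) {
        children.put(value, child);
        return this;
    }

    /** Predicts the label of one unlabeled, comma-separated line such as "sunny,high". */
    String predict(String line) {
        String[] fields = line.split(",");
        CategoricalNode node = this;
        while (node.label == null) {
            String value = fields[node.attribute].trim();
            CategoricalNode next = node.children.get(value);
            if (next == null) {
                // A value never seen during training has no branch; this sketch just reports it.
                throw new IllegalArgumentException("Unseen value '" + value + "' for attribute " + node.attribute);
            }
            node = next;
        }
        return node.label;
    }
}
```

Usage: `CategoricalNode.split(0).branch("overcast", CategoricalNode.leaf("yes"))` builds a one-level tree whose `predict("overcast,high")` returns `"yes"`. How to treat unseen values (error, or the node's majority class) is a design choice; the sketch fails loudly.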
>>>>>
>>>>> -------
>>>>>
>>>>> M1_train - dataset for training.
>>>>> M1_test - test data or prediction.
>>>>> 1. Will the input for prediction be a single record, or a set of
>>>>> records given all at once?
>>>>> 2. We also need to ensure in our program that M1_test belongs to
>>>>> M1_train only. We should check that too, right? If M1_test is given
>>>>> to M2_train, it should show an error, shouldn't it?
>>>>>
>>>>> Please suggest if my thoughts are wrong.
>>>>>
>>>>> On 11/30/13, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>> > I watched the video in it, but I cannot access its source code due to
>>>>> > a permission issue.
>>>>> > In my opinion, once the decision tree model is built, the model is
>>>>> > small enough to be loaded into memory and used directly, without
>>>>> > another MR job for prediction. The prediction can be conducted in a
>>>>> > streaming way.
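Prediction "in a streaming way" can be as simple as keeping the small tree in memory and labeling records as they are read, so only the model and the current line are held at any time. A minimal JDK-only sketch; the hard-coded one-split tree (attribute 0, threshold 2.5) and the class names are placeholders for a model that would really be loaded from wherever the training step stored it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class StreamingPredictor {
    /** Binary-split tree node; a non-null label marks a leaf. */
    static final class Node {
        final String label;
        final int attribute;
        final double threshold;
        final Node left;
        final Node right;

        Node(String label) {
            this(label, -1, 0.0, null, null);
        }

        Node(int attribute, double threshold, Node left, Node right) {
            this(null, attribute, threshold, left, right);
        }

        private Node(String label, int attribute, double threshold, Node left, Node right) {
            this.label = label;
            this.attribute = attribute;
            this.threshold = threshold;
            this.left = left;
            this.right = right;
        }
    }

    private final Node root;

    StreamingPredictor(Node root) {
        this.root = root;
    }

    /** Walks one comma-separated record from the root to a leaf. */
    String classify(String line) {
        String[] fields = line.split(",");
        Node node = root;
        while (node.label == null) {
            double value = Double.parseDouble(fields[node.attribute].trim());
            node = value <= node.threshold ? node.left : node.right;
        }
        return node.label;
    }

    /** Labels records one line at a time, writing "record<TAB>label"; returns the number labeled. */
    long run(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        long count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            out.write(line + "\t" + classify(line) + "\n");
            count++;
        }
        out.flush();
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder model: attribute 0 <= 2.5 -> "A", otherwise "B".
        Node tree = new Node(0, 2.5, new Node("A"), new Node("B"));
        new StreamingPredictor(tree).run(
                new InputStreamReader(System.in, StandardCharsets.UTF_8),
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    }
}
```

Piping unlabeled records through standard input this way needs no cluster at all, which matches the point that a small model does not require a second MR job.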
>>>>> >
>>>>> >
>>>>> > 2013/11/30 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>> >
>>>>> >> I have gone through a MapReduce implementation of C4.5 at
>>>>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>> >>
>>>>> >> Here a decision tree is built. So my question is: can we also
>>>>> >> include the prediction along with that?
>>>>> >>
>>>>> >>
>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>> >>
>>>>> >>> You are welcome :)
>>>>> >>>
>>>>> >>>
>>>>> >>> 2013/11/25 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>> >>>
>>>>> >>>> OK. Thanks, Yexi.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>> >>>>
>>>>> >>>>> As far as I know, there is currently no ID3 implementation in
>>>>> >>>>> Mahout, but you can use the decision forest instead; see
>>>>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> 2013/11/25 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>> >>>>>
>>>>> >>>>>> Is that ID3 classification?
>>>>> >>>>>> Does it also include prediction?
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>>> You can find it directly at https://github.com/apache/mahout, or
>>>>> >>>>>>> you can check it out from SVN by following
>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>> >>>>>>>
>>>>> >>>>>>>> I want to go through the decision tree implementation in Mahout.
>>>>> >>>>>>>> I referred to Apache Mahout <http://mahout.apache.org/>:
>>>>> >>>>>>>>
>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>>>>> >>>>>>>> encouraged to begin using version 0.6. Highlights include:
>>>>> >>>>>>>> Improved Decision Tree performance and added support for
>>>>> >>>>>>>> regression problems
>>>>> >>>>>>>>
>>>>> >>>>>>>> Where can I find its source code and documentation?
>>>>> >>>>>>>>
>>>>> >>>>>>>> Should I download Mahout?
>>>>> >>>>>>>>
>>>>> >>>>>>>> --
>>>>> >>>>>>>> *Thanks & Regards*
>>>>> >>>>>>>>
>>>>> >>>>>>>> Unmesha Sreeveni U.B
>>>>> >>>>>>>>
>>>>> >>>>>>>> *Junior Developer*
>>>>> >>>>>>>>
>>>>> >>>>>>>>
>>>>> >>>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> --
>>>>> >>>>>>> ------
>>>>> >>>>>>> Yexi Jiang,
>>>>> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>> >>>>>>> School of Computer and Information Science,
>>>>> >>>>>>> Florida International University
>>>>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>> >>>>>>>
>>>>> >>>>>>>
