hadoop-user mailing list archives

From unmesha sreeveni <unmeshab...@gmail.com>
Subject Re: Decision Tree Implementation in Hadoop MapReduce
Date Tue, 03 Dec 2013 03:18:33 GMT
Thank you, Yexi. Thanks for spending your valuable time.


On Mon, Dec 2, 2013 at 8:22 PM, Yexi Jiang <yexijiang@gmail.com> wrote:

> Yes, the user is responsible for using the correct model for a given piece
> of testing (or unlabeled) data.
>
>
> 2013/12/2 unmesha sreeveni <unmeshabiju@gmail.com>
>
>> To make it more general, it's better to separate them, since there might
>> be multiple batches of training (or to-be-labeled) data, and you only need
>> to train the model once (if your data is stable).
>>
>> OK, I will go with the second option.
>> If we keep them separate, the two jobs will have no connection with each
>> other, so we need to specify which training data a given test set belongs
>> to, then load the corresponding model file (e.g. playtennnis_tree.txt; the
>> result file should be named so that the training run it came from can be
>> identified from its file name) and use it to predict the test data.
>>
>>
>> On Mon, Dec 2, 2013 at 10:29 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>
>>> Actually, training and testing (or prediction) do not need to be done in
>>> one shot. If you need to do them consecutively in your particular
>>> scenario, you can do it the way you described.
>>>
>>> To make it more general, it's better to separate them, since there might
>>> be multiple batches of training (or to-be-labeled) data, and you only need
>>> to train the model once (if your data is stable).
>>>
>>>
>>> 2013/12/1 unmesha sreeveni <unmeshabiju@gmail.com>
>>>
>>>> 1. I just thought of building the model in a project named, say, DT, and
>>>> when a huge input comes, running another MR job, test.java, within DT.
>>>> If we are not chaining jobs, do we need to create separate projects,
>>>> DT_build and DT_test? Or is there no need for a separate project?
>>>>
>>>> 2. M1_train - dataset for training.
>>>>
>>>> M1_test - test data, or data for prediction.
>>>> 1. Will the input for prediction be a single record, or a set of records
>>>> given all at once?
>>>> 2. We also need to ensure in our program that M1_test is used only with
>>>> M1_train. We should check that as well, right? If M1_test is given to the
>>>> M2_train model, it should raise an error, shouldn't it?
>>>>
>>>> Is anything wrong in my reasoning?
>>>> Can you see what I am trying to accomplish?
>>>> I am unsure whether I need to create one project that includes both
>>>> training and testing, or two projects.
>>>>
>>>>
>>>> On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>
>>>>> What is your motivation for using chained jobs?
>>>>>
>>>>>
>>>>> 2013/12/1 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>>
>>>>>> Thanks, Yexi. A very nice explanation, put in a simple way that is
>>>>>> really understandable for beginners. Thanks a lot.
>>>>>> I can go for chaining jobs, right?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>>>
>>>>>>> In my opinion:
>>>>>>>
>>>>>>> 1. Build the decision tree model with the training data.
>>>>>>> 2. Store it somewhere.
>>>>>>> 3. When the unlabeled data is available:
>>>>>>>    3.1 If the unlabeled data is huge, write another MR job to process
>>>>>>>        it: load the model in the setup stage, then use the model to
>>>>>>>        label the records one by one in the map stage. There is no need
>>>>>>>        for a reducer.
>>>>>>>    3.2 If the unlabeled data is small, it is trivial: load the model
>>>>>>>        and label the data directly, without an MR job.
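Step 3.1 can be sketched in plain Java so it runs without a cluster. This is an illustration under assumptions, not Hadoop code: `MapOnlyLabeler`, `setup` and `map` are hypothetical names, the model is assumed to be stored as one rule per root-to-leaf path of the tree (a text format invented here), and records are assumed to be comma-separated play-tennis attributes. In an actual job, the body of `setup` would live in `Mapper.setup(Context)` (reading the stored model), the body of `map` in `Mapper.map(key, value, context)` writing via `context.write`, and the driver would call `job.setNumReduceTasks(0)` to make the job map-only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for a map-only labeling task (hypothetical names and model format). */
public class MapOnlyLabeler {

    /** One root-to-leaf path of the tree: attribute tests that lead to a class label. */
    static final class Rule {
        final Map<String, String> tests;
        final String label;
        Rule(Map<String, String> tests, String label) { this.tests = tests; this.label = label; }
    }

    private final List<Rule> rules = new ArrayList<>();
    private String[] columns; // attribute names in record column order

    /** Runs once per task: parse the stored model, e.g. "outlook=sunny,humidity=high -> no". */
    public void setup(String header, List<String> modelLines) {
        columns = header.split(",");
        for (String line : modelLines) {
            String[] sides = line.split("->");
            Map<String, String> tests = new HashMap<>();
            for (String test : sides[0].split(",")) {
                String[] kv = test.trim().split("=");
                tests.put(kv[0].trim(), kv[1].trim());
            }
            rules.add(new Rule(tests, sides[1].trim()));
        }
    }

    /** Runs once per record: each record is labeled independently, so no reducer is needed. */
    public String map(String record) {
        String[] values = record.split(",");
        Map<String, String> row = new HashMap<>();
        for (int i = 0; i < columns.length && i < values.length; i++) {
            row.put(columns[i].trim(), values[i].trim());
        }
        for (Rule rule : rules) {
            boolean matches = rule.tests.entrySet().stream()
                    .allMatch(t -> t.getValue().equals(row.get(t.getKey())));
            if (matches) {
                return record + "," + rule.label;
            }
        }
        return record + ",UNKNOWN"; // attribute value never seen during training
    }

    public static void main(String[] args) {
        MapOnlyLabeler task = new MapOnlyLabeler();
        task.setup("outlook,temperature,humidity,wind", List.of(
                "outlook=sunny,humidity=high -> no",
                "outlook=sunny,humidity=normal -> yes",
                "outlook=overcast -> yes",
                "outlook=rain,wind=strong -> no",
                "outlook=rain,wind=weak -> yes"));
        for (String record : List.of("sunny,hot,high,weak", "rain,mild,high,strong")) {
            System.out.println(task.map(record)); // appends the predicted label
        }
    }
}
```

Because every record is labeled on its own, each mapper can process its input split with no shuffle, which is why a reducer is unnecessary here.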
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/1 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>>>>
>>>>>>>> Thanks Yexi,
>>>>>>>>
>>>>>>>> But how can it be accomplished?
>>>>>>>> The input to the Decision Tree MR job will be a set of data, but when
>>>>>>>> predicting, the input will be a single line of data without a class
>>>>>>>> label, right?
>>>>>>>> So what changes will be needed in the MR job? Should we design it like
>>>>>>>> this:
>>>>>>>> 1. When a set of data comes in, build the decision tree.
>>>>>>>> 2. Else, if a single line of data comes in, check it against the
>>>>>>>> output of the decision tree (the decision tree generated by MR) and
>>>>>>>> predict the class label.
>>>>>>>>
>>>>>>>> -------
>>>>>>>>
>>>>>>>> M1_train - dataset for training.
>>>>>>>> M1_test - test data, or data for prediction.
>>>>>>>> 1. Will the input for prediction be a single record, or a set of
>>>>>>>> records given all at once?
>>>>>>>> 2. We also need to ensure in our program that M1_test is used only
>>>>>>>> with M1_train. We should check that as well, right? If M1_test is
>>>>>>>> given to the M2_train model, it should raise an error, shouldn't it?
>>>>>>>>
>>>>>>>> Please let me know if my thinking is wrong.
>>>>>>>>
>>>>>>>> On 11/30/13, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>>>>> > I watched the video in that post, but I cannot access its source
>>>>>>>> > code due to a permission issue.
>>>>>>>> > In my opinion, once the decision tree model is built, the model is
>>>>>>>> > small enough to be loaded into memory and used directly, without
>>>>>>>> > another MR job for prediction. The prediction can be conducted in a
>>>>>>>> > streaming way, one record at a time.
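A minimal Java sketch of that idea follows: the tree lives in memory as linked nodes and records are labeled one line at a time from a reader, so memory use does not grow with the input. The node layout, the class and method names (`StreamingTreePredictor`, `classify`, `predictAll`) and the comma-separated input format are assumptions for illustration; the example tree is the well-known play-tennis tree, built by hand here rather than trained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

/** In-memory decision tree with line-by-line (streaming) prediction; a sketch. */
public class StreamingTreePredictor {

    /** A leaf carries a class label; an internal node splits on one attribute column. */
    static final class Node {
        final String label;   // non-null only for leaves
        final int column;     // column tested by an internal node (-1 for leaves)
        final Map<String, Node> children = new HashMap<>();

        private Node(String label, int column) { this.label = label; this.column = column; }
        static Node leaf(String label) { return new Node(label, -1); }
        static Node split(int column) { return new Node(null, column); }
        Node child(String value, Node next) { children.put(value, next); return this; }
    }

    /** Follows one root-to-leaf path; returns null for an attribute value not seen in training. */
    static String classify(Node node, String[] values) {
        while (node.label == null) {
            if (node.column >= values.length) return null;
            node = node.children.get(values[node.column].trim());
            if (node == null) return null;
        }
        return node.label;
    }

    /** Labels records as they arrive; only the tree, not the data, is held in memory. */
    static void predictAll(Node root, BufferedReader in, Appendable out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isBlank()) continue;
            String label = classify(root, line.split(","));
            out.append(line).append(" -> ").append(label == null ? "UNKNOWN" : label).append('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        // Columns: outlook, temperature, humidity, wind.
        Node root = Node.split(0)
                .child("sunny", Node.split(2)
                        .child("high", Node.leaf("no"))
                        .child("normal", Node.leaf("yes")))
                .child("overcast", Node.leaf("yes"))
                .child("rain", Node.split(3)
                        .child("strong", Node.leaf("no"))
                        .child("weak", Node.leaf("yes")));
        // In practice the reader would wrap System.in or a file; a string keeps the demo self-contained.
        String input = "sunny,hot,high,weak\novercast,cool,normal,strong\nrain,mild,high,strong\n";
        predictAll(root, new BufferedReader(new StringReader(input)), System.out);
    }
}
```

The same per-record `classify` call is what a mapper would make when the unlabeled data is large enough to justify an MR job.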
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2013/11/30 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>>>>> >
>>>>>>>> >> I have gone through a MapReduce implementation of C4.5 at
>>>>>>>> >>
>>>>>>>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>>>>>> >>
>>>>>>>> >> Here a decision tree is built. So my question is:
>>>>>>>> >> can we also include prediction along with that?
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >>> You are welcome :)
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> 2013/11/25 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>>>>> >>>
>>>>>>>> >>>> OK. Thanks, Yexi.
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> As far as I know, there is currently no ID3 implementation in
>>>>>>>> >>>>> Mahout, but you can use the decision forest instead:
>>>>>>>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example.
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> 2013/11/25 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>>>>> >>>>>
>>>>>>>> >>>>>> Is that ID3 classification?
>>>>>>>> >>>>>> Does it include prediction too?
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>> You can find it directly at https://github.com/apache/mahout, or
>>>>>>>> >>>>>>> you can check it out from SVN by following
>>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> 2013/11/23 unmesha sreeveni <unmeshabiju@gmail.com>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>> I want to go through the Decision Tree implementation in Mahout.
>>>>>>>> >>>>>>>> I referred to Apache Mahout <http://mahout.apache.org/>:
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>>>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are
>>>>>>>> >>>>>>>> encouraged to begin using version 0.6. Highlights include:
>>>>>>>> >>>>>>>> Improved Decision Tree performance and added support for
>>>>>>>> >>>>>>>> regression problems
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Where can I find its source code and documentation?
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Should I download Mahout?
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> --
>>>>>>>> >>>>>>>> *Thanks & Regards*
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Unmesha Sreeveni U.B
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> *Junior Developer*
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> --
>>>>>>>> >>>>>>> ------
>>>>>>>> >>>>>>> Yexi Jiang,
>>>>>>>> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>>>>>> >>>>>>> School of Computer and Information Science,
>>>>>>>> >>>>>>> Florida International University
>>>>>>>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>
>
>


