hadoop-mapreduce-user mailing list archives

From unmesha sreeveni <unmeshab...@gmail.com>
Subject Re: Decision Tree Implementation in Hadoop MapReduce
Date Mon, 02 Dec 2013 04:52:17 GMT
1. I just thought of building the model in a project named, say, DT, and when a
huge input comes, running another MR job (test.java) within DT.
If we don't chain the jobs, do we need to create separate projects, DT_build and
DT_test?
Or is there no need for a separate project?

2. M1_train - dataset for training.
M1_test - test data for prediction.
1. Will the input for prediction be a single record, or a set of records
given as input at once?
2. We also need to ensure in our program that M1_test belongs to M1_train
only. We should check that too, right? If M1_test is given to
M2_train, it should show an error, shouldn't it?
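The check in point 2 amounts to verifying that the test records have the attributes the model was trained on. A minimal Java sketch, assuming (hypothetically) that both files are comma-separated with a header row, that the training header lists the class label as its last column, and that test records carry the same attributes without the label; `SchemaCheck` and its methods are illustrative names, not part of Hadoop or Mahout:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: rejects test data whose header does not match the
// attributes of the training set the model was built from.
class SchemaCheck {

    // Training attributes are every header column except the last (class label) column.
    static List<String> trainingAttributes(String trainingHeader) {
        String[] cols = trainingHeader.split(",");
        return Arrays.asList(Arrays.copyOf(cols, cols.length - 1));
    }

    // Throws if the test header does not list exactly the training attributes, in order.
    static void requireCompatible(String trainingHeader, String testHeader) {
        List<String> expected = trainingAttributes(trainingHeader);
        List<String> actual = Arrays.asList(testHeader.split(","));
        if (!expected.equals(actual)) {
            throw new IllegalArgumentException(
                "Test data does not match training schema: expected "
                    + expected + " but got " + actual);
        }
    }
}
```

Running this once on the header of M1_test before prediction would make data meant for a different model (e.g. M2) fail with an error instead of being silently mislabeled.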

Is anything wrong in my inference?
Are you able to guess what I am trying to accomplish?
I am confused whether I need to create only one project that includes both
training and testing, or two projects.


On Mon, Dec 2, 2013 at 9:54 AM, Yexi Jiang <yexijiang@gmail.com> wrote:

> What is your motivation for using chained jobs?
>
>
> 2013/12/1 unmesha sreeveni <unmeshabiju@gmail.com>
>
>> Thanks Yexi, a very nice explanation. Thanks a lot.
>> You explained it in a very simple way that is really understandable for
>> beginners.
>> I can go for chaining jobs, right?
>>
>>
>>
>>
>>
>> On Sun, Dec 1, 2013 at 8:55 PM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>
>>> In my opinion:
>>>
>>> 1. Build the decision tree model with the training data.
>>> 2. Store it somewhere.
>>> 3. When the unlabeled data is available:
>>>    3.1 If the unlabeled data is huge, write another MR job to process
>>> it: load the model in the setup stage and use the model to label the
>>> records one by one in the map stage. There is no need for a reducer.
>>>    3.2 If the unlabeled data is small, it is trivial.
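Steps 1 to 3.1 can be sketched in plain Java without a Hadoop dependency. In a real job the model loading would go in `Mapper.setup(Context)`, the per-record labeling in `map(...)`, and the job would be made map-only with `job.setNumReduceTasks(0)`. The stored model format below is an assumption for illustration only: one node per line, either `id,attrIndex,threshold,leftId,rightId` for a split (go left when the value is `<=` the threshold) or `id,LEAF,label` for a leaf, with node 0 as the root. `TreeModel` and `LabelingMapperSketch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flat model format, one node per line, node 0 is the root:
//   split node: id,attrIndex,threshold,leftId,rightId  (go left if value <= threshold)
//   leaf node:  id,LEAF,label
class TreeModel {
    private final Map<Integer, String[]> nodes = new HashMap<>();

    // Parses the stored model once; in a Hadoop job this would run in Mapper.setup().
    TreeModel(List<String> modelLines) {
        for (String line : modelLines) {
            String[] f = line.trim().split(",");
            nodes.put(Integer.parseInt(f[0]), f);
        }
    }

    // Walks from the root to a leaf using the record's numeric attribute values.
    String classify(double[] features) {
        String[] node = nodes.get(0);
        while (!node[1].equals("LEAF")) {
            int attr = Integer.parseInt(node[1]);
            double threshold = Double.parseDouble(node[2]);
            int next = features[attr] <= threshold
                ? Integer.parseInt(node[3])
                : Integer.parseInt(node[4]);
            node = nodes.get(next);
        }
        return node[2];
    }
}

class LabelingMapperSketch {
    // Mirrors a map-only job: the model is loaded once, then each unlabeled
    // record is labeled independently and emitted as "record<TAB>label".
    static List<String> labelAll(List<String> modelLines, List<String> records) {
        TreeModel model = new TreeModel(modelLines);   // setup stage
        List<String> out = new ArrayList<>();
        for (String record : records) {                // one map() call per record
            String[] parts = record.split(",");
            double[] features = new double[parts.length];
            for (int i = 0; i < parts.length; i++) {
                features[i] = Double.parseDouble(parts[i].trim());
            }
            out.add(record + "\t" + model.classify(features));
        }
        return out;
    }
}
```

Because each record is labeled independently of the others, the work splits naturally across mappers, which is why step 3.1 needs no reducer.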
>>>
>>>
>>>
>>>
>>> 2013/12/1 unmesha sreeveni <unmeshabiju@gmail.com>
>>>
>>>> Thanks Yexi,
>>>>
>>>> But how can it be accomplished?
>>>> The input to the decision tree MR job will be a set of data. But when
>>>> predicting, the input will be a single line of data without a class label, right?
>>>> So what changes will be needed in the MR job? Should we design it like this:
>>>> 1. When a set of data comes in, build the decision tree.
>>>> 2. Else, if a single line of data comes in, check the output of the decision
>>>> tree (the decision tree generated by MR) and predict the class label.
>>>>
>>>> -------
>>>>
>>>> M1_train - dataset for training.
>>>> M1_test - test data for prediction.
>>>> 1. Will the input for prediction be a single record, or a set of records
>>>> given as input at once?
>>>> 2. We also need to ensure in our program that M1_test belongs to M1_train
>>>> only. We should check that too, right? If M1_test is given to
>>>> M2_train, it should show an error, shouldn't it?
>>>>
>>>> Please tell me if my thinking is wrong.
>>>>
>>>> On 11/30/13, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>> > I watched the video in it, but I cannot access its source code due to a
>>>> > permission issue.
>>>> > In my opinion, once the decision tree model is built, the model is small
>>>> > enough to be loaded into memory and used directly, without another
>>>> > MR job for prediction. The prediction can be conducted in a streaming way.
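The point that the built model fits in memory and can label data in a streaming way, without a second MR job, can be sketched as follows. This assumes, hypothetically, an ID3-style tree with categorical splits and comma-separated records; `TreeNode` and `StreamingPredictor` are illustrative names, not Mahout classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory ID3-style tree with categorical splits.
class TreeNode {
    final String label;      // non-null only for leaves
    final int attribute;     // column tested at an internal node
    final Map<String, TreeNode> children = new HashMap<>();

    private TreeNode(String label, int attribute) {
        this.label = label;
        this.attribute = attribute;
    }

    static TreeNode leaf(String label) { return new TreeNode(label, -1); }

    static TreeNode split(int attribute) { return new TreeNode(null, attribute); }

    // Adds a branch for one attribute value; returns this node for chaining.
    TreeNode branch(String value, TreeNode child) {
        children.put(value, child);
        return this;
    }

    // Follows the branch matching each tested attribute value until a leaf is reached.
    String classify(String[] record) {
        TreeNode n = this;
        while (n.label == null) {
            TreeNode next = n.children.get(record[n.attribute]);
            if (next == null) return "UNKNOWN";   // value never seen during training
            n = next;
        }
        return n.label;
    }
}

class StreamingPredictor {
    // Labels records one line at a time as they are read; only the small
    // tree is held in memory, never the whole input.
    static void predict(TreeNode tree, Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) continue;
            out.write(line + "," + tree.classify(line.split(",")) + "\n");
        }
        out.flush();
    }
}
```

A record with an attribute value never seen during training gets the placeholder label `UNKNOWN` here; a fuller implementation might fall back to the majority class at that node instead.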
>>>> >
>>>> >
>>>> > 2013/11/30 unmesha sreeveni <unmeshabiju@gmail.com>
>>>> >
>>>> >> I have gone through a MapReduce implementation of C4.5 at
>>>> >> http://btechfreakz.blogspot.in/2013/04/implementation-of-c45-algorithm-using.html
>>>> >>
>>>> >> Here a decision tree is built. So my doubt is:
>>>> >> can we also include the prediction along with that?
>>>> >>
>>>> >>
>>>> >> On Tue, Nov 26, 2013 at 8:52 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>> >>
>>>> >>> You are welcome :)
>>>> >>>
>>>> >>>
>>>> >>> 2013/11/25 unmesha sreeveni <unmeshabiju@gmail.com>
>>>> >>>
>>>> >>>> OK. Thanks, Yexi.
>>>> >>>>
>>>> >>>>
>>>> >>>> On Tue, Nov 26, 2013 at 1:41 AM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>> >>>>
>>>> >>>>> As far as I know, there is no ID3 implementation in Mahout currently,
>>>> >>>>> but you can use the decision forest instead:
>>>> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Breiman+Example
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> 2013/11/25 unmesha sreeveni <unmeshabiju@gmail.com>
>>>> >>>>>
>>>> >>>>>> Is that ID3 classification?
>>>> >>>>>> Does it include prediction as well?
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> On Sat, Nov 23, 2013 at 9:01 PM, Yexi Jiang <yexijiang@gmail.com> wrote:
>>>> >>>>>>
>>>> >>>>>>> You can find it directly at https://github.com/apache/mahout, or you
>>>> >>>>>>> can check it out from SVN by following
>>>> >>>>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control.
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> 2013/11/23 unmesha sreeveni <unmeshabiju@gmail.com>
>>>> >>>>>>>
>>>> >>>>>>>> I want to go through the decision tree implementation in Mahout.
>>>> >>>>>>>> I referred to Apache Mahout <http://mahout.apache.org/>:
>>>> >>>>>>>>
>>>> >>>>>>>> 6 Feb 2012 - Apache Mahout 0.6 released
>>>> >>>>>>>> Apache Mahout has reached version 0.6. All developers are encouraged
>>>> >>>>>>>> to begin using version 0.6. Highlights include:
>>>> >>>>>>>> Improved Decision Tree performance and added support for regression
>>>> >>>>>>>> problems
>>>> >>>>>>>>
>>>> >>>>>>>> Where can I find its source code and documentation?
>>>> >>>>>>>>
>>>> >>>>>>>> Should I download Mahout?
>>>> >>>>>>>>
>>>> >>>>>>>> --
>>>> >>>>>>>> *Thanks & Regards*
>>>> >>>>>>>>
>>>> >>>>>>>> Unmesha Sreeveni U.B
>>>> >>>>>>>>
>>>> >>>>>>>> *Junior Developer*
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> --
>>>> >>>>>>> ------
>>>> >>>>>>> Yexi Jiang,
>>>> >>>>>>> ECS 251,  yjian004@cs.fiu.edu
>>>> >>>>>>> School of Computer and Information Science,
>>>> >>>>>>> Florida International University
>>>> >>>>>>> Homepage: http://users.cis.fiu.edu/~yjian004/
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>

