hadoop-common-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: From X to Hadoop MapReduce
Date Thu, 02 Sep 2010 02:10:06 GMT
'hamake' on github looks like a handy tool as well - I haven't used it.
It applies the old Unix 'make' timestamp dependency trick to the
input and output file sets to decide which jobs to run in sequence,
and possibly in parallel.
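For anyone curious, the make-style rule is simple to state: a job needs to run if any of its inputs is newer than its oldest output, or if an output is missing. A minimal sketch of that rule (the class and method names here are mine for illustration, not hamake's actual API):

```java
// Make-style staleness check over input/output modification times.
// Illustrative sketch only -- not hamake's actual API.
public class StaleCheck {

    /** Run the job iff an output is missing (modeled here as an empty
     *  array) or any input mtime is newer than the oldest output mtime. */
    static boolean needsRun(long[] inputMtimes, long[] outputMtimes) {
        if (outputMtimes.length == 0) return true;   // no outputs yet: must run
        long oldestOutput = Long.MAX_VALUE;
        for (long t : outputMtimes)
            oldestOutput = Math.min(oldestOutput, t);
        for (long t : inputMtimes)
            if (t > oldestOutput) return true;       // input changed after output
        return false;                                // outputs are up to date
    }

    public static void main(String[] args) {
        System.out.println(needsRun(new long[]{100}, new long[]{200})); // false
        System.out.println(needsRun(new long[]{300}, new long[]{200})); // true
    }
}
```

Chaining this check over each job's files is what lets a driver skip jobs whose outputs are already current, the same way make skips up-to-date targets.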

Lance

On Wed, Sep 1, 2010 at 12:27 PM, James Seigel <james@tynt.com> wrote:
> Sounds good!  Please give some examples :)
>
> I just got back from some holidays and will start posting some more stuff shortly
>
> Cheers
> James.
>
>
> On 2010-07-21, at 7:22 PM, Jeff Zhang wrote:
>
>> Cool, James. I am very interested to contribute to this.
>> I think group by, join and order by can be added to the examples.
>>
>>
>> On Thu, Jul 22, 2010 at 4:59 AM, James Seigel <james@tynt.com> wrote:
>>
>>> Oh yeah, it would help if I put the URL:
>>>
>>> http://github.com/seigel/MRPatterns
>>>
>>> James
>>>
>>> On 2010-07-21, at 2:55 PM, James Seigel wrote:
>>>
>>>> Here is a skeleton project I stuffed up on github (feel free to offer
>>>> other suggestions/alternatives). There is a wiki, a place to commit code,
>>>> a place to fork around, etc.
>>>>
>>>> Over the next couple of days I’ll try and put up some samples for
>>>> people to poke around with. Feel free to attack the wiki, contribute
>>>> code, etc.
>>>>
>>>> If anyone can derive some cool pseudocode for writing map-reduce-style
>>>> algorithms, that’d be great.
>>>>
>>>> Cheers
>>>> James.
>>>>
>>>>
>>>> On 2010-07-21, at 10:51 AM, James Seigel wrote:
>>>>
>>>>> Jeff, I agree that Cascading looks cool and might/should have a place in
>>>>> everyone’s tool box. However, at some corporations it takes a while to
>>>>> get those kinds of changes in place, so they might have to hand-craft
>>>>> some Java code before moving (if they ever can) to a different
>>>>> technology.
>>>>>
>>>>> I will get something up and going and post a link back for whoever is
>>>>> interested.
>>>>>
>>>>> To answer Himanshu’s question, I am thinking something like this (with
>>>>> some code):
>>>>>
>>>>> Hadoop M/R Patterns, and ones that match Pig Structures
>>>>>
>>>>> 1. COUNT: [Mapper] Emit a single shared key with the value 1.
>>>>> [Combiner] Same as the reducer. [Reducer] count = count + next.value.
>>>>> [Emit] A single result.
>>>>> 2. FREQ COUNT: [Mapper] Emit (item, 1). [Combiner] Same as the reducer.
>>>>> [Reducer] count = count + next.value. [Emit] A list of (key, count)
>>>>> pairs.
>>>>> 3. UNIQUE: [Mapper] Emit (item, 1). [Combiner] None. [Reducer + Emit]
>>>>> Emit the list of keys with no values.
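The three patterns above, with the shuffle (the framework's group-and-sort by key between mappers and reducers) made explicit, can be sketched in plain Java without Hadoop. This is an illustrative simulation only; the class and method names are mine, and a real job would use Hadoop's Mapper/Reducer classes:

```java
import java.util.*;

// Plain-Java simulation of COUNT, FREQ COUNT, and UNIQUE.
// Illustrative only -- not the project's actual code.
public class MRPatterns {

    /** Shuffle phase: group mapper output (key, value) pairs by key,
     *  with keys sorted. This is what the framework does between the
     *  map and reduce phases. */
    static TreeMap<String, List<Integer>> shuffle(
            List<Map.Entry<String, Integer>> mapped) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped)
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                   .add(kv.getValue());
        return grouped;
    }

    /** 1. COUNT: mapper emits one shared key with value 1; the lone
     *  reducer sums, yielding a single result. */
    static int count(List<String> records) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String r : records) mapped.add(Map.entry("all", 1));
        int total = 0;
        for (int v : shuffle(mapped).get("all")) total += v;   // reducer
        return total;
    }

    /** 2. FREQ COUNT: mapper emits (item, 1); reducer sums per key. */
    static Map<String, Integer> freqCount(List<String> records) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String r : records) mapped.add(Map.entry(r, 1));
        Map<String, Integer> result = new LinkedHashMap<>();
        shuffle(mapped).forEach((k, vs) -> {                   // reducer
            int sum = 0;
            for (int v : vs) sum += v;
            result.put(k, sum);
        });
        return result;
    }

    /** 3. UNIQUE: mapper emits (item, 1); reducer emits keys only. */
    static Set<String> unique(List<String> records) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String r : records) mapped.add(Map.entry(r, 1));
        return shuffle(mapped).keySet();                       // keys, no values
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "a", "c", "a", "b");
        System.out.println(count(data));       // 6
        System.out.println(freqCount(data));   // {a=3, b=2, c=1}
        System.out.println(unique(data));      // [a, b, c]
    }
}
```

The techniques work because the shuffle guarantees that every value for a given key lands in the same reducer call, so a per-key sum (or just emitting the key) is all the reducer has to do.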
>>>>>
>>>>> I think adding a description of why each technique works would be
>>>>> helpful for people learning as well. I see questions from people who
>>>>> don’t understand what happens to the data between the mappers and
>>>>> reducers, or what data they will see when it gets to the reducer, etc.
>>>>>
>>>>> Cheers
>>>>> James.
>>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>



-- 
Lance Norskog
goksron@gmail.com
