hadoop-common-user mailing list archives

From "Albert Chern" <albert.ch...@gmail.com>
Subject Re: MapReduce
Date Fri, 02 Mar 2007 22:13:05 GMT
Technically map reduce requires key/value pairs.  Hadoop's implementation
also requires them.  So if you want to run a map reduce job, you will need
to fit your data to key/value pairs.  Of course, as I have shown, you can
just use a meaningless key or value, but they are still required.
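
A minimal sketch of this idea in plain Python, not Hadoop's actual API. The
function names (map_score, shuffle, reduce_average, run_job) are hypothetical,
and the grouping step only imitates what a framework does between map and
reduce. The sample scores are the ones from the quoted example below.

```python
from collections import defaultdict


def map_score(key, score):
    # Map step: correct one score that was recorded 1 too high.
    return (key, score - 1)


def shuffle(pairs):
    # Group values by key, imitating what the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_average(key, values):
    # Reduce step: called once per key; here it emits the average.
    return (key, sum(values) / len(values))


def run_job(records):
    # Hypothetical driver: map every record, group by key, reduce each group.
    mapped = [map_score(k, v) for k, v in records]
    grouped = shuffle(mapped)
    return dict(reduce_average(k, vs) for k, vs in grouped.items())


# No natural key: every record gets the meaningless key 0.
dummy = run_job([(0, 95), (0, 100), (0, 70)])

# Natural key: the subject name.
by_subject = run_job([
    ("Biology", 100), ("Biology", 95), ("Biology", 90),
    ("Chemistry", 90), ("Chemistry", 85), ("Chemistry", 80),
])
print(dummy)
print(by_subject)  # {'Biology': 94.0, 'Chemistry': 84.0}
```

With the dummy key there is a single group and therefore a single reduce
call; with subject names as keys there is one reduce call per subject.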

On 3/2/07, jaylac <Jayalakshmi.Muniasamy@cognizant.com> wrote:
>
>
> Is it necessary to find key/value pairs in order to fit a problem to
> MapReduce? If we don't use key/value pairs, should we still call it
> MapReduce?
>
> My project manager has proposed an idea for fitting our problem into
> MapReduce. It has no key/value pairs, but he says that we can have
> MapReduce without key/value pairs.
>
>
>
> Albert Chern wrote:
> >
> > Sometimes you need to do a little work to fit a problem into map reduce.
> > You are correct; in this problem, there really are no key/value pairs, so
> > you would use a dummy value.  For example, we could just use 0 as a key, so
> > our test scores are:
> >
> > (0, 95)
> > (0,100)
> > (0, 70)
> > and so on...
> >
> > Each map gets one of these and subtracts one from the score, giving us:
> >
> > (0, 94)
> > (0, 99)
> > (0, 69)
> > and so on...
> >
> > There will be a reduce for each key, but we only have one key, so there
> > will be one reduce that gets:
> >
> > (0, [94,99,69,...])
> >
> > The Wikipedia example isn't very good, but we can make it better by
> > dividing the scores into scores for different subjects where we want to
> > find the average for each subject.  We might have:
> >
> > (Biology, 100)
> > (Biology, 95)
> > (Biology, 90)
> > and so on...
> >
> > (Chemistry, 90)
> > (Chemistry, 85)
> > (Chemistry, 80)
> > and so on...
> >
> > After you subtract one from each of these key/value pairs, there will be a
> > reduce for each key; here the keys are the different subjects.  So you
> > will have one reduce for each subject:
> >
> > (Biology, [99,94,89,...])
> > (Chemistry, [89,84,79,...])
> > and so on...
> >
> > One more thing: the Wikipedia example says that each reduce outputs one
> > value.  This isn't a requirement for Hadoop map reduce.
> >
> > On 3/1/07, jaylac <Jayalakshmi.Muniasamy@cognizant.com> wrote:
> >>
> >>
> >> Hi
> >>
> >> I was just reading about MapReduce for my final-year project work.
> >>
> >> I got confused in the middle. What I thought is: "MapReduce deals
> >> only with key/value pairs. To fit a problem into MapReduce, we
> >> should find the key/value pairs."
> >>
> >> I want to know whether I'm right or wrong.
> >>
> >> I got confused after looking at the explanation in Wikipedia. The
> >> following is the Wikipedia content about MapReduce:
> >>
> >>
> >>
> >> ========================================================================
> >> "A map function iterates over a list of independent elements and
> >> performs a specified operation on each element. The list of answers is
> >> stored independently from the original list. Because each element is
> >> operated on independently and the original list is not being modified,
> >> it is very easy to perform a map operation in parallel. On appropriate
> >> hardware this allows extremely large data sets to be processed in short
> >> amounts of elapsed time.
> >>
> >> For example consider a list of test scores where each score has been
> >> found to be 1 too high. A map function of s − 1 could be applied to
> >> correct every score s.
> >>
> >> A reduce operation takes a list and combines elements according to some
> >> algorithm. Since a reduce always ends up with a single answer, it is not
> >> as parallelizable as a map function, but the large number of relatively
> >> independent calculations means that reduce functions are still useful in
> >> highly parallel environments.
> >>
> >> Continuing the previous example, what if one wanted to know the average
> >> of the test scores? One could define a reduce function which halved the
> >> size of the list by adding an entry in the list to its neighbor,
> >> recursively continuing until there is only one (large) entry, and
> >> dividing the total sum by the original number of elements to get the
> >> average."
> >>
> >> ========================================================================
> >>
> >> Here in the map function we are simply adding up the test scores. We are
> >> not using any key/value pairs. I'm totally confused.
> >>
> >> I might be wrong at any point. Could someone please help me out? Am I
> >> wrong in my basic understanding of MapReduce itself? I'd be thankful if
> >> anyone could explain it to me clearly.
> >>
> >> Please help me successfully complete my final-year project.
> >>
> >> Jaya
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/MapReduce-tf3331603.html#a9263847
> >> Sent from the Hadoop Users mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/MapReduce-tf3331603.html#a9273832
> Sent from the Hadoop Users mailing list archive at Nabble.com.
>
>
