hadoop-common-user mailing list archives

From "some speed" <speed.s...@gmail.com>
Subject Re: reading input for a map function from 2 different files?
Date Sun, 16 Nov 2008 06:49:54 GMT
Thank you all!
What Milind has said will do the trick for me, as I need accurate values
for the deviation.
Passing variables between jobs by means of the configure() method and
getInt/setInt will make things a lot easier!
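A minimal plain-Java sketch of the two-job chain discussed in this thread, assuming the input is simply a list of doubles. It does not use Hadoop: a `java.util.Properties` object stands in for the `JobConf`, and the names `TwoPassStdDev`, `meanJob`, `DeviationMapper` and `deviationJob` are hypothetical. The first "job" computes the average, the driver stores it under the user-defined key `my.average` (Milind's suggestion), and the second "job" reads it back once in a `configure`-style hook before summing squared deviations. One caveat: `setInt`/`getInt` only carry `int` values, so a fractional average would have to be cast to `int` and lose its fractional part. The sketch therefore stores it as a string; in the old `org.apache.hadoop.mapred` API the equivalent would be `conf.set("my.average", Double.toString(avg))` in the driver and `Double.parseDouble(job.get("my.average"))` inside the mapper's `configure(JobConf job)`.

```java
import java.util.List;
import java.util.Properties;

/**
 * Plain-Java simulation of a two-job MapReduce chain (no Hadoop dependency).
 * Properties stands in for JobConf; all names here are hypothetical.
 */
class TwoPassStdDev {

    // Job 1: "map" emits each value, "reduce" sums the values and counts them.
    static double meanJob(List<Double> input) {
        double sum = 0.0;
        long count = 0;
        for (double x : input) {
            sum += x;
            count++;
        }
        return sum / count;              // NaN for empty input
    }

    // Job 2 mapper: reads the average once, as configure(JobConf) would.
    static class DeviationMapper {
        private double average;

        void configure(Properties job) {
            // Stored as a string so a fractional average is not truncated.
            average = Double.parseDouble(job.getProperty("my.average"));
        }

        double map(double x) {
            double d = x - average;
            return d * d;                // squared deviation for this record
        }
    }

    // Job 2 as a whole: map every record, then "reduce" by summing.
    static double deviationJob(List<Double> input, Properties conf) {
        DeviationMapper mapper = new DeviationMapper();
        mapper.configure(conf);
        double sumSq = 0.0;
        for (double x : input) {
            sumSq += mapper.map(x);
        }
        // Population standard deviation: sqrt(sum((x - mean)^2) / n).
        return Math.sqrt(sumSq / input.size());
    }

    public static void main(String[] args) {
        List<Double> numbers = List.of(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0);

        // Driver: run job 1, then hand its result to job 2 via the config.
        Properties conf = new Properties();
        conf.setProperty("my.average", Double.toString(meanJob(numbers)));

        System.out.println("mean   = " + conf.getProperty("my.average")); // mean   = 5.0
        System.out.println("stddev = " + deviationJob(numbers, conf));     // stddev = 2.0
    }
}
```

Computing the mean first and the squared deviations in a second pass is one reason this approach gives accurate deviations: the one-pass "sum of squares minus squared sum" formula can lose precision through cancellation when the values are large relative to their spread.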

On Wed, Nov 12, 2008 at 7:07 PM, Milind Bhandarkar <milindb@yahoo-inc.com> wrote:

> Since you need to pass only one number (the average) to all mappers, you can
> pass it through the JobConf with a config variable defined by you, say
> "my.average".
>
> - milind
>
>
> On 11/11/08 8:25 PM, "some speed" <speed.some@gmail.com> wrote:
>
> > Thanks for the response. What I am trying to do is find the average
> > and then the standard deviation of a very large set (say a million) of
> > numbers. The result would be used in further calculations.
> > I have got the average from the first MapReduce job of the chain. Now I
> > need to read this average as well as the set of numbers to calculate the
> > standard deviation, so one file would have the input set and the other
> > "resultant" file would have just the average.
> > Please do tell me in case there is a better way of doing things than what
> > I am doing. Any input/suggestion is appreciated. :)
> >
> >
> >
> > On Mon, Nov 10, 2008 at 4:22 AM, Amar Kamat <amarrk@yahoo-inc.com> wrote:
> >
> >> Amar Kamat wrote:
> >>
> >>> some speed wrote:
> >>>
> >>>> I was wondering if it was possible to read the input for a map function
> >>>> from 2 different files:
> >>>>  1st file ---> user-input file from a particular location (path)
> >>>>
> >>> Is the input/user file sorted? If yes, then you can use a "map-side join"
> >>> for performance reasons. See org.apache.hadoop.mapred.join for more details.
> >>
> >>>>  2nd file ---> a resultant file (has just one <key,value> pair) from a
> >>>> previous MapReduce job. (I am implementing a chained MapReduce job.)
> >>>>
> >>> Can you explain the contents of the 2nd file in more detail?
> >>>
> >>>> Now, for every <key,value> pair in the user-input file, I would like to
> >>>> use the same <key,value> pair from the 2nd file for some calculations.
> >>>>
> >>> Can you explain this in more detail? Can you give some abstracted example
> >>> of what file1 and file2 look like and what operation/processing you want
> >>> to do?
> >>
> >>
> >>>>
> >>> I guess you might need to do some kind of join on the 2 files. Look at
> >>> contrib/data_join for more details.
> >>> Amar
> >>>
> >>>> Is it possible for me to do so? Can someone guide me in the right
> >>>> direction, please?
> >>>>
> >>>> Thanks!
>
>
> --
> Milind Bhandarkar
> Y!IM: GridSolutions
> 408-349-2136
> (milindb@yahoo-inc.com)
>
>
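The original question asked how a mapper could also read a second, one-record file produced by the previous job. Besides putting the number into the job configuration, a mapper could load that small file itself once during initialisation. The sketch below assumes the first job wrote its result with the default tab-separated text output, i.e. a single line of the form `key<TAB>value`. The class and method names (`SideFileReader`, `readAverage`) and the file name are hypothetical, and it uses `java.nio.file` instead of Hadoop's file-system API so that it runs without Hadoop; in a real job the path would typically be passed through the configuration and the file opened from HDFS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: loads the single key/value pair left by a previous job. */
class SideFileReader {

    // Returns the numeric value from the first non-blank "key<TAB>value" line.
    static double readAverage(Path resultFile) throws IOException {
        List<String> lines = Files.readAllLines(resultFile, StandardCharsets.UTF_8);
        for (String line : lines) {
            if (line.isBlank()) {
                continue;                        // skip stray empty lines
            }
            String[] kv = line.split("\t", 2);   // key and value are tab-separated
            if (kv.length != 2) {
                throw new IOException("expected key<TAB>value, got: " + line);
            }
            return Double.parseDouble(kv[1].trim());
        }
        throw new IOException("no key/value pair found in " + resultFile);
    }

    public static void main(String[] args) throws IOException {
        // Simulate the one-record output of the first job in a temp file.
        Path tmp = Files.createTempFile("part-00000", ".txt");
        Files.writeString(tmp, "average\t5.0\n", StandardCharsets.UTF_8);

        double average = readAverage(tmp);       // once per mapper, not per record
        System.out.println("average = " + average);  // average = 5.0

        Files.delete(tmp);
    }
}
```

For a single number this is more machinery than the configuration-variable approach, but the same pattern applies if the side file ever grows to a handful of values.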
