Hi,
You can refer to the following code to calculate sigma x (the sum of a column):
Mappers
Extracting a specific column 
https://github.com/zinniaphatakdev/Nectar/blob/master/Nectarcommon/src/main/java/com/zinnia/nectar/util/hadoop/FieldSeperator.java
Sum Mapper 
https://github.com/zinniaphatakdev/Nectar/blob/master/Nectarregression/src/main/java/com/zinnia/nectar/regression/hadoop/primitive/mapreduce/SigmaMapper.java
Sum Reducer 
https://github.com/zinniaphatakdev/Nectar/blob/master/Nectarregression/src/main/java/com/zinnia/nectar/regression/hadoop/primitive/mapreduce/DoubleSumReducer.java
Driver or Main class 
https://github.com/zinniaphatakdev/Nectar/blob/master/Nectarregression/src/main/java/com/zinnia/nectar/regression/hadoop/primitive/jobs/SigmaJob.java
By default it works for a tab-separated file, but you can easily change
that by modifying the FieldSeperator code.
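
To show how the pieces fit together, here is a minimal plain-Java sketch of the same flow: split each line on a delimiter, emit (column index, value) pairs, and sum per column. It is not the Nectar code and uses no Hadoop classes; the names ColumnSumSketch, map, reduce and sumColumns are made up for illustration, and it assumes every field parses as a double.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Plain-Java simulation of the map/reduce flow; hypothetical names, no Hadoop dependency.
class ColumnSumSketch {

    // "Map" step: split one line on the delimiter and emit (columnIndex, value) pairs.
    static List<Map.Entry<Integer, Double>> map(String line, String delimiter) {
        List<Map.Entry<Integer, Double>> pairs = new ArrayList<>();
        String[] fields = line.split(Pattern.quote(delimiter));
        for (int col = 0; col < fields.length; col++) {
            double value = Double.parseDouble(fields[col].trim());
            pairs.add(new AbstractMap.SimpleEntry<>(col, value));
        }
        return pairs;
    }

    // "Reduce" step: add up all values seen for one column.
    // Addition is associative, so the same logic can also serve as a combiner.
    static double reduce(Iterable<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the driver: group the mapped pairs by key, then reduce each group.
    static Map<Integer, Double> sumColumns(List<String> lines, String delimiter) {
        Map<Integer, List<Double>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<Integer, Double> pair : map(line, delimiter)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<Integer, Double> sums = new TreeMap<>();
        for (Map.Entry<Integer, List<Double>> group : grouped.entrySet()) {
            sums.put(group.getKey(), reduce(group.getValue()));
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1.0\t2.0", "3.0\t4.0", "5.0\t6.0");
        System.out.println(sumColumns(lines, "\t")); // prints {0=9.0, 1=12.0}
    }
}
```

Passing "," instead of "\t" as the delimiter handles a comma-separated file, which is the kind of change the separator class would need.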
On Tue, Apr 3, 2012 at 10:25 AM, Fang Xin <nusfangxin@gmail.com> wrote:
> Hi Rohit, thank you for your reply.
> As for the second assumption, could you kindly further enlighten me a
> bit, please?
>
> Thank you.
>
> On Tue, Apr 3, 2012 at 12:50 PM, Rohit Kelkar <rohitkelkar@gmail.com>
> wrote:
> > Your idea in first paragraph is correct. To speed up things you can
> > also explore the possibility of using a Combiner. For ex. for
> > computing the sum set the combiner to be the same class as your
> > reducer. For calculating variance write a combiner class that would
> > output (xi - mu)^2 and in the reducer code you could take the sqrt.
> >
> > Your second assumption that number of reducers = number of variables
> > is not right.
> >
> > - Rohit Kelkar
> >
> > On Tue, Apr 3, 2012 at 10:10 AM, Fang Xin <nusfangxin@gmail.com> wrote:
> >> Hi,
> >>
> >> I have a spreadsheet where each column contains values for one
> >> variable, and I need to calculate sum, variance, etc. for each column.
> >> From my understanding, mapper and reducer work on <key, value> pairs.
> >> Can anyone kindly enlighten me on how to abstract this problem?
> >>
> >> Maybe for the mapper, let it read each line, set variable name/number
> >> as "key", and corresponding value as "value".
> >> Then when all pairs with the same "key" (i.e. they belong to the same
> >> variable) are passed to a reducer, the reducer can do the calculation
> >> and output to file.
> >> Is this idea correct? Can anyone kindly give some comments?
> >>
> >> Besides, in this method, the number of reducers will be determined by
> >> the number of variables I have.
> >> What happens if the number of variables is limited, and for each
> >> variable the number of entries is far bigger than the total number
> >> of variables? Then the execution time for each reducer can be
> >> comparatively long.
> >> Any way to make use of more hardware resource, and create more
> >> reducers to run in parallel?
> >>
> >> Best regards,
> >> Xin
>
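
On the variance and reducer-count points quoted above: emitting (xi - mu)^2 from a combiner requires mu to be known already, i.e. an earlier pass over the data. A different technique, sketched below, carries a (count, sum, sum of squares) triple per column, so a combiner can merge values on the map side and the reducer only merges a few partials per key. That is also why a long column does not have to mean a slow reducer. This is a plain-Java simulation with made-up names (ColumnVarianceSketch, Partial, combine, reduce), not Nectar or Hadoop code, and the sumSq/n - mean^2 formula can lose precision when the mean is large relative to the spread.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of combiner + reducer for variance; hypothetical names throughout.
class ColumnVarianceSketch {

    // Partial statistics for one column; merging two partials is associative and commutative.
    static final class Partial {
        final long count;
        final double sum;
        final double sumSq;

        Partial(long count, double sum, double sumSq) {
            this.count = count;
            this.sum = sum;
            this.sumSq = sumSq;
        }

        // Partial for a single observation x.
        static Partial of(double x) {
            return new Partial(1, x, x * x);
        }

        Partial merge(Partial other) {
            return new Partial(count + other.count, sum + other.sum, sumSq + other.sumSq);
        }

        double mean() {
            return sum / count;
        }

        // Population variance: E[x^2] - (E[x])^2.
        double variance() {
            double m = mean();
            return sumSq / count - m * m;
        }
    }

    // "Combine" step: fold the values from one input split into a single partial.
    static Partial combine(List<Double> values) {
        Partial acc = new Partial(0, 0.0, 0.0);
        for (double v : values) {
            acc = acc.merge(Partial.of(v));
        }
        return acc;
    }

    // "Reduce" step: merge the partials that arrive for one column.
    static Partial reduce(List<Partial> partials) {
        Partial acc = new Partial(0, 0.0, 0.0);
        for (Partial p : partials) {
            acc = acc.merge(p);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Two splits of the same column, combined separately and then reduced.
        Partial a = combine(Arrays.asList(2.0, 4.0, 4.0, 4.0));
        Partial b = combine(Arrays.asList(5.0, 5.0, 7.0, 9.0));
        Partial total = reduce(Arrays.asList(a, b));
        System.out.println("mean = " + total.mean());
        System.out.println("variance = " + total.variance());
        // The square root of the variance is the standard deviation.
        System.out.println("std dev = " + Math.sqrt(total.variance()));
    }
}
```

In a real Hadoop job the number of reduce tasks is a job setting (Job.setNumReduceTasks) rather than something derived from the number of keys; all values for one key still go to a single reducer, so the map-side combining is what keeps that reducer's input small.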

https://github.com/zinniaphatakdev/Nectar
