hadoop-common-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: how to implements the 'diff' cmd in hadoop
Date Tue, 20 Mar 2012 10:09:56 GMT
Hi Lin
        In your mapper, make the line number the key and the line contents
the value. In your reducer, check whether the two values for a key match;
if you are comparing two files, there will be two values for each line
number. If they do not match, increment a counter to track the number of
non-matching lines and write those lines to the output file. If the values
match for a key, do nothing; there is no need to write anything to the
output dir.
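
The approach above can be sketched locally, without a cluster. This is only a rough illustration in plain Python, not Hadoop API code: the function names (map_lines, shuffle, reduce_diff, line_diff) and the "a"/"b" source tags are made up for the example. It also assumes the mapper can see line numbers; with Hadoop's default text input format the key is a byte offset, so line numbers would have to be obtained some other way.

```python
from collections import defaultdict


def map_lines(source, lines):
    """Mapper: emit (line_number, (source_tag, text)) for each line of one file."""
    for line_no, text in enumerate(lines, start=1):
        yield line_no, (source, text)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_diff(line_no, values):
    """Reducer: return (line_no, text_a, text_b) on a mismatch, else None."""
    texts = dict(values)  # source_tag -> text; a missing file yields None
    left, right = texts.get("a"), texts.get("b")
    if left == right:
        return None  # matching lines produce no output
    return line_no, left, right


def line_diff(lines_a, lines_b):
    """Run the three phases and return (mismatch_counter, mismatched_lines)."""
    pairs = list(map_lines("a", lines_a)) + list(map_lines("b", lines_b))
    grouped = shuffle(pairs)
    mismatches = []
    for line_no in sorted(grouped):
        result = reduce_diff(line_no, grouped[line_no])
        if result is not None:
            mismatches.append(result)
    return len(mismatches), mismatches
```

For example, line_diff(["x", "y", "z"], ["x", "Y"]) reports two mismatches, at lines 2 and 3. Note that this compares the files position by position, so a single inserted line makes every later line count as different, unlike the diff cmd.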

Bejoy KS

On Tue, Mar 20, 2012 at 2:01 PM, botma lin <linjfly@gmail.com> wrote:

> Hi, all
>      I'm a newbie to Hadoop.
>      I'm trying to compare two large files and get the differences between
> them, like the diff cmd in Linux.
> However, the mapred API can only get one record at a time, so how can I
> get the related records in the two files and compare them using the mapred API?
>     Thanks!
