hadoop-common-user mailing list archives

From Lohit Vijayarenu <lohit.vijayar...@yahoo.com>
Subject Re: Compare data on HDFS side
Date Thu, 04 Sep 2008 15:04:42 GMT

One way is to write a small program that does a diff at the block level: open both files,
read data at the same offsets, and compare. This will tell you where the files differ, at the
granularity of your offset boundaries, and is useful for checking whether two files differ at
all. There is also an open JIRA that would let you get the checksum of files, which would make
this task trivial.
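
A minimal sketch of the same-offset comparison, written against generic binary streams
rather than any HDFS client. The function names and the chunk size are illustrative
assumptions; in practice the two streams would come from whatever HDFS client you use to
open the files for reading.

```python
import io


def _read_fully(stream, size):
    """Read up to `size` bytes, looping because read() may return short data."""
    parts = []
    remaining = size
    while remaining > 0:
        data = stream.read(remaining)
        if not data:  # end of stream
            break
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)


def first_difference(stream_a, stream_b, chunk_size=64 * 1024):
    """Return the byte offset of the first difference, or None if identical.

    Both streams are read in chunks from the same offset, so neither file
    has to be held in memory or copied to local disk.
    """
    offset = 0
    while True:
        chunk_a = _read_fully(stream_a, chunk_size)
        chunk_b = _read_fully(stream_b, chunk_size)
        if chunk_a != chunk_b:
            common = min(len(chunk_a), len(chunk_b))
            for i in range(common):
                if chunk_a[i] != chunk_b[i]:
                    return offset + i
            # One stream ended before the other.
            return offset + common
        if not chunk_a:
            # Both streams ended at the same point with no mismatch.
            return None
        offset += len(chunk_a)


if __name__ == "__main__":
    a = io.BytesIO(b"hello hdfs world")
    b = io.BytesIO(b"hello HDFS world")
    print(first_difference(a, b, chunk_size=4))
```

This only reports where the byte streams first diverge; it is not a line-oriented diff like
the local diff command.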

On Sep 4, 2008, at 6:51 AM, "Andrey Pankov" <apankov@iponweb.net> wrote:


Does anyone know whether it is possible to compare data on HDFS while
avoiding copying the data to a local box? If I want to find the difference
between local text files, I can use the diff command. If the files are on
HDFS, I have to fetch them to a local box and only then run diff.
Copying files to the local fs is a bit annoying and could be problematic
when the files are huge, say 2-5 GB.

Thanks in advance.

Andrey Pankov
