hadoop-common-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: tar or hadoop archive
Date Mon, 27 Jun 2011 14:10:04 GMT
The advantage of a Hadoop archive (HAR) file is that it lets you access the
files stored in it directly. For example, if you archived three files
(a.txt, b.txt, c.txt) in an archive called foo.har, you could cat one
of the three files using the hadoop command line:

hadoop fs -cat har:///user/joey/out/foo.har/a.txt

You can also copy files out of the archive or use files in the archive
as input to MapReduce jobs.


On Mon, Jun 27, 2011 at 3:06 AM, Rita <rmorgan466@gmail.com> wrote:
> We use hadoop/hdfs to archive data. I archive a lot of files by creating one
> large tar file and then placing it in hdfs. Is it better to use hadoop archive
> for this, or is it essentially the same thing?
> --
> --- Get your facts first, then you can distort them as you please.--

Joseph Echeverria
Cloudera, Inc.
