hadoop-common-user mailing list archives

From Devaraj Das <d...@yahoo-inc.com>
Subject Re: har/unhar utility
Date Wed, 03 Sep 2008 12:11:51 GMT
OK, you could try this: run the hadoop archive tool in your local hadoop
setup. For example, if you want to create an archive of the conf directory, you
could run "bin/hadoop archive -archiveName tmp.har conf test".
Now copy the contents of the test directory to the dfs:
"bin/hadoop dfs -put test/tmp.har tmp.har". It should then be possible to look at
this using the hadoop fs commands (like bin/hadoop dfs -ls
har:///user/ddas/tmp.har) or from a MapReduce job.
The one thing you should note is that the paths inside the har filesystem keep
the names they had on your local machine...

BTW, I've never tried the above myself..

The other option is to concatenate (if possible) the files into bigger files
and then upload those to the dfs..
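To make the concatenation idea concrete, here is a minimal sketch in plain Python (not Hadoop-specific): pack the small files into one big file and record each member's (offset, length) in a side index so individual files can still be pulled back out. The pack/extract function names and the JSON side-index are illustrative inventions, not anything Hadoop itself provides; a real deployment would more likely use Hadoop's SequenceFile or the har format.

```python
import json
import os


def pack(src_dir, archive_path):
    """Concatenate every regular file under src_dir into one archive file,
    recording (offset, length) per member in a JSON index next to it."""
    index = {}
    offset = 0
    with open(archive_path, "wb") as out:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as member:
                data = member.read()
            out.write(data)
            index[name] = (offset, len(data))
            offset += len(data)
    with open(archive_path + ".idx", "w") as idx:
        json.dump(index, idx)


def extract(archive_path, name):
    """Read one member back out of the archive using the side index."""
    with open(archive_path + ".idx") as idx:
        index = json.load(idx)
    offset, length = index[name]
    with open(archive_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

The same big file can then be uploaded to the dfs in one shot, avoiding the per-small-file overhead.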

On 9/3/08 4:37 PM, "Dmitry Pushkarev" <umka@stanford.edu> wrote:

> Probably, but the current idea is to bypass writing small files to HDFS by
> creating my own local har archive and uploading it. (Small files lower the
> transfer speed from 40-70 MB/s to hundreds of KB/s :(
> 
> -----Original Message-----
> From: Devaraj Das [mailto:ddas@yahoo-inc.com]
> Sent: Wednesday, September 03, 2008 4:00 AM
> To: core-user@hadoop.apache.org
> Subject: Re: har/unhar utility
> 
> You could create a har archive of the small files and then pass the
> corresponding har filesystem as input to your mapreduce job. Would that
> work?
> 
> 
> On 9/3/08 4:24 PM, "Dmitry Pushkarev" <umka@stanford.edu> wrote:
> 
>> Not quite. I want to be able to create har archives on the local system,
>> then send them to HDFS and back, since I work with many small files (10 KB)
>> and Hadoop seems to behave poorly with them.
>> 
>> Perhaps HBase is another option. Is anyone using it in "production" mode?
>> And do I really need to downgrade to 0.17.x to install it?
>> 
>> -----Original Message-----
>> From: Devaraj Das [mailto:ddas@yahoo-inc.com]
>> Sent: Wednesday, September 03, 2008 3:35 AM
>> To: core-user@hadoop.apache.org
>> Subject: Re: har/unhar utility
>> 
>> Are you looking for user documentation on har? If so, here it is:
>> http://hadoop.apache.org/core/docs/r0.18.0/hadoop_archives.html
>> 
>> 
>> On 9/3/08 3:21 PM, "Dmitry Pushkarev" <umka@stanford.edu> wrote:
>> 
>>> Does anyone have a har/unhar utility?
>>> 
>>> Or at least a description of the format? It looks pretty obvious, but just
>>> in case.
>>> 
>>>  
>>> 
>>> Thanks


