hadoop-common-user mailing list archives

From Devaraj Das <d...@yahoo-inc.com>
Subject Re: har/unhar utility
Date Wed, 03 Sep 2008 12:17:14 GMT
I should have mentioned that in the step where you create a har archive
locally, you should use the local job runner.
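Pulling the steps in this thread together, a rough end-to-end sketch (untested, as the thread itself notes; the archive name, source directory, and DFS paths are just the examples used below, and forcing the local job runner via -Dmapred.job.tracker=local is an assumption about how the archive tool picks up generic options):

```shell
# 1. Create the archive locally, using the local job runner so no cluster
#    is needed for this step (assumes the archive tool honors -D options):
bin/hadoop archive -Dmapred.job.tracker=local -archiveName tmp.har conf test

# 2. Upload the resulting archive to the DFS:
bin/hadoop dfs -put test/tmp.har tmp.har

# 3. Inspect it through the har filesystem:
bin/hadoop dfs -ls har:///user/ddas/tmp.har
```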


On 9/3/08 5:41 PM, "Devaraj Das" <ddas@yahoo-inc.com> wrote:

> Ok .. You could try this - run the hadoop archive tool in your local hadoop
> setup. For example, if you want to create an archive of the conf directory,
> you could run - "bin/hadoop archive -archiveName tmp.har conf test".
> Now copy the contents of the test directory to the dfs.
> "bin/hadoop dfs -put test/tmp.har tmp.har". It should be possible to look at
> this using the hadoop fs commands (like bin/hadoop dfs -ls
> har:///user/ddas/tmp.har) or from a MR job.
> The one thing you should note is that the paths in the har fs will have the
> names of the paths on your local machine...
> 
> BTW I myself never tried the above..
> 
> The other option is to concatenate (if possible) the files into bigger files
> and then upload those to the dfs..
> 
> On 9/3/08 4:37 PM, "Dmitry Pushkarev" <umka@stanford.edu> wrote:
> 
>> Probably, but the current idea is to bypass writing small files to HDFS by
>> creating my own local har archive and uploading it. (Small files lower the
>> transfer speed from 40-70MB/s to hundreds of kbps.) :(
>> 
>> -----Original Message-----
>> From: Devaraj Das [mailto:ddas@yahoo-inc.com]
>> Sent: Wednesday, September 03, 2008 4:00 AM
>> To: core-user@hadoop.apache.org
>> Subject: Re: har/unhar utility
>> 
>> You could create a har archive of the small files and then pass the
>> corresponding har filesystem as input to your mapreduce job. Would that
>> work?
>> 
>> 
>> On 9/3/08 4:24 PM, "Dmitry Pushkarev" <umka@stanford.edu> wrote:
>> 
>>> Not quite, I want to be able to create har archives on the local system and
>>> then send them to HDFS, and back, since I work with many small files (10kb)
>>> and hadoop seems to behave poorly with them.
>>> 
>>> Perhaps HBase is another option. Is anyone using it in "production" mode?
>>> And do I really need to downgrade to 17.x to install it?
>>> 
>>> -----Original Message-----
>>> From: Devaraj Das [mailto:ddas@yahoo-inc.com]
>>> Sent: Wednesday, September 03, 2008 3:35 AM
>>> To: core-user@hadoop.apache.org
>>> Subject: Re: har/unhar utility
>>> 
>>> Are you looking for user documentation on har? If so, here it is:
>>> http://hadoop.apache.org/core/docs/r0.18.0/hadoop_archives.html
>>> 
>>> 
>>> On 9/3/08 3:21 PM, "Dmitry Pushkarev" <umka@stanford.edu> wrote:
>>> 
>>>> Does anyone have har/unhar utility?
>>>> 
>>>> Or at least a format description: it looks pretty obvious, but just in
>>>> case.
>>>> 
>>>>  
>>>> 
>>>> Thanks
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
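The concatenation alternative suggested above can be sketched like this (a hypothetical packing scheme, not anything hadoop provides; the file names and the index format are made up for illustration):

```shell
# Pack many small local files into one large payload plus an offset index,
# so the upload to HDFS is one big stream instead of thousands of tiny ones.
# All names here (small_input, packed.dat, packed.idx) are illustrative.
mkdir -p small_input
printf 'alpha' > small_input/a.txt
printf 'beta'  > small_input/b.txt

: > packed.dat   # concatenated payload
: > packed.idx   # index: filename <tab> offset <tab> length
offset=0
for f in small_input/*; do
  len=$(( $(wc -c < "$f") ))            # normalize wc output to a bare number
  printf '%s\t%d\t%d\n' "$(basename "$f")" "$offset" "$len" >> packed.idx
  cat "$f" >> packed.dat
  offset=$((offset + len))
done
# A reader can later slice individual files back out with dd/tail using
# the recorded offsets.
```

The big packed file then uploads at full streaming speed with a single `bin/hadoop dfs -put`.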


