hadoop-mapreduce-user mailing list archives

From Eyal Golan <egola...@gmail.com>
Subject Re: Is it possible to use hadoop archive to specify third party libs
Date Sat, 07 Jan 2012 06:55:12 GMT
hi,
can you please point me to the link to Cloudera's article?

thanks,

Eyal


Eyal Golan
egolan74@gmail.com

Visit: http://jvdrums.sourceforge.net/
LinkedIn: http://www.linkedin.com/in/egolan74
Skype: egolan74

Save a tree. Please don't print this e-mail unless it's really necessary.



On Tue, Jan 3, 2012 at 5:28 PM, Samir Eljazovic
<samir.eljazovic@gmail.com> wrote:

> Hi,
> yes, I'm trying to get option 1 from Cloudera's article (using the
> distributed cache) to work. If I specify all the libraries when running the
> job it works, but I'm trying to make it work using only one archive file
> containing all the native libraries I need, and that seems to be the problem.
>
> When I use a tar file, the libraries are extracted but they are not added to
> the classpath.
>
> Here's the TT log:
>
> 2012-01-03 15:04:43,611 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager (Thread-447): Creating openCV.tar in /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/8087259939901130551_1003999143_605667452/10.190.207.247/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201031358_0008/archives/openCV.tar-work--7133799918421346652 with rwxr-xr-x
> 2012-01-03 15:04:44,209 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager (Thread-447): Extracting /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/8087259939901130551_1003999143_605667452/10.190.207.247/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201031358_0008/archives/openCV.tar-work--7133799918421346652/openCV.tar to /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/8087259939901130551_1003999143_605667452/10.190.207.247/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201031358_0008/archives/openCV.tar-work--7133799918421346652
> 2012-01-03 15:04:44,363 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager (Thread-447): Cached hdfs://10.190.207.247:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201031358_0008/archives/openCV.tar#openCV.tar as /mnt/var/lib/hadoop/mapred/taskTracker/hadoop/distcache/8087259939901130551_1003999143_605667452/10.190.207.247/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201201031358_0008/archives/openCV.tar
>
> What should I do to make these libs available to my job?
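
(For context, a minimal driver-side sketch of option 1 from the article, not taken from the thread: the HDFS paths, jar name, and job name are made up, and it assumes the pre-YARN 0.20/1.x API that appears in the log above.)

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;

  public class CacheSetup {
    public static Job configure() throws Exception {
      Configuration conf = new Configuration();
      // Ship the tar through the distributed cache; the #openCV.tar fragment
      // becomes a symlink in each task's working directory (hypothetical path).
      DistributedCache.addCacheArchive(new URI("/libs/openCV.tar#openCV.tar"), conf);
      DistributedCache.createSymlink(conf);
      // Native libs: point the child JVMs at the unpacked symlink directory.
      conf.set("mapred.child.java.opts", "-Xmx200m -Djava.library.path=./openCV.tar");
      // A plain directory on the classpath does not pull in jars that sit inside
      // it, so each jar still has to be added individually (or passed via -libjars).
      DistributedCache.addFileToClassPath(new Path("/libs/opencv.jar"), conf);
      // All conf changes must happen before the Job copies the Configuration.
      return new Job(conf, "opencv-job");
    }
  }
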
>
> Thanks
>
>
> On 3 January 2012 15:57, Praveen Sripati <praveensripati@gmail.com> wrote:
>
>> Check this article from Cloudera for different options.
>>
>>
>> http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>>
>> Praveen
>>
>> On Tue, Jan 3, 2012 at 7:41 AM, Harsh J <harsh@cloudera.com> wrote:
>>
>>> Samir,
>>>
>>> I believe HARs won't work there. But you can use a regular tar instead,
>>> and that should be unpacked properly.
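
(A minimal task-side sketch of what this looks like once the tar is unpacked, not from the thread: it assumes the archive was shipped with a #openCV.tar fragment so it appears under a symlink of that name in the task's working directory; the mapper types and the .so file name are made up.)

  import java.io.File;
  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class OpenCvMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException {
      // The symlink created from the URI fragment sits in the task's
      // current working directory once the archive has been localized.
      File lib = new File("openCV.tar", "libopencv_java.so"); // hypothetical name
      if (lib.exists()) {
        // System.load needs an absolute path to the shared object.
        System.load(lib.getAbsolutePath());
      }
    }
  }
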
>>>
>>> On 03-Jan-2012, at 5:38 AM, Samir Eljazovic wrote:
>>>
>>> > Hi,
>>> > I need to provide a lot of 3rd-party libraries (both Java and native),
>>> and doing that using the generic option parser (-libjars and -files
>>> arguments) is a little bit messy. I was wondering if it is possible to wrap
>>> all the libraries into a single har archive and use that when submitting the job?
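
(Related aside, not from the thread: -libjars, -files, and -archives are only honoured when the driver goes through ToolRunner / GenericOptionsParser. A minimal skeleton, with made-up class and job names:)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class OpenCvDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
      // getConf() already carries whatever -libjars/-files/-archives set up.
      Job job = new Job(getConf(), "opencv-job");
      job.setJarByClass(OpenCvDriver.class);
      // ... input/output paths, mapper/reducer classes ...
      return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new Configuration(), new OpenCvDriver(), args));
    }
  }
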
>>> >
>>> > Just to mention that I want to avoid putting all the libraries into the
>>> job jar for two reasons:
>>> > 1. it does not work for native libs
>>> > 2. it takes time to upload the jar
>>> >
>>> > Thanks,
>>> > Samir
>>>
>>>
>>
>
