hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Jar file location
Date Mon, 07 Jan 2008 21:27:56 GMT

These sound right to me, but I have only personally used (4).  Also, in (4),
you have to make sure that the jars are under /lib in the big fat jar.
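To make the layout concrete, here is a minimal sketch of option (4). All file and jar names below are placeholders, and `python3 -m zipfile` stands in for the JDK `jar` tool so the sketch runs anywhere (a jar is just a zip archive):

```shell
# Sketch of option (4): a job jar with the dependency jars under lib/.
# All names below are placeholders; substitute your real classes and jars.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p build/lib build/com/example
: > build/com/example/MyJob.class   # stand-in for your compiled job classes
: > build/lib/hbase.jar             # stand-in dependency jars
: > build/lib/lucene-core.jar
# A jar is a zip archive; with the JDK installed this is: jar cf myjob.jar -C build .
(cd build && python3 -m zipfile -c ../myjob.jar com lib)
python3 -m zipfile -l myjob.jar     # entries should start with com/ and lib/
```

The point is that the dependency entries sit under lib/ at the top level of the archive, not under a build/ prefix; as I understand it, the framework unpacks the job jar on the task node and puts everything in lib/ onto the task classpath.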

I can't comment on (3).  Perhaps there is a committer handy?  Olga?  Alan?
Doug?
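For what it is worth, my reading of the HADOOP-1622 patch is that option (3) boils down to something like the following. This is an untested sketch; the class name and the HDFS jar path are hypothetical, and a committer should correct the details:

```java
// Untested sketch of option (3), per the HADOOP-1622 patch: register a jar
// that already sits in HDFS so that it lands on the task classpath.
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitWithLibs {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitWithLibs.class);
        // The jar must first be copied into HDFS, e.g.:
        //   hadoop dfs -put hbase.jar /libs/hbase.jar
        DistributedCache.addFileToClassPath(new Path("/libs/hbase.jar"), conf);
        JobClient.runJob(conf);
    }
}
```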


On 1/7/08 1:22 PM, "Lars George" <lars@worldlingo.com> wrote:

> Ted,
> 
> So we have these choices?
> 
> 1. Local copy of libs and setting HADOOP_CLASSPATH
> 
> 2. Using the DistributedCache and uploading the files "manually" into it.
> 
> 3. Add jars using the Job interface (JIRA 1622)
> 
> 4. Pack everything into one big fat job jar
> 
> Am I missing something?
> 
> Question: is JIRA 1622 actually usable yet? I am using an about 14-day-old
> nightly developer build, so it should be included in that case?
> 
> Which way would you go?
> 
> Lars
> 
> 
> Ted Dunning wrote:
>> Arun's comment about the DistributedCache is actually a very viable
>> alternative (certainly one that I am about to investigate).
>> 
>> 
>> On 1/7/08 12:06 PM, "Lars George" <lars@worldlingo.com> wrote:
>> 
>>> Ted,
>>> 
>>> That means going the HADOOP_CLASSPATH route, i.e. creating a separate
>>> directory for those shared jars and then setting it once in
>>> hadoop-env.sh. I think this will work for me too; I am in the process of
>>> setting up a separate CONF_DIR anyway after my recent update, where I
>>> forgot to copy a couple of files into the new tree.
>>> 
>>> I was following this:
>>> http://www.mail-archive.com/hadoop-commits@lucene.apache.org/msg02860.html
>>> 
>>> I could not really find this on the Wiki, although the above is a
>>> commit message. Am I missing something?
>>> 
>>> Lars
>>> 
>>> 
>>> Ted Dunning wrote:
>>>> /lib is definitely the way to go.
>>>> 
>>>> But adding gobs and gobs of stuff there makes jobs start slowly because you
>>>> have to propagate a multi-megabyte blob to lots of worker nodes.
>>>> 
>>>> I would consider adding universally used jars to the hadoop class path on
>>>> every node, but I would also expect to face configuration management
>>>> nightmares (small ones, though) from doing this.
>>>> 
>>>> 
>>>> On 1/7/08 11:50 AM, "Lars George" <lars@worldlingo.com> wrote:
>>>> 
>>>>> Arun,
>>>>> 
>>>>> Ah yes, I see it now in JobClient. OK, then how are the required aux
>>>>> libs handled? I assume a /lib inside the job jar is the only way to go?
>>>>> 
>>>>> I saw the discussion on the Wiki about adding Hbase permanently to the
>>>>> HADOOP_CLASSPATH, but then I also have to deploy the Lucene jar files,
>>>>> Xerces etc. I guess it is better if I add everything non-Hadoop into the
>>>>> job jar's lib directory?
>>>>> 
>>>>> Thanks again for the help,
>>>>> Lars
>>>>> 
>>>>> 
>>>>> Arun C Murthy wrote:
>>>>>> On Mon, Jan 07, 2008 at 08:24:36AM -0800, Lars George wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Maybe someone here can help me with a rather noob question. Where do I
>>>>>>> have to put my custom jar to run it as a map/reduce job? Anywhere, and
>>>>>>> then specifying the HADOOP_CLASSPATH variable in hadoop-env.sh?
>>>>>>> 
>>>>>> Once you have your jar and submit it for your job via the *hadoop jar*
>>>>>> command, the framework takes care of distributing the software to the
>>>>>> nodes on which your maps/reduces are scheduled:
>>>>>> $ hadoop jar <custom_jar> <custom_args>
>>>>>> 
>>>>>> The detail is that the framework copies your jar from the submission
>>>>>> node to the HDFS and then copies it onto the execution node.
>>>>>> 
>>>>>> Does 
>>>>>> http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html#Usage
>>>>>> help?
>>>>>> 
>>>>>> Arun
>>>>>> 
>>>>>>> Also, since I am using the Hadoop API already from our server code, it
>>>>>>> seems natural to launch jobs from within our code. Are there any issues
>>>>>>> with that? I assume I have to copy the jar files first and make them
>>>>>>> available as per my question above, but then I am ready to start it
>>>>>>> from my own code?
>>>>>>> 
>>>>>>> I have read most Wiki entries, and while the actual workings are
>>>>>>> described quite nicely, I could not find an answer to the questions
>>>>>>> above. The demos are already in place and can be started as is without
>>>>>>> the need of making them available.
>>>>>>> 
>>>>>>> Again, I apologize for being a noobie.
>>>>>>> 
>>>>>>> Lars
>> 

