hadoop-common-user mailing list archives

From: Lars George <l...@worldlingo.com>
Subject: Re: Jar file location
Date: Mon, 07 Jan 2008 20:06:50 GMT
Ted,

That means going the HADOOP_CLASSPATH route, i.e. creating a separate 
directory for those shared jars and then setting it once in 
hadoop-env.sh. I think this will work for me too. I am in the process of 
setting up a separate CONF_DIR anyway after my recent update, where I 
forgot to copy a couple of files into the new tree.
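
For reference, a minimal sketch of what that could look like -- the 
directory and jar names are only placeholders, and the same line has to 
go into hadoop-env.sh on every node:

    # conf/hadoop-env.sh -- list each shared jar explicitly
    export HADOOP_CLASSPATH=/opt/shared-jars/hbase.jar:/opt/shared-jars/lucene-core.jar:/opt/shared-jars/xercesImpl.jar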

I was following this: 
http://www.mail-archive.com/hadoop-commits@lucene.apache.org/msg02860.html 

I could not really find any of this on the Wiki, though; the above is 
just a commit message. Am I missing something?

Lars


Ted Dunning wrote:
> /lib is definitely the way to go.
>
> But adding gobs and gobs of stuff there makes jobs start slowly because you
> have to propagate a multi-megabyte blob to lots of worker nodes.
>
> I would consider adding universally used jars to the hadoop class path on
> every node, but I would also expect to face configuration management
> nightmares (small ones, though) from doing this.
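
To picture what actually gets shipped around, this is roughly how such a 
job jar is laid out -- the class and jar names here are made up; 
everything under lib/ is unpacked on the worker nodes and added to the 
task's classpath:

    myjob.jar
        MyMapper.class
        MyReducer.class
        lib/hbase-0.1.2.jar
        lib/lucene-core-2.2.0.jar
        lib/xercesImpl.jar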
>
>
> On 1/7/08 11:50 AM, "Lars George" <lars@worldlingo.com> wrote:
>
>> Arun,
>>
>> Ah yes, I see it now in JobClient. OK, then how are the required aux
>> libs handled? I assume a /lib inside the job jar is the only way to go?
>>
>> I saw the discussion on the Wiki about adding HBase permanently to the
>> HADOOP_CLASSPATH, but then I also have to deploy the Lucene jar files,
>> Xerces etc. I guess it is better if I add everything non-Hadoop into the
>> job jar's lib directory?
>>
>> Thanks again for the help,
>> Lars
>>
>>
>> Arun C Murthy wrote:
>>     
>>> On Mon, Jan 07, 2008 at 08:24:36AM -0800, Lars George wrote:
>>>
>>>> Hi,
>>>>
>>>> Maybe someone here can help me with a rather noob question. Where do I
>>>> have to put my custom jar to run it as a map/reduce job? Anywhere, and
>>>> then specify the HADOOP_CLASSPATH variable in hadoop-env.sh?
>>>>
>>>>
>>> Once you have your jar and submit it via the *hadoop jar* command, the
>>> framework takes care of distributing it to the nodes on which your
>>> maps/reduces are scheduled:
>>> $ hadoop jar <custom_jar> <custom_args>
>>>
>>> The detail is that the framework copies your jar from the submission node
>>> into HDFS and from there onto each execution node.
>>>
>>> Does http://lucene.apache.org/hadoop/docs/r0.15.1/mapred_tutorial.html#Usage
>>> help?
>>>
>>> Arun
>>>
>>>
>>>> Also, since I am using the Hadoop API already from our server code, it
>>>> seems natural to launch jobs from within our code. Are there any issues
>>>> with that? I assume I have to copy the jar files first and make them
>>>> available as per my question above, but am I then ready to start jobs
>>>> from my own code?
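
In case it helps, launching from your own code boils down to building a 
JobConf and handing it to JobClient. A minimal sketch against the 
0.15-era API -- the class name, jar path and input/output paths are 
placeholders, and the identity mapper/reducer merely stand in for real 
job classes:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class LaunchFromServer {
        public static void main(String[] args) throws IOException {
            JobConf conf = new JobConf(LaunchFromServer.class);
            conf.setJobName("launched-from-server");
            // We are not going through `hadoop jar`, so point the framework
            // at the job jar explicitly (placeholder path).
            conf.setJar("/opt/myapp/myjob.jar");
            conf.setMapperClass(IdentityMapper.class);
            conf.setReducerClass(IdentityReducer.class);
            conf.setInputPath(new Path("input"));
            conf.setOutputPath(new Path("output"));
            // Submits the job (the jar is copied into HDFS and out to the
            // task nodes, as described above) and blocks until completion.
            JobClient.runJob(conf);
        }
    }

The JobConf constructor picks up hadoop-site.xml from the classpath, so 
the server needs the cluster configuration there; otherwise the 
jobtracker and namenode addresses have to be set on the conf by hand.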
>>>>
>>>> I have read most Wiki entries, and while the actual workings are
>>>> described quite nicely, I could not find an answer to the questions
>>>> above. The demos are already in place and can be started as is, without
>>>> the need to make them available first.
>>>>
>>>> Again, I apologize for being a noobie.
>>>>
>>>> Lars
