uima-user mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: import location over Hadoop
Date Thu, 12 Jun 2008 08:54:32 GMT
Hi Rohan,

Good question. I added a page under "developer tips" that I
suggest you use:
http://cwiki.apache.org/confluence/display/UIMA/Running+UIMA+Apps+on+Hadoop

--Thilo

rohan rai wrote:
> Hi Thilo
> 
> Sorry for asking such a simple thing... Under which topic should I add this
> info?
> 
> Regards
> Rohan
> 
> On Thu, Jun 12, 2008 at 2:21 AM, Thilo Goetz <twgoetz@gmx.de> wrote:
> 
>> Hi Rohan,
>>
>> I'm glad you got it to work.  This is useful information.  It would
>> be great if you could put it up on the UIMA Wiki:
>> http://cwiki.apache.org/UIMA/
>>
>> --Thilo
>>
>>
>> rohan rai wrote:
>>
>>> I think I got it. Thanks for all the help, guys. To make a simple UIMA
>>> app work over Hadoop (I did it in a pseudo-distributed environment),
>>> three or four factors come together:
>>>
>>> 1) The UIMA app, along with the mapper, the reducer, your job main class
>>> and all the resources, should be contained within the job jar you create.
>>>
>>> 2) All imports in the descriptors should probably be imports by name (I
>>> haven't verified whether imports by location work).
>>>
>>> 3) Any resource read by any of your classes should be loaded via the
>>> ClassLoader, e.g.
>>>
>>>   XMLInputSource in = new XMLInputSource(
>>>       ClassLoader.getSystemResourceAsStream(aeXmlDescriptor), null);
>>>
>>> 4) Whenever a UIMA AnalysisEngine (or a similar component) is produced
>>> (I am doing it in the mapper), a ResourceManager should be used, e.g.
>>>
>>>   ResourceManager rMng = UIMAFramework.newDefaultResourceManager();
>>>   // str is the path to any of the resources; it can be obtained via
>>>   // ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
>>>   rMng.setExtensionClassPath(str, true);
>>>   rMng.setDataPath(str);
>>>   aEngine = UIMAFramework.produceAnalysisEngine(aSpecifier, rMng, null);
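Points 2 and 3 can be illustrated without any UIMA or Hadoop jars. The following is a minimal, JDK-only sketch under stated assumptions: the class `ClasspathDescriptors`, its methods and the descriptor name `desc.MyAnnotator` are hypothetical. It maps an import-by-name to a resource path the way UIMA's name-based imports are described (dots become slashes, `.xml` is appended) and reads the descriptor through a ClassLoader, so the lookup does not depend on the task's working directory. A temporary directory stands in for Hadoop's unjarred `work` directory; in a real job, the `InputStream` is what would be passed to `XMLInputSource(InputStream, File)` as in point 3.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClasspathDescriptors {

    /** Maps a hypothetical import-by-name such as "desc.MyAnnotator" to "desc/MyAnnotator.xml". */
    static String importNameToResource(String importName) {
        return importName.replace('.', '/') + ".xml";
    }

    /** Reads a descriptor via the given ClassLoader, independent of the working directory. */
    static String readDescriptor(ClassLoader loader, String resource) throws IOException {
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in == null) {
                throw new FileNotFoundException("not on classpath: " + resource);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temp directory stands in for Hadoop's unjarred "work" directory.
        Path work = Files.createTempDirectory("work");
        String resource = importNameToResource("desc.MyAnnotator");
        Path descriptor = work.resolve(resource);
        Files.createDirectories(descriptor.getParent());
        Files.writeString(descriptor, "<analysisEngineDescription/>");

        // Put that directory on a classloader, as the task's classpath would be.
        try (URLClassLoader loader = new URLClassLoader(new URL[] {work.toUri().toURL()}, null)) {
            System.out.println(readDescriptor(loader, resource));
            // A plain relative lookup resolves against the current working directory
            // instead, so it generally does not find the descriptor.
            System.out.println(Files.exists(Path.of(resource)));
        }
    }
}
```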
>>>
>>> This 4th point matters because, when an XML file is read without the
>>> classloader, it is by default looked up relative to the task's temp
>>> directory, e.g.
>>>
>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/
>>>
>>> but all the resources and classes get unjarred into the
>>>
>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work
>>>
>>> directory.
>>>
>>> So, to tell the system to look for the resources in the correct
>>> directory when the classloader is not used (which is what UIMA's
>>> XMLInputSource does), we have to use a resource manager.
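If you would rather have `str` in point 4 name the unpacked root directory (the `work` directory above) instead of the descriptor file, one hedged option is to convert the resource URL with `toURI()` and walk up one level per segment of the resource name. `getPath()` on a URL returns the descriptor file's own path and keeps escapes such as `%20`, whereas `Paths.get(URI)` decodes them. The class `ResourceRoot` and its method are hypothetical, the sketch uses only the JDK, and it handles only `file:` URIs: a resource still inside a jar resolves to a `jar:` URI with no directory to point at.

```java
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResourceRoot {

    /**
     * Returns the classpath root that contains resourceName, given the URI the
     * classloader resolved it to. resourceName is classpath-relative with no
     * leading slash, e.g. "desc/MyAnnotator.xml". Only "file:" URIs are handled.
     */
    static Path classpathRoot(URI resourceUri, String resourceName) {
        if (!"file".equals(resourceUri.getScheme())) {
            // e.g. a "jar:" URI: the resource was never unpacked to a directory
            throw new IllegalArgumentException("not an unpacked file resource: " + resourceUri);
        }
        Path root = Paths.get(resourceUri); // decodes escapes such as %20
        // Walk up one directory per name segment: 2 for "desc/MyAnnotator.xml".
        int segments = resourceName.split("/").length;
        for (int i = 0; i < segments; i++) {
            root = root.getParent();
        }
        return root;
    }

    public static void main(String[] args) throws Exception {
        // Simulated "work" directory (with a space in its name) holding a descriptor.
        Path work = Files.createTempDirectory("work dir");
        Path desc = Files.createDirectories(work.resolve("desc")).resolve("MyAnnotator.xml");
        Files.writeString(desc, "<analysisEngineDescription/>");

        URL url = desc.toUri().toURL();
        System.out.println(url.getPath());                                       // descriptor file, URL-encoded
        System.out.println(classpathRoot(url.toURI(), "desc/MyAnnotator.xml"));  // the work directory
    }
}
```

Assuming `aeXmlDescriptor` is a classpath-relative name such as `desc/MyAnnotator.xml`, the string from `ResourceRoot.classpathRoot(ClassLoader.getSystemResource(aeXmlDescriptor).toURI(), aeXmlDescriptor).toString()` could then be handed to `rMng.setDataPath(...)`.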
>>>
>>> Regards
>>> Rohan
>>>
>> ...
>>
>>
> 
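Point 1 in Rohan's list (mapper, reducer, main class and every descriptor and resource packed into the job jar) is easy to get wrong when packaging. The following is a hypothetical, JDK-only pre-flight check, not part of Hadoop or UIMA: it reports which required entry names are missing from a jar before the job is submitted. The entry names used in the usage line (`desc/MyAnnotator.xml`, `com/example/MyMapper.class`) are assumed examples.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class JobJarCheck {

    /** Names of all entries in the jar, e.g. "desc/MyAnnotator.xml", "com/example/MyMapper.class". */
    static Set<String> entryNames(JarFile jar) {
        return jar.stream().map(JarEntry::getName).collect(Collectors.toSet());
    }

    /** Returns, in order, the required entry names that the jar does not contain. */
    static List<String> missingEntries(Set<String> entriesInJar, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!entriesInJar.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: JobJarCheck <job.jar> <entry>...");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            List<String> required = List.of(args).subList(1, args.length);
            List<String> missing = missingEntries(entryNames(jar), required);
            System.out.println(missing.isEmpty()
                    ? "all required entries present"
                    : "missing from job jar: " + missing);
        }
    }
}
```

For example: `java JobJarCheck my-job.jar desc/MyAnnotator.xml com/example/MyMapper.class`. Jar entry names use `/` separators and have no leading slash, the same form a ClassLoader resource name takes.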
