uima-user mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: import location over Hadoop
Date Thu, 12 Jun 2008 14:13:02 GMT
rohan rai wrote:
> Just edited it. Hopefully it is explanatory enough

That's great, thanks Rohan.

> 
> On Thu, Jun 12, 2008 at 2:24 PM, Thilo Goetz <twgoetz@gmx.de> wrote:
> 
>> Hi Rohan,
>>
>> Good question.  I added a page under "developer tips" that I
>> suggest you use:
>> http://cwiki.apache.org/confluence/display/UIMA/Running+UIMA+Apps+on+Hadoop
>>
>> --Thilo
>>
>>
>> rohan rai wrote:
>>
>>> Hi Thilo
>>>
>>> Sorry for asking such a simple thing. Under which topic should I add
>>> this info?
>>>
>>> Regards
>>> Rohan
>>>
>>> On Thu, Jun 12, 2008 at 2:21 AM, Thilo Goetz <twgoetz@gmx.de> wrote:
>>>
>>>> Hi Rohan,
>>>> I'm glad you got it to work.  This is useful information.  It would
>>>> be great if you could put it up on the UIMA Wiki:
>>>> http://cwiki.apache.org/UIMA/
>>>>
>>>> --Thilo
>>>>
>>>>
>>>> rohan rai wrote:
>>>>
>>>>> I think I got it. Thanks for all the help, everyone. To make a
>>>>> simple UIMA app work over Hadoop (I did it in a pseudo-distributed
>>>>> environment), 3-4 factors come together.
>>>>>
>>>>> 1) The UIMA app, along with the mapper, the reducer, your job main
>>>>> file, and the resources, should be contained within the job jar
>>>>> you create.
>>>>>
>>>>> 2) All imports in the descriptor should probably be imports by name
>>>>> (I haven't verified whether import by location works).
>>>>>
>>>>> 3) Any resource read in any of the class files should be read via
>>>>> the ClassLoader, e.g.
>>>>>
>>>>>   XMLInputSource in = new XMLInputSource(
>>>>>       ClassLoader.getSystemResourceAsStream(aeXmlDescriptor), null);
>>>>>
>>>>> 4) When an AnalysisEngine or a similar UIMA resource is produced
>>>>> (I am doing it in the mapper), a ResourceManager should be used, e.g.
>>>>>
>>>>>   ResourceManager rMng = UIMAFramework.newDefaultResourceManager();
>>>>>   // str is the path to any of the resources, which can be obtained via
>>>>>   // ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
>>>>>   rMng.setExtensionClassPath(str, true);
>>>>>   rMng.setDataPath(str);
>>>>>   aEngine = UIMAFramework.produceAnalysisEngine(aSpecifier, rMng, null);
>>>>>
>>>>> The 4th point matters because, when an XML file is read without the
>>>>> class loader, it is by default read from the temporary task
>>>>> directory, e.g.
>>>>>
>>>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/
>>>>>
>>>>> But all the resources and classes get unjarred into the directory
>>>>>
>>>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work
>>>>>
>>>>> So, to tell the system to look for the resources in the correct
>>>>> directory when the class loader is not used (which is what UIMA's
>>>>> XMLInputSource does), we have to use the ResourceManager.
>>>>>
>>>>> Regards
>>>>> Rohan
>>>>>
>>>>>  ...
>>>>
>>>>
> 
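
The point in the quoted message about reading resources through the
class loader, rather than relative to the task's working directory, can
be sketched with the JDK alone. This is a minimal illustration, not the
code from the thread: the class name ClasspathResourceDemo, the file name
demo-descriptor.xml, and the temporary directory standing in for the
unjarred job directory are all hypothetical. Returning the directory that
holds the resource is only one plausible way to obtain a path to hand to
a resource manager; the original message passes the resource path itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ClasspathResourceDemo {

    /** Reads a classpath resource as UTF-8 text; returns null if it is not found. */
    static String readResource(ClassLoader loader, String name) {
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                return null;
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the directory holding a resource, or null if it is missing or not a plain file. */
    static Path resourceDirectory(ClassLoader loader, String name) {
        URL url = loader.getResource(name);
        if (url == null || !"file".equals(url.getProtocol())) {
            return null; // not found, or still packed inside a jar
        }
        try {
            return Paths.get(url.toURI()).getParent();
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical stand-in for the unjarred job directory: a temp dir exposed through a class loader. */
    static ClassLoader loaderOverTempDir(String name, String content) {
        try {
            Path dir = Files.createTempDirectory("job-work");
            Files.write(dir.resolve(name), content.getBytes(StandardCharsets.UTF_8));
            return new URLClassLoader(new URL[] { dir.toUri().toURL() }, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "demo-descriptor.xml"; // hypothetical resource name
        ClassLoader loader = loaderOverTempDir(name, "<demo/>");
        // A lookup relative to the working directory does not see the file,
        // while the class loader lookup does.
        System.out.println("found relative to working directory: " + Files.exists(Paths.get(name)));
        System.out.println("content via class loader: " + readResource(loader, name));
        System.out.println("directory holding the resource: " + resourceDirectory(loader, name));
    }
}
```

In a real job the loader would be the one that loaded the job classes
(the message uses the system class loader), not a temporary one built for
the demonstration.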
