uima-user mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: import location over Hadoop
Date Wed, 11 Jun 2008 13:05:15 GMT
That's most likely because the XML isn't valid :-)
Seriously, the "Content is not allowed in prolog" message
is sometimes caused by an incorrect text encoding.

Does this run ok locally?

--Thilo
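
[A minimal, self-contained sketch of what triggers this parser message: any byte before the XML declaration (a stray character, or BOM bytes in the wrong encoding) makes the prolog invalid. This demo is not from the thread; it uses only the JDK's built-in SAX parser.]

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class PrologDemo {
    // Returns null if the bytes parse as XML, else the parser's error message.
    static String parseError(byte[] xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return null;
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] clean = "<?xml version=\"1.0\"?><doc/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parseError(clean));   // null: parses fine

        // Prepend a single stray byte before the declaration, as a broken
        // re-encoding or copy step might.
        byte[] dirty = new byte[clean.length + 1];
        dirty[0] = (byte) '!';
        System.arraycopy(clean, 0, dirty, 1, clean.length);
        System.out.println(parseError(dirty));   // non-null: content before prolog
    }
}
```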

rohan rai wrote:
> Thanks Thilo. Well, if I do that, all sorts of invalid XML exceptions get
> thrown:
> 
> org.apache.uima.util.InvalidXMLException: Invalid descriptor at
> <unknown source>.
> 	at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:193)
> 	at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:365)
> 	at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:346)
> 	at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:45)
> 	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:37)
> 	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
> Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
> 	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
> 	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
> 	at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:176)
> 	... 8 more
> 
> 
> 
> On Wed, Jun 11, 2008 at 6:08 PM, Thilo Goetz <twgoetz@gmx.de> wrote:
> 
>> You need to use import by name instead of import
>> by location in your descriptor.  Then things get
>> loaded via the classpath and you should be ok
>> (provided that you stick your descriptors in the
>> jar of course).  I suggest you test this locally
>> first by moving your application to a different
>> machine where you don't have any descriptors
>> lying around.  It'll be easier to debug than in
>> hadoop.
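>>
>> [As a sketch of the two import styles in a UIMA descriptor: the descriptor
>> name is taken from the FileNotFoundException quoted below, and the dotted
>> package form is an assumption about where it would sit on the classpath.]
>>
>> ```xml
>> <!-- import by location: resolved relative to this descriptor's URL;
>>      fails when the descriptor is read from inside a job jar -->
>> <import location="annotators/RecordCandidateAnnotator.xml"/>
>>
>> <!-- import by name: resolved through the classpath/datapath, using dots
>>      instead of slashes and no .xml suffix -->
>> <import name="descriptors.annotators.RecordCandidateAnnotator"/>
>> ```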
>>
>> --Thilo
>>
>>
>> rohan rai wrote:
>>
>>> Well, the question is about running UIMA over hadoop. How do I do that,
>>> given that in UIMA there are xml descriptors which have relative urls and
>>> locations, which throw exceptions?
>>>
>>> But I can probably do without that answer.
>>>
>>> Simplifying the problem:
>>>
>>> I create a jar for my application and I am trying to run a map reduce job.
>>>
>>> In the map I am trying to read an xml resource, which gives this kind of
>>> exception:
>>>
>>> java.io.FileNotFoundException:
>>>
>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806102252_0028/task_200806102252_0028_m_000000_0/./descriptors/annotators/RecordCandidateAnnotator.xml
>>> (No such file or directory)
>>>        at java.io.FileInputStream.open(Native Method)
>>>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>>        at java.io.FileInputStream.<init>(FileInputStream.java:66)
>>>        at
>>> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
>>>        at
>>> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
>>>        at java.net.URL.openStream(URL.java:1009)
>>>        at
>>> org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:83)
>>>
>>> I think I need to pass the content of the jar which contains the resource
>>> xml and the classes (other than the job class) to each and every
>>> taskXXXXXXX being created.
>>>
>>> How can I do that?
>>>
>>> Regards
>>> Rohan
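>>>
>>> [One common way to get an extra jar onto every task's classpath is
>>> Hadoop's generic -libjars option, which ships the listed jars through the
>>> DistributedCache. This is a sketch, not from the thread: the jar names are
>>> hypothetical, and -libjars is only honored when the driver parses its
>>> arguments via ToolRunner/GenericOptionsParser.]
>>>
>>> ```
>>> hadoop jar dq-job.jar org.ziva.dq.hadoop.DQHadoopMain \
>>>     -libjars dq-descriptors.jar input/ output/
>>> ```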
>>>
>>>
>>>
>>>
>>> On Wed, Jun 11, 2008 at 5:12 PM, Michael Baessler <
>>> mba@michael-baessler.de>
>>> wrote:
>>>
>>>> rohan rai wrote:
>>>>> Hi
>>>>> A simple thing such as a name annotator which has an import of location
>>>>> type starts throwing an exception when I create a jar of the application
>>>>> I am developing and run it over hadoop.
>>>>>
>>>>> If I have to do it in a java class file then I can use XMLInputSource in =
>>>>> new XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor), null);
>>>>>
>>>>> But the relative paths in annotators, analysis engines etc. start
>>>>> throwing exceptions.
>>>>>
>>>>> Please help
>>>>>
>>>>> Regards
>>>>> Rohan
>>>>>
>>>> I'm not sure I understand your question, but I think you need some help
>>>> with the exceptions you get.
>>>> Can you provide the exception stack trace?
>>>>
>>>> -- Michael
> 
