That's most likely because the XML isn't valid :-)
Seriously, the "Content is not allowed in prolog" message
is sometimes caused by an incorrect text encoding.
Does this run OK locally?
--Thilo
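One quick way to test the encoding theory is to look at the first bytes the parser actually receives. The helper below is hypothetical (it is not part of UIMA). It reports whether the input starts with markup, starts with a UTF-8 byte order mark followed by markup, or starts with something else. A common way to hit "Content is not allowed in prolog" is non-markup content before the first `<`: a stray decoded BOM, or a file path handed to the parser in place of the file's contents.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper (not part of UIMA): reports what the XML parser
// will see before the first '<' of a descriptor.
class PrologCheck {

    // UTF-8 encoding of the byte order mark U+FEFF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    /** Returns a short, human-readable description of the start of the input. */
    static String diagnose(byte[] data) {
        if (data.length == 0) {
            return "empty input";
        }
        // Skip a UTF-8 BOM; a parser reading raw bytes usually tolerates it,
        // but it is worth knowing it is there.
        int start = hasUtf8Bom(data) ? UTF8_BOM.length : 0;
        if (start < data.length && data[start] == '<') {
            return start == 0 ? "starts with markup" : "UTF-8 BOM, then markup";
        }
        // Show up to 20 bytes of whatever precedes the markup.
        int end = Math.min(data.length, start + 20);
        String head = new String(data, start, end - start, StandardCharsets.UTF_8);
        return "unexpected leading content: \"" + head + "\"";
    }
}
```

Feed it the descriptor bytes exactly as the map task sees them, for example the contents of the stream that is passed to `XMLInputSource`.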
rohan rai wrote:
> Thanks Thilo. Well, if I do that, all sorts of InvalidXMLExceptions get
> thrown:
>
> org.apache.uima.util.InvalidXMLException: Invalid descriptor at <unknown source>.
> at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:193)
> at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:365)
> at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:346)
> at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:45)
> at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:37)
> at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
> Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
> at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
> at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:176)
> ... 8 more
>
> On Wed, Jun 11, 2008 at 6:08 PM, Thilo Goetz <twgoetz@gmx.de> wrote:
>
>> You need to use import by name instead of import
>> by location in your descriptor. Then things get
>> loaded via the classpath and you should be ok
>> (provided that you stick your descriptors in the
>> jar of course). I suggest you test this locally
>> first by moving your application to a different
>> machine where you don't have any descriptors
>> lying around. It'll be easier to debug than in
>> Hadoop.
>>
>> --Thilo
>>
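To make the import-by-name suggestion concrete: an import such as `<import name="descriptors.annotators.RecordCandidateAnnotator"/>` is resolved by replacing the dots with slashes, appending `.xml`, and looking the result up on the classpath (UIMA also consults its datapath). The descriptor path here is borrowed from the stack trace further down and is only an example. The hypothetical helper below mimics just the classpath part of that rule. It is a sketch for checking, before submitting the job, that a name really resolves inside your jar, not UIMA's own code.

```java
import java.io.FileNotFoundException;
import java.net.URL;

// Hypothetical sketch of the lookup rule behind <import name="..."/>:
// the dotted name becomes a classpath-relative path ending in ".xml".
// UIMA performs this resolution itself; this only mirrors it for testing.
class ImportByName {

    /** "a.b.C" -> "a/b/C.xml" */
    static String toResourcePath(String importName) {
        return importName.replace('.', '/') + ".xml";
    }

    /** Finds the descriptor on the given class loader's classpath, or fails clearly. */
    static URL locate(String importName, ClassLoader loader) throws FileNotFoundException {
        String path = toResourcePath(importName);
        URL url = loader.getResource(path);
        if (url == null) {
            throw new FileNotFoundException("not on classpath: " + path);
        }
        return url;
    }
}
```

Running `locate` from a small test class, with only the job jar on the classpath, is one way to follow Thilo's advice of testing on a machine with no descriptors lying around.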
>>
>> rohan rai wrote:
>>
>>> Well, the question is about running UIMA over Hadoop. How do I do that,
>>> given that UIMA XML descriptors contain relative URLs and locations, which
>>> throw exceptions?
>>>
>>> But I can probably do without that answer.
>>>
>>> Simplifying the problem:
>>>
>>> I create a jar for my application and I am trying to run a MapReduce job.
>>>
>>> In the map I am trying to read an XML resource, which gives this kind of
>>> exception:
>>> java.io.FileNotFoundException: /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806102252_0028/task_200806102252_0028_m_000000_0/./descriptors/annotators/RecordCandidateAnnotator.xml (No such file or directory)
>>> at java.io.FileInputStream.open(Native Method)
>>> at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>> at java.io.FileInputStream.<init>(FileInputStream.java:66)
>>> at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
>>> at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
>>> at java.net.URL.openStream(URL.java:1009)
>>> at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:83)
>>>
>>> I think I need to pass the contents of the jar that contains the resource
>>> XML and classes (other than the job class) on to each and every
>>> taskXXXXXXX that gets created.
>>>
>>> How can I do that?
>>>
>>> Regards
>>> Rohan
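The missing file's path ends in `task_..././descriptors/annotators/RecordCandidateAnnotator.xml`. That is what a relative path such as `./descriptors/annotators/RecordCandidateAnnotator.xml` turns into once it is resolved against the task's working directory. The sketch below shows that resolution with `java.nio`. As an alternative, it opens the descriptor as a classpath resource from the job jar, which does not depend on the working directory. The class and method names are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: why a relative descriptor path breaks inside a task,
// and a classpath-based alternative for the top-level descriptor.
class DescriptorLocation {

    /** How a relative path is interpreted: against the current working directory. */
    static Path resolveAgainst(String workingDir, String relativePath) {
        return Paths.get(workingDir).resolve(relativePath).normalize();
    }

    /**
     * Opens a descriptor packaged in the application jar. A classpath lookup
     * does not care which directory the task happens to run in.
     */
    static InputStream openFromClasspath(Class<?> anchor, String resourcePath)
            throws FileNotFoundException {
        // Class.getResourceAsStream treats a leading '/' as the classpath root.
        InputStream in = anchor.getResourceAsStream("/" + resourcePath);
        if (in == null) {
            throw new FileNotFoundException("not on classpath: " + resourcePath);
        }
        return in;
    }
}
```

This only covers the descriptor you open yourself; any imports inside it still need to be by name so that UIMA also finds them on the classpath.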
>>>
>>>
>>>
>>>
>>> On Wed, Jun 11, 2008 at 5:12 PM, Michael Baessler <mba@michael-baessler.de> wrote:
>>>
>>>> rohan rai wrote:
>>>>> Hi,
>>>>> A simple thing such as a name annotator that imports a type descriptor
>>>>> by location starts throwing an exception when I create a jar of the
>>>>> application I am developing and run it over Hadoop.
>>>>>
>>>>> If I have to do it in a Java class file, then I can use
>>>>> XMLInputSource in = new XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor), null);
>>>>>
>>>>> But the relative paths in annotators, analysis engines, etc. start
>>>>> throwing exceptions.
>>>>>
>>>>> Please help.
>>>>>
>>>>> Regards
>>>>> Rohan
>>>>>
>>>> I'm not sure I understand your question, but I think you need some help
>>>> with the exceptions you get.
>>>> Can you provide the exception stack trace?
>>>>
>>>> -- Michael
>>>>
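The snippet above passes `null` as the second argument of `XMLInputSource(InputStream, File)`, which is the base for resolving relative paths. The parser therefore only sees a stream, presumably why the trace at the top of the thread says `<unknown source>`, and a relative `<import location="..."/>` has no descriptor URL to be resolved against. The sketch below shows the underlying rule with `java.net.URI`: a relative location only has a meaning relative to the URL of the descriptor that contains it. The helper and the file names are made up. As far as I know, `XMLInputSource` also has a constructor taking a `URL`, so passing the result of `getResource(...)` rather than a stream would give the parser a base. Importing by name avoids the question altogether.

```java
import java.net.URI;

// Hypothetical sketch: a relative import location is resolved against the
// URL of the descriptor containing it. Without that URL there is no anchor.
class RelativeImport {

    /**
     * Resolves an import location like a relative link in a document.
     * Returns null when there is no base, which is the situation of a
     * descriptor that was read from a bare InputStream.
     */
    static URI resolve(URI descriptorUrl, String importLocation) {
        if (descriptorUrl == null) {
            return null; // "<unknown source>": nothing to anchor the relative path to
        }
        return descriptorUrl.resolve(importLocation);
    }
}
```

The sketch uses a `file:` URI on purpose: `java.net.URI` treats `jar:` URLs as opaque, and `resolve` then simply returns its argument unchanged.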
>>>>
>