Message-ID: <484FCD8B.3010705@gmx.de>
Date: Wed, 11 Jun 2008 15:05:15 +0200
From: Thilo Goetz
To: uima-user@incubator.apache.org
Subject: Re: import location over Hadoop
References:
 <54c3312d0806110231k2b87c7b6nd5b674d26145b643@mail.gmail.com>
 <484FBA28.6070303@michael-baessler.de>
 <54c3312d0806110459l4a5e779dt2b54f46d87790eb7@mail.gmail.com>
 <484FC759.60506@gmx.de>
 <54c3312d0806110601o70f27f1an8bf33ca9ef6a3dc4@mail.gmail.com>
In-Reply-To: <54c3312d0806110601o70f27f1an8bf33ca9ef6a3dc4@mail.gmail.com>

That's most likely because the XML isn't valid :-)

Seriously, the "no content allowed in prolog" message
is sometimes due to an incorrect text encoding. Does
this run ok locally?

--Thilo

rohan rai wrote:
> Thanks Thilo. Well, if I do that, all sorts of invalid XML exceptions are
> getting thrown
>
> org.apache.uima.util.InvalidXMLException: Invalid descriptor at
> .
>     at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:193)
>     at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:365)
>     at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:346)
>     at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:45)
>     at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:37)
>     at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
> Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
>     at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:176)
>     ... 8 more
> org.apache.uima.util.InvalidXMLException: Invalid descriptor at
> .
>     at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:193)
>     at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:365)
>     at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:346)
>     at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:45)
>     at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:37)
>     at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
> Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
>     at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:176)
>
> On Wed, Jun 11, 2008 at 6:08 PM, Thilo Goetz wrote:
>
>> You need to use import by name instead of import
>> by location in your descriptor.  Then things get
>> loaded via the classpath and you should be ok
>> (provided that you stick your descriptors in the
>> jar of course).  I suggest you test this locally
>> first by moving your application to a different
>> machine where you don't have any descriptors
>> lying around.  It'll be easier to debug than in
>> hadoop.
>>
>> --Thilo
>>
>>
>> rohan rai wrote:
>>
>>> Well, the question is about running UIMA over Hadoop. How do I do that,
>>> given that in UIMA there are XML descriptors which have relative URLs
>>> and locations? Those throw exceptions.
>>>
>>> But I can probably do without that answer
>>>
>>> Simplifying the problem
>>>
>>> I create a jar for my application and I am trying to run a map reduce job
>>>
>>> In the map I am trying to read an XML resource, which gives this kind of
>>> exception
>>>
>>> java.io.FileNotFoundException: /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806102252_0028/task_200806102252_0028_m_000000_0/./descriptors/annotators/RecordCandidateAnnotator.xml
>>> (No such file or directory)
>>>     at java.io.FileInputStream.open(Native Method)
>>>     at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>>     at java.io.FileInputStream.<init>(FileInputStream.java:66)
>>>     at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
>>>     at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
>>>     at java.net.URL.openStream(URL.java:1009)
>>>     at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:83)
>>>
>>> I think I need to pass the content of the jar, which contains the
>>> resource XML and classes (other than the JOB class), to each and every
>>> taskXXXXXXX that gets created
>>>
>>> How can I do that?
>>>
>>> Regards
>>> Rohan
>>>
>>>
>>> On Wed, Jun 11, 2008 at 5:12 PM, Michael Baessler <mba@michael-baessler.de> wrote:
>>>
>>>> rohan rai wrote:
>>>>> Hi
>>>>> A simple thing such as a name annotator which has an import location of
>>>>> type starts throwing exception when I create a jar of the application I am
>>>>> developing and run over hadoop.
>>>>>
>>>>> If I have to do it in a java class file then I can use
>>>>> XMLInputSource in = new XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor), null);
>>>>>
>>>>> But the relative paths in annotators, analysis engines etc. start throwing
>>>>> exceptions
>>>>>
>>>>> Please Help
>>>>>
>>>>> Regards
>>>>> Rohan
>>>>>
>>>> I'm not sure I understand your question, but I think you need some help
>>>> with the exceptions you get.
>>>> Can you provide the exception stack trace?
>>>>
>>>> -- Michael
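
A minimal sketch of what the import-by-name advice above implies, using only
the JDK. In a UIMA descriptor, <import location="..."/> is resolved as a URL
relative to the importing descriptor, while <import name="a.b.C"/> is, per the
UIMA documentation, turned into the resource path a/b/C.xml and looked up on
the classpath (or datapath). The class name ImportByNameSketch and the example
import name are assumptions for illustration; the name is derived from the
./descriptors/annotators/RecordCandidateAnnotator.xml path in the trace and
presumes the jar keeps that layout at its root. This is not UIMA API, only a
way to check whether such a name would be found by the running task.

```java
import java.net.URL;

/** Hypothetical helper, not part of UIMA: mimics import-by-name lookup. */
final class ImportByNameSketch {

    /** Converts a dotted import name into a classpath resource path. */
    static String toResourcePath(String importName) {
        return importName.replace('.', '/') + ".xml";
    }

    /** Returns the URL of the descriptor on the classpath, or null if absent. */
    static URL locate(String importName, ClassLoader loader) {
        return loader.getResource(toResourcePath(importName));
    }

    public static void main(String[] args) {
        // Assumed name; pass a different one as the first argument.
        String name = args.length > 0
                ? args[0]
                : "descriptors.annotators.RecordCandidateAnnotator";
        System.out.println("resource path: " + toResourcePath(name));

        URL url = locate(name, ImportByNameSketch.class.getClassLoader());
        System.out.println(url == null
                ? "not found on classpath"
                : "found at " + url);
    }
}
```

Running this inside the map task (or locally with only the job jar on the
classpath) shows whether the descriptor is reachable without any files lying
around on disk, which is the local test suggested above.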
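
On the "Content is not allowed in prolog" error: the parser reports it when it
meets character data before the first tag of the document. One rough way to
test the encoding suspicion raised above is to look at the first bytes of the
descriptor exactly as the task reads it. The sketch below assumes nothing
about UIMA or Hadoop; PrologCheck, readHead and describe are hypothetical
names. Whether a byte order mark actually causes trouble depends on how the
bytes reach the parser, so treat the output as a hint, not a diagnosis.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical diagnostic: reports what a descriptor file starts with. */
final class PrologCheck {

    /** Reads up to three bytes from the start of the stream. */
    static byte[] readHead(InputStream in) throws IOException {
        byte[] buf = new byte[3];
        int n = 0;
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) {
                break; // end of stream before three bytes were available
            }
            n += r;
        }
        return Arrays.copyOf(buf, n);
    }

    /** Classifies the leading bytes of a would-be XML document. */
    static String describe(byte[] head) {
        if (head.length == 0) {
            return "empty input";
        }
        int b0 = head[0] & 0xFF;
        int b1 = head.length > 1 ? head[1] & 0xFF : -1;
        int b2 = head.length > 2 ? head[2] & 0xFF : -1;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            return "UTF-8 byte order mark";
        }
        if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
            return "UTF-16 byte order mark";
        }
        if (b0 == '<') {
            return "starts with '<'";
        }
        return "unexpected leading byte 0x" + Integer.toHexString(b0);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PrologCheck <descriptor.xml>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(describe(readHead(in)));
        }
    }
}
```

The same describe call can be applied to the stream obtained from the class
loader inside the map task, to compare what the task sees with the local file.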