Mailing-List: contact any23-dev-help@incubator.apache.org; run by ezmlm
Reply-To: any23-dev@incubator.apache.org
Date: Fri, 17 Feb 2012 12:17:59 +0000 (UTC)
From: Hannes Mühleisen (Updated) (JIRA)
To: any23-dev@incubator.apache.org
Message-ID: <499321004.50403.1329481079877.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1541357290.50396.1329480959554.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (ANY23-49) N3/NQ parsers ignoring stopAtFirstError flag
X-Virus-Checked: Checked by
ClamAV on apache.org

     [ https://issues.apache.org/jira/browse/ANY23-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hannes Mühleisen updated ANY23-49:
----------------------------------

    Comment: was deleted

(was: Adaptation of NQuadsParser)

> N3/NQ parsers ignoring stopAtFirstError flag
> --------------------------------------------
>
>                 Key: ANY23-49
>                 URL: https://issues.apache.org/jira/browse/ANY23-49
>             Project: Apache Any23
>          Issue Type: Bug
>         Environment: Any23 0.6.1 and repository
>            Reporter: Hannes Mühleisen
>         Attachments: RobustNquadsParser.java
>
>
> The base interface for all RDF parsers (org.openrdf.rio.RDFParser) defines a method setStopAtFirstError. The documentation for this method reads: "Sets whether the parser should stop immediately if it finds an error in the data". This is indeed very useful, as many data sets "out there" contain some malformed entries.
>
> However, as far as I can tell from the current source code (0.6.1 and SVN trunk), the NQuadsParser (org.deri.any23.parser.NQuadsParser) ignores this flag. In its original implementation, it runs through the entire input in an unchecked loop:
>
> while (parseLine(fileReader)) {
>     nextRow();
> }
>
> Now, if parsing any line of a potentially huge file throws an exception, the entire parsing process stops regardless of the "stopAtFirstError" setting. I propose changing these loops to honor the flag, so that when it is set to "false", the rest of the broken line is discarded and parsing continues with the next line.
>
> I have implemented this behavior on the latest version of NQuadsParser from SVN (r1601); the source file is attached. I have changed the parseLine() method as follows:
>
> private boolean parseLine(BufferedReader br) throws IOException,
>         RDFParseException, RDFHandlerException {
>     // [...]
>     try {
>         // [...]
>         // notifyStatement moved into try block
>         notifyStatement(sub, pred, obj, graph);
>     } catch (EOS eos) {
>         reportFatalError("Unexpected end of line.", row, col);
>         throw new IllegalStateException();
>     } catch (IllegalArgumentException iae) {
>         if (!stopAtFirstError()) {
>             // remove remainder of broken line
>             consumeBrokenLine(br);
>             // notify parse error listener
>             reportError(iae.getMessage(), row, col);
>         } else {
>             throw new RDFParseException(iae);
>         }
>     }
>     // [...]
> }
>
> private void consumeBrokenLine(BufferedReader br) throws IOException {
>     char c;
>     while (true) {
>         mark(br);
>         // skip characters until the end of the broken line
>         c = readChar(br);
>         if (c == '\n') {
>             return;
>         }
>     }
> }
>
> It would be great if this or similar changes found their way into the various Any23 RDF parsers.
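
The behavior proposed in the issue can be illustrated without the Any23 or Sesame classes. The following is only a minimal sketch under stated assumptions: the class LenientLineParserSketch, its errors list, and the whitespace-splitting parseLine are hypothetical and are not part of NQuadsParser or org.openrdf.rio.RDFParser. It reads whole lines, so a broken line is discarded as a unit instead of being consumed character by character as in consumeBrokenLine above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a line-oriented quad parser. All names are hypothetical;
// this is not the Any23 NQuadsParser or the Sesame RDFParser API.
class LenientLineParserSketch {

    // Messages for lines that were skipped, each prefixed with "row N: ".
    final List<String> errors = new ArrayList<>();

    // Plays the role of the stopAtFirstError flag described in the issue.
    private final boolean stopAtFirstError;

    LenientLineParserSketch(boolean stopAtFirstError) {
        this.stopAtFirstError = stopAtFirstError;
    }

    // Parses every line and returns the statements that were accepted.
    List<String[]> parse(List<String> lines) {
        List<String[]> statements = new ArrayList<>();
        int row = 0;
        for (String line : lines) {
            row++;
            if (line.trim().isEmpty()) {
                continue; // blank lines carry no statement
            }
            try {
                statements.add(parseLine(line));
            } catch (IllegalArgumentException iae) {
                String message = "row " + row + ": " + iae.getMessage();
                if (stopAtFirstError) {
                    // strict mode: abort on the first malformed line
                    throw new IllegalArgumentException(message, iae);
                }
                // lenient mode: record the problem, continue with the next line
                errors.add(message);
            }
        }
        return statements;
    }

    // Deliberately simplified: four whitespace-separated terms followed by '.'.
    // Real N-Quads literals may contain spaces, which this toy rule rejects.
    private static String[] parseLine(String line) {
        String trimmed = line.trim();
        if (!trimmed.endsWith(".")) {
            throw new IllegalArgumentException("missing terminating '.'");
        }
        String body = trimmed.substring(0, trimmed.length() - 1).trim();
        String[] terms = body.split("\\s+");
        if (terms.length != 4) {
            throw new IllegalArgumentException(
                    "expected 4 terms but found " + terms.length);
        }
        return terms;
    }
}
```

With the flag set to false, a three-line input whose second line is malformed yields two statements and one recorded error for row 2. With the flag set to true, the same input aborts at row 2. An unchecked exception is used here only to keep the sketch short; the parseLine() change quoted above throws RDFParseException in that case.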