From: "Andy Seaborne (JIRA)"
To: jena-dev@incubator.apache.org
Reply-To: jena-dev@incubator.apache.org
Date: Sat, 18 Dec 2010 12:50:01 -0500 (EST)
Subject: [jira] Closed: (JENA-12) Turtle Files with a UTF-8 BOM fail to parse
Message-ID: <6693785.195591292694601313.JavaMail.jira@thor>
In-Reply-To: <11927717.193311292677560665.JavaMail.jira@thor>
Mailing-List: contact jena-dev-help@incubator.apache.org; run by ezmlm

     [ https://issues.apache.org/jira/browse/JENA-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne closed JENA-12.
-----------------------------

    Resolution: Fixed

Fixed in both places; RIOT and Jena (old reader)

> Turtle Files with a UTF-8 BOM fail to parse
> -------------------------------------------
>
>                 Key: JENA-12
>                 URL: https://issues.apache.org/jira/browse/JENA-12
>             Project: Jena
>          Issue Type: Bug
>          Components: RIOT
>         Environment: Windows 7, latest Sun Java Runtime, Jena 2.6.4
>            Reporter: Rob Vesse
>            Assignee: Andy Seaborne
>         Attachments: ttl-with-bom.ttl
>
>
> If a Turtle file has a BOM at the start then Jena will refuse to parse it giving the following error:
> Exception in thread "main" com.hp.hpl.jena.n3.turtle.TurtleParseException: Lexical error at line 1, column 2.  Encountered: "@" (64), after : "\ufeff"
> 	at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:44)
> 	at com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(TurtleReader.java:21)
> 	at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(JenaReaderBase.java:101)
> 	at com.hp.hpl.jena.n3.JenaReaderBase.read(JenaReaderBase.java:68)
> 	at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
> 	at TurtleWithBOM.main(TurtleWithBOM.java:31)
> The code I used to produce this error was as follows:
> import com.hp.hpl.jena.rdf.model.*;
> import com.hp.hpl.jena.util.FileManager;
> import java.io.*;
> public class TurtleWithBOM
> {
> 	public static void main(String[] args)
> 	{
> 		// create an empty model
> 		Model model = ModelFactory.createDefaultModel();
> 		InputStream in = FileManager.get().open( "ttl-with-bom.ttl" );
> 		if (in == null)
> 		{
> 			throw new IllegalArgumentException( "File: ttl-with-bom.ttl not found");
> 		}
> 		// read the Turtle file
> 		model.read(in, "", "TTL");
> 		// write it to standard out
> 		model.write(System.out);
> 	}
> }
> A sample Turtle file used with the above code is attached to this issue.
> The data files are coming from my software which is all written in .Net and when outputting in UTF-8 the default behaviour of .Net is to include the BOM at the start of the file.
> The BOM is not required for UTF-8 but it is not forbidden so I think this should be fixed (if possible) for future releases. I will be modifying my software so that output of the BOM can be disabled by my users if desired.
> Looking at the error message given I expect that the same problem would also affect N3 files since they are using the same reader afaict from the error trace.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
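
For readers hitting the same error on a release without the fix, one possible workaround is to strip a leading UTF-8 BOM (bytes EF BB BF) from the stream before handing it to the reader. The sketch below is only an illustration of that idea in plain java.io; it is not the change that was made in RIOT or the old Jena reader, and the class and method names (BomSkipper, skipUtf8Bom) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jena: drops a leading UTF-8 BOM if present.
final class BomSkipper {

    /** Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if there is one. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read() may return fewer bytes than requested, so loop up to 3.
        while (n < 3) {
            int r = pb.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream before 3 bytes were available
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: push the bytes back so the caller sees the stream unchanged.
            pb.unread(head, 0, n);
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a Turtle file written with a BOM; the content is made up.
        byte[] body = "@prefix ex: <http://example.org/> .".getBytes(StandardCharsets.UTF_8);
        byte[] data = new byte[body.length + 3];
        data[0] = (byte) 0xEF;
        data[1] = (byte) 0xBB;
        data[2] = (byte) 0xBF;
        System.arraycopy(body, 0, data, 3, body.length);

        InputStream clean = skipUtf8Bom(new ByteArrayInputStream(data));
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(clean, StandardCharsets.UTF_8));
        System.out.println(reader.readLine());
    }
}
```

In the reproduction code quoted above, the wrapped stream would be passed in place of `in`, for example `model.read(BomSkipper.skipUtf8Bom(in), "", "TTL")`.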