jena-users mailing list archives

From Rob Vesse <rav...@ecs.soton.ac.uk>
Subject Turtle file with UTF-8 BOM fails to parse
Date Fri, 17 Dec 2010 11:42:52 GMT


Hi all 

I had this issue reported to me recently and have been able to confirm it
myself (example data file attached). Essentially, if a Turtle file has a
UTF-8 BOM at the start, Jena refuses to parse it and gives the following
error:

Exception in thread "main" com.hp.hpl.jena.n3.turtle.TurtleParseException: Lexical error at line 1, column 2. Encountered: "@" (64), after : "\ufeff"
 at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:44)
 at com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(TurtleReader.java:21)
 at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(JenaReaderBase.java:101)
 at com.hp.hpl.jena.n3.JenaReaderBase.read(JenaReaderBase.java:68)
 at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
 at TurtleWithBOM.main(TurtleWithBOM.java:31)

The code I used to produce this error was as follows:

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.util.FileManager;

import java.io.*;

public class TurtleWithBOM
{
    public static void main(String[] args)
    {
        // create an empty model
        Model model = ModelFactory.createDefaultModel();

        InputStream in = FileManager.get().open( "ttl-with-bom.ttl" );
        if (in == null)
        {
            throw new IllegalArgumentException( "File: ttl-with-bom.ttl not found");
        }

        // read the Turtle file
        model.read(in, "", "TTL");

        // write it to standard out
        model.write(System.out);
    }
}

A sample data file used with the above code to reproduce the error is
attached.

The data files are coming from my software, which is all written in .Net,
and when outputting in UTF-8 the default behaviour of .Net is to include the
BOM at the start of the file. The BOM is not required for UTF-8, but it is
not forbidden, so I think this should be fixed (if possible) for future
releases. I will be modifying my software so that output of the BOM can be
disabled by my users if desired.
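In the meantime, a possible workaround on the Java side would be to strip the BOM from the stream before handing it to model.read. What follows is only a minimal sketch of that idea, not a tested fix: the class name BomSkipper and the method skipUtf8Bom are names I have made up for illustration and are not part of Jena. It uses only java.io, and only looks for the three UTF-8 BOM bytes (EF BB BF).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper (not part of Jena): skips a leading UTF-8 BOM if present.
public class BomSkipper {

    /** Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if there is one. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        // Pushback buffer of 3 bytes so the peeked bytes can be put back.
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read() may return fewer bytes than requested, so loop.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream before 3 bytes were available
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: push the bytes back so the caller sees them unchanged.
            pin.unread(head, 0, n);
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        // BOM followed by the start of a Turtle directive.
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '@', 'p'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println((char) in.read()); // first byte after the BOM
    }
}
```

With the test program above, the call would presumably become model.read(BomSkipper.skipUtf8Bom(in), "", "TTL"); I have not verified this against the attached file.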

Looking at the error message, I expect that the same problem would also
affect N3 files, since as far as I can tell from the stack trace they use
the same reader.

Regards, 

Rob Vesse  
-- 
PhD Student
IAM Group
Bay 20, Room 4027, Building 32
Electronics & Computer Science
University of Southampton
 