jena-users mailing list archives

From Rob Vesse <rav...@ecs.soton.ac.uk>
Subject Re: Turtle file with UTF-8 BOM fails to parse
Date Sat, 18 Dec 2010 13:09:58 GMT
Hi Andy

I've created a JIRA issue for this -
https://issues.apache.org/jira/browse/JENA-12

I appreciate the need for minimal, complete examples, as I have enough
trouble getting those out of users on my own support lists.
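
For anyone hitting this before a fix lands, one possible workaround is to strip a leading UTF-8 BOM (bytes EF BB BF) from the stream before handing it to the parser. The following is only a minimal sketch under that assumption: `BomSkipper` and `skipUtf8Bom` are hypothetical names, not part of Jena, and the sketch uses only the standard library, so the call into Jena appears as a comment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jena: skips a leading UTF-8 BOM if one is present.
final class BomSkipper {

    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        // Pushback buffer of 3 bytes so the peeked bytes can be returned if they are not a BOM.
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to 3 bytes; a single read() call may return fewer than requested.
        int n = 0;
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream reached before 3 bytes were available
            }
            n += r;
        }

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // Not a BOM: put back whatever was read so the caller sees the stream unchanged.
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n);
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a Turtle file that starts with a BOM.
        byte[] data = "\uFEFF@prefix ex: <http://example.org/> ."
                .getBytes(StandardCharsets.UTF_8);
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println((char) in.read()); // prints "@"

        // With Jena, the wrapped stream would be passed on as in the original code below:
        // model.read(in, "", "TTL");
    }
}
```

Wrapping the stream keeps the change local to the calling code, and input without a BOM passes through untouched.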

Thanks,

Rob

On Fri, 17 Dec 2010 14:10:09 +0000, Andy Seaborne
<andy.seaborne@epimorphics.com> wrote:
> Hi Rob,
> 
> Thanks for the minimal, complete example.
> 
> The parsers should cope with a UTF-8 BOM even if it's not recommended.
> 
> Could you raise a JIRA issue for this please (it's the new process!). 
> It'll need fixing in Jena and RIOT.
> 
> 	Andy
> 
> On 17/12/10 11:42, Rob Vesse wrote:
>> Hi all
>>
>> I had this issue reported to me recently and have been able to confirm
>> it myself (example data file attached). Essentially the issue is that if
>> a Turtle file has a BOM at the start then Jena will refuse to parse it
>> giving the following error:
>>
>> Exception in thread "main" com.hp.hpl.jena.n3.turtle.TurtleParseException: Lexical error at line 1, column 2. Encountered: "@" (64), after : "\ufeff"
>>     at com.hp.hpl.jena.n3.turtle.ParserTurtle.parse(ParserTurtle.java:44)
>>     at com.hp.hpl.jena.n3.turtle.TurtleReader.readWorker(TurtleReader.java:21)
>>     at com.hp.hpl.jena.n3.JenaReaderBase.readImpl(JenaReaderBase.java:101)
>>     at com.hp.hpl.jena.n3.JenaReaderBase.read(JenaReaderBase.java:68)
>>     at com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:226)
>>     at TurtleWithBOM.main(TurtleWithBOM.java:31)
>>
>> The code I used to produce this error was as follows:
>>
>> import com.hp.hpl.jena.rdf.model.*;
>> import com.hp.hpl.jena.util.FileManager;
>>
>> import java.io.*;
>>
>> public class TurtleWithBOM
>> {
>>     public static void main(String[] args)
>>     {
>>         // create an empty model
>>         Model model = ModelFactory.createDefaultModel();
>>
>>         InputStream in = FileManager.get().open( "ttl-with-bom.ttl" );
>>         if (in == null)
>>         {
>>             throw new IllegalArgumentException( "File: ttl-with-bom.ttl not found");
>>         }
>>
>>         // read the Turtle file
>>         model.read(in, "", "TTL");
>>
>>         // write it to standard out
>>         model.write(System.out);
>>     }
>> }
>>
>> A sample data file used with the above code to reproduce the error is
>> attached.
>>
>> The data files are coming from my software which is all written in .Net
>> and when outputting in UTF-8 the default behaviour of .Net is to include
>> the BOM at the start of the file. The BOM is not required for UTF-8 but
>> it is not forbidden so I think this should be fixed (if possible) for
>> future releases. I will be modifying my software so that output of the
>> BOM can be disabled by my users if desired.
>>
>> Looking at the error message given I expect that the same problem would
>> also affect N3 files since they are using the same reader afaict from
>> the error trace.
>>
>> Regards,
>>
>> Rob Vesse
>>
>> --
>> PhD Student
>> IAM Group
>> Bay 20, Room 4027, Building 32
>> Electronics & Computer Science
>> University of Southampton
>>

-- 
PhD Student
IAM Group
Bay 20, Room 4027, Building 32
Electronics & Computer Science
University of Southampton
