incubator-any23-dev mailing list archives

From Peter Ansell <ansell.pe...@gmail.com>
Subject Re: N-Quads in Re: Upgrade to Tika 1.2 [WAS] Re: [ANNOUNCE] Welcome Peter Ansell as Any23 PPMC member and committer
Date Wed, 08 Aug 2012 23:37:34 GMT
On 8 August 2012 19:33, Richard Cyganiak <richard.cyganiak@deri.org> wrote:
> Hi Michele,
>
> On 8 Aug 2012, at 10:12, Michele Mostarda wrote:
>> the only thing I would stress is to avoid breaking the support
>> for IRIs in N-Quads [0] present in the current Any23 version of the parser.
>>
>> I know it is not compliant with the N-Quads standard, but we introduced this feature
>> because Sindice [1] (which uses Any23 to distill RDF content from collected pages)
>> is constantly crawling a lot of N-Quads documents written with IRI encoding.
>
> I'm not sure what you mean when you say that the IRI support in Any23 isn't
> compliant with the N-Quads standard. Can you elaborate?
>
> I'd say that N-Quads as defined in [0] supports IRIs.

The Any23 N-Quads parser currently accepts UTF-8 IRIs whose non-US-ASCII
characters are not encoded using %XY, which is incompatible with the
encoding rules that N-Quads/N-Triples rely on from the RDF 1.0 spec [2].
The parser also allows literal strings to contain raw UTF-8 characters
that are not escaped using \u or \U, as the N-Triples spec requires [3].
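To make the two rules concrete, here is a minimal Python sketch. The
function names iri_to_uri and escape_ntriples_literal are hypothetical
and are not part of Any23 or any library. The first function only
%XY-encodes non-US-ASCII characters via their UTF-8 bytes; a complete
IRI-to-URI mapping would also escape ASCII characters that are not
permitted in URIs, such as spaces. The second applies the string escapes
listed in the N-Triples spec [3].

```python
def iri_to_uri(iri: str) -> str:
    # Hypothetical helper: keep US-ASCII characters as-is and replace every
    # non-US-ASCII character with the %XY form of each of its UTF-8 bytes.
    # Simplification: ASCII characters not allowed in URIs are NOT escaped.
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def escape_ntriples_literal(s: str) -> str:
    # Hypothetical helper: produce a US-ASCII-only literal body using the
    # N-Triples escapes: backslash, double quote, \n, \r, \t, then \uHHHH
    # for other code points up to U+FFFF and \UHHHHHHHH above that.
    out = []
    for ch in s:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif 0x20 <= cp <= 0x7E:
            # Printable ASCII is written unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(f"\\U{cp:08X}")
    return "".join(out)


if __name__ == "__main__":
    print(iri_to_uri("http://example.org/café"))  # http://example.org/caf%C3%A9
    print(escape_ntriples_literal("café"))        # caf\u00E9
```

A parser following these rules would reject the raw "é" forms, which is
the behaviour the current Any23 parser relaxes.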

Cheers,

Peter

[2] http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-URI-reference
[3] http://www.w3.org/TR/rdf-testcases/#ntrip_strings
