incubator-any23-dev mailing list archives

From Peter Ansell <>
Subject Re: N-Quads in Re: Upgrade to Tika 1.2 [WAS] Re: [ANNOUNCE] Welcome Peter Ansell as Any23 PPMC member and committer
Date Wed, 08 Aug 2012 23:37:34 GMT
On 8 August 2012 19:33, Richard Cyganiak <> wrote:
> Hi Michele,
> On 8 Aug 2012, at 10:12, Michele Mostarda wrote:
>> the only thing I would stress is to avoid breaking the support
>> for IRI in N-Quads[0] present in the current Any23 version of the parser.
>> I know it is not compliant with the N-Quads standard but we introduced such feature
>> because Sindice[1] (which uses Any23 to distill RDF content from collected pages)
>> is constantly crawling a lot of N-Quads documents written with IRI encoding.
> I'm not sure what you mean when you say that the IRI support in Any23 isn't compliant
> with the N-Quads standard. Can you elaborate?
> I'd say that N-Quads as defined in [0] supports IRIs.

The Any23 N-Quads parser currently accepts UTF-8 IRIs whose non-US-ASCII
characters are not percent-encoded as %XY, which is incompatible with the
encoding rules that N-Quads/N-Triples inherit from the RDF 1.0 spec [2].
The parser also allows literal strings to contain raw UTF-8 characters
that are not escaped using \u or \U, as the N-Triples spec at [3] requires.
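As an illustration of the two strict rules mentioned above, here is a minimal Python sketch of what an ASCII-only writer could do. It assumes the strict reading: each non-US-ASCII character in an IRI is encoded as UTF-8 and every octet is written as %XY, and non-ASCII characters in a literal's lexical form are written as \uXXXX (BMP) or \UXXXXXXXX (beyond the BMP), with the usual \t \n \r \" \\ escapes. The function names percent_encode_iri and escape_literal are hypothetical and are not part of Any23's API.

```python
def percent_encode_iri(iri: str) -> str:
    """Hypothetical helper: percent-encode non-US-ASCII characters as %XY.

    ASCII characters (including ':' '/' '#') are left untouched, so an
    IRI that is already a plain ASCII URI comes back unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # Encode the character as UTF-8, then emit each octet as %XY.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)


def escape_literal(text: str) -> str:
    """Hypothetical helper: escape a literal's lexical form as ASCII-only text."""
    named = {"\t": "\\t", "\n": "\\n", "\r": "\\r", '"': '\\"', "\\": "\\\\"}
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in named:
            out.append(named[ch])
        elif 0x20 <= cp <= 0x7E:
            # Printable ASCII is written as-is.
            out.append(ch)
        elif cp <= 0xFFFF:
            # Control characters, DEL and the rest of the BMP use \uXXXX.
            out.append(f"\\u{cp:04X}")
        else:
            # Supplementary-plane characters use \UXXXXXXXX.
            out.append(f"\\U{cp:08X}")
    return "".join(out)
```

For example, percent_encode_iri("http://example.org/café") gives http://example.org/caf%C3%A9, while escape_literal("café") gives caf\u00E9. A lenient parser like the one described above accepts the raw UTF-8 forms, which is the compatibility concern for Sindice's crawled data.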