Subject: N-Quads in Re: Upgrade to Tika 1.2 [WAS] Re: [ANNOUNCE] Welcome Peter Ansell as Any23 PPMC member and committer
From: Richard Cyganiak
Date: Wed, 8 Aug 2012 10:33:23 +0100
To: Michele Mostarda
Cc: any23-dev@incubator.apache.org

Hi Michele,

On 8 Aug 2012, at 10:12, Michele Mostarda wrote:

> the only thing I would stress is to avoid breaking the support
> for IRI in N-Quads[0] present in the current Any23 version of the parser.
>
> I know it is not compliant with the N-Quads standard but we introduced such feature
> because Sindice[1] (which uses Any23 to distill RDF content from collected pages)
> is constantly crawling a lot of N-Quads documents written with IRI encoding.

I'm not sure what you mean when you say that the IRI support in Any23 isn't compliant with the N-Quads standard. Can you elaborate? I'd say that N-Quads as defined in [0] supports IRIs.

Best,
Richard

>
> What I suggest as general approach is to add flags to enforce validation or just to produce
> warnings when non standard data is detected instead than avoid supporting non fully standard data at all.
>
> I would also suggest the promotion for a standard upgrade to pass from URI to IRI support for N-Quads.
> Richard, any advice about this?
>
> The best.
> Mic
>
> [0] http://sw.deri.org/2008/07/n-quads/
> [1] http://sindice.com/
>