From: Andy Seaborne
To: jena-dev@incubator.apache.org
Date: Tue, 28 Jun 2011 14:44:02 +0100
Subject: Re: Blank nodes and MapReduce
On 28/06/11 14:37, Paolo Castagna wrote:
>
>
> Andy Seaborne wrote:
>>> public Node create(String label) {
>>>     return Node.createAnon(new AnonId(filename + "-" + label)) ;
>>> }
>>
>> The way I had in mind was to allocate a UUID per parser run (or any other
>> sufficiently large random number) and XOR the label into the UUID to
>> produce the bNode label. This is a non-localised label allocation scheme.
>
> Hi Andy,
> I am not sure this would work with MapReduce, as files are split into
> multiple chunks and different machines can process splits from the same file.

Exactly: by "parser run" I mean all the separate parsing actions in one step
of the process. Allocate one large random number per job and use it as the
base of bNode label generation across the whole cluster.

Allocating it per job instance means it is different the next time the job
runs, which matters if the data is later merged with other data.

>
> Let's say I have this file, split into two chunks:
>
> ----------------------------
>
> _:bnode1 .                   split 1
> _:bnode1 "1" .
>
> ----------------------------
>
> _:bnode1 "2" .               split 2
>
> ----------------------------
>
> I need to ensure the 'bnode1' label in splits 1 and 2 refers to the same blank
> node even if the splits are parsed separately. However, the same 'bnode1' label
> from a different file must represent a different blank node. In practice, with
> MapReduce, I cannot assume that a file is parsed in a single "parser run".
>
>>
>>> Therefore, I would like to have my own LabelToNode implementation with an
>>> Allocator which takes into account the filename (or a hash of it) when it
>>> creates a new blank node. But the LabelToNode constructor is private.
>>>
>>> Could we make it protected?
>>
>> Now public.
>
> Thanks.
>
> Paolo
>
>>
>>>
>>> Or, alternatively, how can I construct a LabelToNode object which will use
>>> my MapReduceAllocator?
>>
>> LabelToNode createUseLabelAsGiven()
>>
>> Andy
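Andy's job-wide random base, combined with Paolo's requirement that the same label in different files must give different blank nodes, can be sketched in plain Java. This is a minimal sketch, not Jena code: the class `JobScopedBNodeLabels`, its `labelFor` method and the file names are hypothetical. The thread does not say how a string label is XORed into a UUID, so hashing the file-qualified label to 128 bits with `UUID.nameUUIDFromBytes` first is an assumption. The job UUID is also assumed to be generated once by the job driver and passed to every mapper.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/**
 * Hypothetical helper (not part of Jena): derives blank node labels that are
 * stable across all splits of a file within one job, distinct across files,
 * and different on every new job run.
 */
final class JobScopedBNodeLabels {
    // One random value per job, generated once and shared by every task.
    private final UUID jobId;

    JobScopedBNodeLabels(UUID jobId) {
        this.jobId = jobId;
    }

    /** Returns a 32-hex-digit label for the given (filename, label) pair. */
    String labelFor(String filename, String label) {
        // Length-prefix the filename so ("a-b", "c") and ("a", "b-c") stay distinct.
        String key = filename.length() + ":" + filename + label;
        // Assumption: reduce the variable-length key to 128 bits with a
        // name-based (MD5) UUID before XORing it into the job UUID.
        UUID h = UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8));
        long msb = jobId.getMostSignificantBits() ^ h.getMostSignificantBits();
        long lsb = jobId.getLeastSignificantBits() ^ h.getLeastSignificantBits();
        // %x prints negative longs as unsigned, so this is always 32 hex digits.
        return String.format("%016x%016x", msb, lsb);
    }

    public static void main(String[] args) {
        // Assumed setup: the driver generates one UUID and ships it to every task.
        UUID jobId = UUID.randomUUID();
        JobScopedBNodeLabels mapperA = new JobScopedBNodeLabels(jobId); // parses split 1
        JobScopedBNodeLabels mapperB = new JobScopedBNodeLabels(jobId); // parses split 2

        String fromSplit1 = mapperA.labelFor("data.nt", "bnode1");
        String fromSplit2 = mapperB.labelFor("data.nt", "bnode1");
        String otherFile = mapperA.labelFor("other.nt", "bnode1");
        String nextJob = new JobScopedBNodeLabels(UUID.randomUUID()).labelFor("data.nt", "bnode1");

        System.out.println(fromSplit1.equals(fromSplit2)); // true
        System.out.println(fromSplit1.equals(otherFile));  // false, barring a hash collision
        System.out.println(fromSplit1.equals(nextJob));    // false, barring a random UUID collision
    }
}
```

In an allocator like the quoted `create(String label)`, the returned string would take the place of `filename + "-" + label` inside `new AnonId(...)`. XOR with a fixed job UUID is a bijection, so distinct keys stay distinct within a job, while a new job UUID shifts every label. The length prefix is there because a plain `"-"` join is ambiguous: `a-b` + `c` and `a` + `b-c` would produce the same key.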