Message-ID: <4E09B9C6.3020001@googlemail.com>
Date: Tue, 28 Jun 2011 12:23:50 +0100
From: Paolo Castagna
To: jena-dev@incubator.apache.org
Subject: Re: BulkLoader error with large data and fast harddrive
References: <4DFF9FB8.9080006@epimorphics.com> <4DFFB434.2050405@epimorphics.com> <4E0051E7.1000300@epimorphics.com> <4E09B838.7090004@epimorphics.com>
In-Reply-To: <4E09B838.7090004@epimorphics.com>

Hi Andy,

interesting, and thanks for sharing the info. It would be interesting to
know the load performance of tdbloader and tdbloader2 with your new SSD.

Paolo

Andy Seaborne wrote:
> Hi there,
>
> I now have an SSD (256G from Crucial) :-)
>
> /dev/sdb1 on /mnt/ssd1 type ext4 (rw,noatime)
>
> and I ran the test program on jamendo-rdf and on
> mappingbased_properties_en.nt, then on jamendo-rdf with existing data as
> in the test case.
>
> Everything works for me - the loads complete without an exception.
>
> Andy
>
> On 21/06/11 09:10, Andy Seaborne wrote:
>>
>> On 21/06/11 06:01, jp wrote:
>>> Hey Andy,
>>>
>>> I wasn't able to unzip the file
>>> http://people.apache.org/~andy/jamendo.nt.gz; however, I ran it on my
>>> dataset and received an out-of-memory exception. I then changed line
>>> 42 to true and received the original error. You can download the data
>>> file I have been testing with from
>>> http://www.kosmyna.com/mappingbased_properties_en.nt.bz2 (unzipped it
>>> is 2.6 GB). This file has consistently failed to load.
>>
>> downloads.dbpedia.org is back - I downloaded that file and loaded it
>> with the test program - no problems.
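On the unzip problem mentioned above: in Java a .nt.gz file never needs to be unpacked on disk, because the JDK's GZIPInputStream can wrap the FileInputStream and the resulting stream can be handed to a loader directly. A minimal sketch of that pattern, round-tripping an in-memory buffer rather than the actual jamendo.nt.gz (the class name and data here are illustrative, not from the thread):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipStreamDemo {
    // Compress a string to gzip bytes in memory (stands in for a .nt.gz file).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Read gzip-compressed bytes back as text. The same wrapping works for
    // new GZIPInputStream(new FileInputStream("jamendo.nt.gz")), whose
    // stream could then be passed straight to a loader with no unzip step.
    static String gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String nt = "<urn:hello> <urn:p> <urn:house> .\n";
        System.out.println(gunzip(gzip(nt)).equals(nt)); // prints true
    }
}
```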
>>
>>> While trying other datasets and variations of the simple program I had
>>> what seemed to be a successful BulkLoad; however, when I opened the
>>> dataset and tried to query it there were no results. I don't have the
>>> exact details of this run but can try to reproduce it if you think it
>>> would be useful.
>>
>> Yes please. At this point, any details are a help.
>>
>> Also, a complete log of the failed load of
>> mappingbased_properties_en.nt.bz2 would be useful.
>>
>> Having looked at the stacktraces, and aligned them to the source code,
>> it appears the code passes an internal consistency check, then fails on
>> something that the same check tests for.
>>
>> Andy
>>
>>>
>>> -jp
>>>
>>>
>>> On Mon, Jun 20, 2011 at 4:57 PM, Andy Seaborne wrote:
>>>> Fixed - sorry about that.
>>>>
>>>> Andy
>>>>
>>>> On 20/06/11 21:50, jp wrote:
>>>>>
>>>>> Hey Andy,
>>>>>
>>>>> I assume the file you want me to run is
>>>>> http://people.apache.org/~andy/ReportLoadOnSSD.java
>>>>>
>>>>> When I try to download it I get a permissions error. Let me know when
>>>>> I should try again.
>>>>>
>>>>> -jp
>>>>>
>>>>> On Mon, Jun 20, 2011 at 3:30 PM, Andy Seaborne wrote:
>>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> I tried to recreate this but couldn't; I don't have an SSD to hand
>>>>>> at the moment (it is being fixed :-)
>>>>>>
>>>>>> I've put my test program and the data from the jamendo-rdf you sent
>>>>>> me in:
>>>>>>
>>>>>> http://people.apache.org/~andy/
>>>>>>
>>>>>> so we can agree on an exact test case. This code is single-threaded.
>>>>>>
>>>>>> The conversion from .rdf to .nt wasn't pure.
>>>>>>
>>>>>> I tried running using the in-memory store as well.
>>>>>> downloads.dbpedia.org was down at the weekend - I'll try to get the
>>>>>> same dbpedia data.
>>>>>>
>>>>>> Could you run exactly what I was running? The file name needs
>>>>>> changing.
>>>>>>
>>>>>> You can also try uncommenting
>>>>>>     SystemTDB.setFileMode(FileMode.direct) ;
>>>>>> and running it using non-mapped files in about 1.2 G of heap.
>>>>>>
>>>>>> Looking through the stacktrace, there is a point where the code has
>>>>>> passed an internal consistency test, then fails with something that
>>>>>> should be caught by that test - and the code is sync'ed or
>>>>>> single-threaded. This is, to put it mildly, worrying.
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> On 18/06/11 16:38, jp wrote:
>>>>>>>
>>>>>>> Hey Andy,
>>>>>>>
>>>>>>> My entire program is run in one JVM, as follows.
>>>>>>>
>>>>>>> public static void main(String[] args) throws IOException {
>>>>>>>     DatasetGraphTDB datasetGraph =
>>>>>>>         TDBFactory.createDatasetGraph(tdbDir);
>>>>>>>
>>>>>>>     /* I saw the BulkLoader had two ways of loading data based on
>>>>>>>        whether the dataset existed already. I did two runs, one with
>>>>>>>        the following two lines commented out, to test both ways the
>>>>>>>        BulkLoader runs. Hopefully this had the desired effect. */
>>>>>>>     datasetGraph.getDefaultGraph().add(new Triple(
>>>>>>>         Node.createURI("urn:hello"), RDF.type.asNode(),
>>>>>>>         Node.createURI("urn:house")));
>>>>>>>     datasetGraph.sync();
>>>>>>>
>>>>>>>     InputStream inputStream = new FileInputStream(dbpediaData);
>>>>>>>
>>>>>>>     BulkLoader bulkLoader = new BulkLoader();
>>>>>>>     bulkLoader.loadDataset(datasetGraph, inputStream, true);
>>>>>>> }
>>>>>>>
>>>>>>> The data can be found here:
>>>>>>> http://downloads.dbpedia.org/3.6/en/mappingbased_properties_en.nt.bz2
>>>>>>>
>>>>>>> I appended the ontology to the end of the file; it can be found here:
>>>>>>> http://downloads.dbpedia.org/3.6/dbpedia_3.6.owl.bz2
>>>>>>>
>>>>>>> The tdbDir is an empty directory.
>>>>>>> On my system the error starts occurring after about 2-3 minutes and
>>>>>>> 8-12 million triples loaded.
>>>>>>>
>>>>>>> Thanks for looking over this, and please let me know if I can be of
>>>>>>> further assistance.
>>>>>>>
>>>>>>> -jp
>>>>>>> jp@nimblegraph.com
>>>>>>>
>>>>>>>
>>>>>>> On Jun 17, 2011 9:29 am, Andy wrote:
>>>>>>>>
>>>>>>>> jp,
>>>>>>>>
>>>>>>>> How does this fit with running:
>>>>>>>>
>>>>>>>>     datasetGraph.getDefaultGraph().add(new Triple(
>>>>>>>>         Node.createURI("urn:hello"), RDF.type.asNode(),
>>>>>>>>         Node.createURI("urn:house")));
>>>>>>>>     datasetGraph.sync();
>>>>>>>>
>>>>>>>> Is the preload of one triple a separate JVM or the same JVM as the
>>>>>>>> BulkLoader call - could you provide a single, complete, minimal
>>>>>>>> example?
>>>>>>>>
>>>>>>>> In attempting to reconstruct this, I don't want to hide the problem
>>>>>>>> by guessing how things are wired together.
>>>>>>>>
>>>>>>>> Also - exactly which dbpedia file are you loading (URL?), although
>>>>>>>> I doubt the exact data is the cause here.
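The failure above is reported after about 8-12 million triples loaded; when comparing runs like these, it can help to check the input's statement count independently of the loader's own counters. A minimal JDK-only sketch (the class name is hypothetical, and this is only an approximation: since N-Triples is one statement per line, it counts non-blank, non-comment lines rather than parsing them the way Jena's reader would):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class NTriplesCount {
    // Approximate the statement count of an N-Triples stream: one triple
    // per line, skipping blank lines and '#' comment lines. A real parser
    // (e.g. Jena's) would also validate the syntax of each line.
    static long countTriples(Reader in) throws IOException {
        long count = 0;
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Tiny inline sample; for the real files, wrap a FileReader instead.
        String sample =
            "# sample data\n" +
            "<urn:hello> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <urn:house> .\n" +
            "\n" +
            "<urn:a> <urn:b> <urn:c> .\n";
        System.out.println(countTriples(new StringReader(sample))); // prints 2
    }
}
```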