Subject: Re: BulkLoader error with large data and fast harddrive
From: jp
To: jena-dev@incubator.apache.org, Simon Helsen
Date: Fri, 17 Jun 2011 12:08:22 -0400

Hey Simon,

The only code I am running is:

DatasetGraphTDB datasetGraph = TDBFactory.createDatasetGraph(tdbDir);
InputStream inputStream = new FileInputStream(dbpediaData);
BulkLoader bulkLoader = new BulkLoader();
bulkLoader.loadDataset(datasetGraph, inputStream, true);

No other processes or threads are running, and the application has
exclusive access to the TDB directory. Because of this I suspect a
timing issue within TDB's code, maybe somewhere in RecordBuffer or in
the BPTree itself. I have noticed I can only reproduce the issue on
fast hard drives such as an SSD.

Thanks
-jp

On Fri, Jun 17, 2011 at 11:52 AM, Simon Helsen wrote:
>
> TDB is not thread-safe. You have to protect read and write operations
> yourself (i.e. multiple read, but exclusive write, i.e.
> no read while write)
>
> Simon
>
> Simon Helsen, Ph.D.
> Advisory Software Engineer - Jazz Foundation Server
> ------------------------------
> Phone: 1-416-225-5717 | Mobile: 1-647-966-8280
> E-mail: shelsen@ca.ibm.com
>
> From: jp
> To: jena-dev@incubator.apache.org
> Date: 06/17/2011 11:39 AM
> Subject: BulkLoader error with large data and fast harddrive
> ------------------------------
>
> I recently updated my computer hardware and am receiving exceptions
> while loading a DBpedia dataset of ~19 million triples. I have been
> able to produce the errors below using the following code. I believe this
> might be a concurrency issue, as the same data loads with the same code
> on a similar machine with a standard hard drive.
>
> DatasetGraphTDB datasetGraph = TDBFactory.createDatasetGraph(tdbDir);
> InputStream inputStream = new FileInputStream(dbpediaData);
>
> BulkLoader bulkLoader = new BulkLoader();
> bulkLoader.loadDataset(datasetGraph, inputStream, true);
>
> My current specs are:
> 2.3 GHz quad-core i5 processor
> 4 GB RAM
> 128 GB SSD
>
> Tested on both:
> java version "1.6.0_22"
> OpenJDK Runtime Environment (IcedTea6 1.10.1) (6b22-1.10.1-0ubuntu1)
> OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
>
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>
> Jena versions are as follows:
> arq-2.8.8
> jena-2.6.4
> tdb-0.8.10
>
> Error while loading into an empty directory:
>
> java.lang.IllegalArgumentException
>     at java.nio.Buffer.position(Buffer.java:235)
>     at com.hp.hpl.jena.tdb.base.record.RecordFactory.buildFrom(RecordFactory.java:94)
>     at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer._get(RecordBuffer.java:95)
>     at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.get(RecordBuffer.java:41)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeRecords.getSplitKey(BPTreeRecords.java:141)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.split(BPTreeNode.java:435)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:387)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.insert(BPTreeNode.java:167)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.addAndReturnOld(BPlusTree.java:297)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.add(BPlusTree.java:289)
>     at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performAdd(TupleIndexRecord.java:48)
>     at com.hp.hpl.jena.tdb.index.TupleIndexBase.add(TupleIndexBase.java:49)
>     at com.hp.hpl.jena.tdb.index.TupleTable.add(TupleTable.java:54)
>     at com.hp.hpl.jena.tdb.nodetable.NodeTupleTableConcrete.addRow(NodeTupleTableConcrete.java:77)
>     at com.hp.hpl.jena.tdb.store.bulkloader.LoaderNodeTupleTable.load(LoaderNodeTupleTable.java:112)
>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:268)
>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:244)
>     at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:60)
>     at org.openjena.riot.lang.LangBase.parse(LangBase.java:71)
>     at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:122)
>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:159)
>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:117)
>     at com.nimblegraph.data.bin.SimpleDatasetLoader.main(SimpleDatasetLoader.java:24)
>
> Error when loading into a directory with one triple. The following is
> run before the bulk loader.
>
> datasetGraph.getDefaultGraph().add(new Triple(Node.createURI("urn:hello"),
>     RDF.type.asNode(), Node.createURI("urn:house")));
> datasetGraph.sync();
>
> java.lang.IllegalArgumentException: Out of bounds: idx=0, size=-866953722
>     at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.checkBounds(RecordBuffer.java:228)
>     at com.hp.hpl.jena.tdb.base.buffer.RecordBuffer.add(RecordBuffer.java:66)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeRecords.internalInsert(BPTreeRecords.java:112)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:399)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPTreeNode.insert(BPTreeNode.java:167)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.addAndReturnOld(BPlusTree.java:297)
>     at com.hp.hpl.jena.tdb.index.bplustree.BPlusTree.add(BPlusTree.java:289)
>     at com.hp.hpl.jena.tdb.index.TupleIndexRecord.performAdd(TupleIndexRecord.java:48)
>     at com.hp.hpl.jena.tdb.index.TupleIndexBase.add(TupleIndexBase.java:49)
>     at com.hp.hpl.jena.tdb.index.TupleTable.add(TupleTable.java:54)
>     at com.hp.hpl.jena.tdb.nodetable.NodeTupleTableConcrete.addRow(NodeTupleTableConcrete.java:77)
>     at com.hp.hpl.jena.tdb.store.bulkloader.LoaderNodeTupleTable.load(LoaderNodeTupleTable.java:112)
>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:268)
>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader$2.send(BulkLoader.java:244)
>     at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:60)
>     at org.openjena.riot.lang.LangBase.parse(LangBase.java:71)
>     at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:122)
>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:159)
>     at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:117)
>     at com.nimblegraph.data.bin.SimpleDatasetLoader.main(SimpleDatasetLoader.java:24)
>
> Any help tracking down the issue would be greatly appreciated.
> Thanks for the great software
>
> -jp
> jp@nimblegraph.com
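
For reference, the "multiple read, but exclusive write" policy Simon describes could be sketched roughly as below. This is only an illustration using the standard java.util.concurrent locks (Java 8 or later for the lambdas). MrswGuard is a hypothetical helper, not a Jena or TDB class, and a plain list stands in for the dataset. It would not explain the failure reported above if the load really is single-threaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical helper, not part of Jena or TDB: wraps access to a
// non-thread-safe store in a multiple-reader / single-writer lock.
final class MrswGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Any number of readers may hold the read lock at once,
    // but never while a writer holds the write lock.
    <T> T read(Supplier<T> action) {
        lock.readLock().lock();
        try {
            return action.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: it waits for active readers to finish
    // and blocks new readers and writers until it is released.
    void write(Runnable action) {
        lock.writeLock().lock();
        try {
            action.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        MrswGuard guard = new MrswGuard();
        List<String> store = new ArrayList<>(); // stand-in for the TDB dataset
        guard.write(() -> store.add("urn:hello"));
        int size = guard.read(store::size);
        System.out.println(size);
    }
}
```

In an application, every query against the dataset would go through read(...) and every update or bulk load through write(...). Note that ReentrantReadWriteLock does not support upgrading a held read lock to a write lock, so a write must not be started from inside a read action.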