From: Thomas Mueller
To: "oak-dev@jackrabbit.apache.org"
Date: Tue, 26 Feb 2013 10:04:12 +0000
Subject: Re: Large flat commit problems

Hi,

Large transactions: I think we didn't define this as a strict
requirement. I'm not aware that we got into big trouble with
Jackrabbit 2.x, where this is not supported. For me, this is still a
nice-to-have. But of course it's something we should test and try to
achieve (and resolve problems if we find any).

Flat hierarchies: Yes, this is important (we ran into this problem
many times). I didn't analyze the results, but could the problem be
orderable child nodes? Currently, oak-core stores a property
":childOrder". If there are many child nodes, then this property gets
larger and larger. This is a problem, as it consumes more and more
disk space / network bandwidth / CPU, of the order n^2. It's the same
problem as with storing the list of children in the node bundle. So I
guess this needs to be solved in oak-core (not in each MK separately)?

Regards,
Thomas

>I combined these two goals into a simple benchmark
>that tries to import the contents of a Wikipedia dump into an Oak
>repository using just a single save() call.
>
>Here are some initial numbers using the fairly small Faroese
>wikipedia, with just some 12k pages.
>
>The default H2 MK starts to slow down after 5k transient nodes and
>fails after 6k:
>
>$ java -DOAK-652=true -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
>      benchmark --wikipedia=fowiki-20130213-pages-articles.xml \
>      WikipediaImport Oak-Default
>Apache Jackrabbit Oak 0.7-SNAPSHOT
>Wikipedia import (fowiki-20130213-pages-articles.xml)
>Oak-Default: importing Wikipedia...
>Imported 1000 pages in 1 seconds (1271us/page)
>Imported 2000 pages in 2 seconds (1465us/page)
>Imported 3000 pages in 4 seconds (1475us/page)
>Imported 4000 pages in 6 seconds (1749us/page)
>Imported 5000 pages in 11 seconds (2219us/page)
>Imported 6000 pages in 28 seconds (4815us/page)
>Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>
>The new MongoMK prototype fails already sooner:
>
>$ java -DOAK-652=true -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
>      benchmark --wikipedia=fowiki-20130213-pages-articles.xml \
>      WikipediaImport Oak-Mongo
>Apache Jackrabbit Oak 0.7-SNAPSHOT
>Wikipedia import (fowiki-20130213-pages-articles.xml)
>Oak-Mongo: importing Wikipedia...
>Imported 1000 pages in 1 seconds (1949us/page)
>Imported 2000 pages in 6 seconds (3260us/page)
>Imported 3000 pages in 13 seconds (4523us/page)
>Imported 4000 pages in 30 seconds (7613us/page)
>Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>
>After my recent work on OAK-632 the SegmentMK does better, but it also
>experiences some slowdown over time:
>
>$ java -DOAK-652=true -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
>      benchmark --wikipedia=fowiki-20130213-pages-articles.xml \
>      WikipediaImport Oak-Segment
>Apache Jackrabbit Oak 0.7-SNAPSHOT
>Wikipedia import (fowiki-20130213-pages-articles.xml)
>Oak-Segment: importing Wikipedia...
>Imported 1000 pages in 1 seconds (1419us/page)
>Imported 2000 pages in 2 seconds (1447us/page)
>Imported 3000 pages in 4 seconds (1492us/page)
>Imported 4000 pages in 6 seconds (1586us/page)
>Imported 5000 pages in 8 seconds (1697us/page)
>Imported 6000 pages in 10 seconds (1812us/page)
>Imported 7000 pages in 13 seconds (1927us/page)
>Imported 8000 pages in 16 seconds (2042us/page)
>Imported 9000 pages in 19 seconds (2146us/page)
>Imported 10000 pages in 22 seconds (2254us/page)
>Imported 11000 pages in 25 seconds (2355us/page)
>Imported 12000 pages in 29 seconds (2462us/page)
>Imported 12148 pages in 41 seconds (3375us/page)
>
>To summarize, all MKs still need some work on this. Once these initial
>problems are solved, we can try the same benchmark with larger
>Wikipedias.
>
>PS. Note that I'm using the OAK-652 feature flag to speed things up on
>the oak-jcr level.
>
>BR,
>
>Jukka Zitting
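
A minimal sketch of the "order n^2" cost raised in the reply's paragraph on flat hierarchies. It assumes that a parent keeps the names of all its children in one property and that the whole property is written again each time a child is added. The class and method names are hypothetical, and the separator-delimited serialization with fixed-length names is an assumption made only for illustration; it is not taken from oak-core.

```java
// Hypothetical illustration only; not oak-core code.
final class ChildOrderCost {

    /**
     * Returns the total number of characters written when n children are
     * added one at a time and the ordered list of child names is rewritten
     * in full after every addition.
     *
     * Assumptions: every name has the same length, and names are joined
     * with a one-character separator.
     */
    public static long totalCharsWritten(int n, int nameLength) {
        long serialized = 0; // length of the current list property value
        long total = 0;      // characters written across all additions
        for (int i = 0; i < n; i++) {
            // Appending a name grows the value by the name itself,
            // plus one separator for every name after the first.
            serialized += nameLength + (i > 0 ? 1 : 0);
            // The whole value is written again on each addition.
            total += serialized;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1000, 2000, 4000, 8000}) {
            System.out.println(n + " children: "
                    + totalCharsWritten(n, 8) + " chars written in total");
        }
    }
}
```

Under these assumptions, doubling the number of children roughly quadruples the total amount written, which is the quadratic growth the reply describes.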