From: "Jacco van Weert" <1111software@gmail.com>
To: users@jackrabbit.apache.org
Subject: Re: Memory usage issues of importml/exportsysview
Date: Fri, 5 Oct 2007 10:28:14 +0200
In-Reply-To: <035a01c80723$020a2c10$061e8430$@co.uk>

Hello Shaun,

We use our own backup facility, which also works "hot". I wrote a mail about it a few days ago (it's part of JeCARS).

The result of the backup is:
- a CND file
- a node structure file in plain ASCII, easy to parse
- the binary data, stored as separate files

The solution works very well. I use it in another application in which the repository is replicated at short intervals. It is especially useful when existing node types are changed. In the future we will introduce a sort of "evolution scheme": when, for example, property names are changed, the "restore" operation can map the properties again.

The source (of the first version) is available.

Greetings,
Jacco van Weert

On 10/5/07, sbarriba wrote:
> Hi all,
>
> During a recent thread Hot Backup Tools were discussed - see
> http://www.mail-archive.com/users@jackrabbit.apache.org/msg04255.html.
>
> As an outcome of that we're doing 2 things:
>
> 1) "Low-level" backup
>    o Backing up the database
>    o Backing up the repository file system
>
> 2) "High-level" backup
>    o Running exportsysview on each workspace
>
> When migrating between environments or restoring backups solution 2) is
> very useful although the XML files are getting very large where the
> content has lots of binaries etc. The main issue is that the memory
> requirements of "importxml" increase linearly with the size of the XML
> file. I presume this is due to either a) the memory required to parse
> the file, and/or b) the memory required to hold the transient state of
> the import.
>
> We're now needing to use a 1GB heap size for some imports and obviously
> this will hit a crunch point.
>
> Any suggestions on how to resolve this memory issue? For example, could
> the "importxml" not use a SAX event model to avoid parsing the XML into
> a complete DOM etc (note I don't know the internals of importxml as it
> stands).
>
> All suggestions welcome.
>
> Regards,
> Shaun

-- 
-------------------------------------
Jacco van Weert -- 1111software@gmail.com
JCR Controller -- http://www.xs4all.nl/~weertj/jcr
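
To illustrate the SAX idea raised in the quoted question, here is a minimal sketch of event-based processing of a system view export. It is not how Jackrabbit's importxml works internally (the thread does not say), and it only addresses cause a), the parser's memory, not cause b), the transient state of the import. It uses Python's standard xml.sax module purely for illustration; the names NodeCounter, scan_sysview and SAMPLE are hypothetical, and the sample document is a hand-written fragment in the JCR system view namespace.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# Namespace used by JCR system view exports.
SV_NS = "http://www.jcp.org/jcr/sv/1.0"

# Hand-written sample, standing in for a (much larger) exportsysview file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<sv:node xmlns:sv="http://www.jcp.org/jcr/sv/1.0" sv:name="root">
  <sv:property sv:name="jcr:primaryType" sv:type="Name">
    <sv:value>nt:unstructured</sv:value>
  </sv:property>
  <sv:node sv:name="child">
    <sv:property sv:name="jcr:primaryType" sv:type="Name">
      <sv:value>nt:unstructured</sv:value>
    </sv:property>
  </sv:node>
</sv:node>
"""


class NodeCounter(ContentHandler):
    """Counts sv:node elements and tracks nesting depth.

    Only two integers of state are kept per event, so memory use does
    not grow with the size of the input (no DOM tree is built).
    """

    def __init__(self):
        super().__init__()
        self.nodes = 0
        self.depth = 0
        self.max_depth = 0

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, name is a (uri, localname) tuple.
        if name == (SV_NS, "node"):
            self.nodes += 1
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def endElementNS(self, name, qname):
        if name == (SV_NS, "node"):
            self.depth -= 1


def scan_sysview(stream):
    """Stream a system view document and return (node count, max depth)."""
    handler = NodeCounter()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(stream)  # reads the stream incrementally
    return handler.nodes, handler.max_depth


if __name__ == "__main__":
    print(scan_sysview(io.BytesIO(SAMPLE)))
```

On the Java side, the JCR 1.0 API exposes SAX-based entry points (Session.getImportContentHandler and Workspace.getImportContentHandler); whether they reduce heap usage for a given Jackrabbit version would need to be measured.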