Subject: Re: non-versioned "cruft"
From: John Langley
To: users@jackrabbit.apache.org
Date: Tue, 24 May 2011 08:32:56 -0400

Thank you VERY MUCH.

-- Langley

2011/5/24 Fabián Mandelbaum

> Hello John. From time to time you have to run a process analogous to
> memory garbage collection on the repository. I had the same question
> a few weeks ago on this list. You can take a look here:
>
> http://wiki.apache.org/jackrabbit/DataStore#Data_Store_Garbage_Collection
>
> Good luck.
>
> On Mon, May 23, 2011 at 5:37 PM, John Langley wrote:
> > We are using a Jackrabbit 2.2.5 installation to store both versioned and
> > non-versioned files, and our only production interface is via WebDAV.
> >
> > Over time we've noticed that when we migrate a repository using the
> > RepositoryCopier tool, the size of the stored data drops dramatically.
> > In one instance it was 1/35th of the size of the un-migrated data set, as
> > measured by doing a MySQL dump of the database. Part of our migration
> > process is to run a comparison tool comparing the old and new repositories.
> > Using this interface we check every file, including the versioned files, so
> > we know that we've had a successful copy.
> >
> > Consequently, our conclusion is that this "cruft" is comprised of "orphaned"
> > nodes that are associated with non-versioned content. Does this make sense?
> > If so, is there a suggested way to prune out this unused content?
> >
> > Thanks in advance,
> >
> > -- Langley
>
> --
> Fabián Mandelbaum
> IS Engineer
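
For readers following the thread, the "garbage collection" idea suggested in the reply can be illustrated with a toy mark-and-sweep model. This is only a sketch under stated assumptions: it is not Jackrabbit's API, the names ToyDataStore and collect_garbage are hypothetical, and the thread does not confirm that the extra data in this installation actually lives in the data store. The wiki page linked in the reply describes the real procedure.

```python
import hashlib


class ToyDataStore:
    """Hypothetical content-addressed blob store (not a Jackrabbit class)."""

    def __init__(self):
        self.blobs = {}  # digest -> stored bytes

    def put(self, data: bytes) -> str:
        """Store the content under its SHA-1 digest and return the digest."""
        digest = hashlib.sha1(data).hexdigest()
        self.blobs[digest] = data
        return digest


def collect_garbage(store: ToyDataStore, nodes: dict) -> int:
    """Mark every blob still referenced by a node, then sweep the rest."""
    marked = set(nodes.values())                              # mark phase
    unreferenced = [d for d in store.blobs if d not in marked]
    for digest in unreferenced:                               # sweep phase
        del store.blobs[digest]
    return len(unreferenced)


store = ToyDataStore()
nodes = {}  # node path -> digest of the blob it currently points to

# Overwriting a non-versioned file repoints the node; the old blob stays behind.
nodes["/docs/report.pdf"] = store.put(b"draft 1")
nodes["/docs/report.pdf"] = store.put(b"draft 2")

# Deleting a node removes the reference, not the stored blob.
nodes["/docs/notes.txt"] = store.put(b"notes")
del nodes["/docs/notes.txt"]

before = len(store.blobs)
removed = collect_garbage(store, nodes)
after = len(store.blobs)
print(before, removed, after)
```

In this model, a copy that walks only the live nodes would carry over one blob instead of three, which is the same kind of shrinkage the original question describes after a migration.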