From: pquerna@apache.org
To: cvs@httpd.apache.org
Reply-To: dev@httpd.apache.org
Subject: svn commit: r729612 - /httpd/mod_mbox/trunk/scripts/site-sitemap.py
Date: Sat, 27 Dec 2008 07:22:20 -0000
Message-Id: <20081227072220.A72BB23888EB@eris.apache.org>

Author: pquerna
Date: Fri Dec 26 23:22:20 2008
New Revision: 729612

URL: http://svn.apache.org/viewvc?rev=729612&view=rev
Log:
Generate partitioned sitemaps for mailing lists with over 100mb of messages.

Modified:
    httpd/mod_mbox/trunk/scripts/site-sitemap.py

Modified: httpd/mod_mbox/trunk/scripts/site-sitemap.py
URL: http://svn.apache.org/viewvc/httpd/mod_mbox/trunk/scripts/site-sitemap.py?rev=729612&r1=729611&r2=729612&view=diff
==============================================================================
--- httpd/mod_mbox/trunk/scripts/site-sitemap.py (original)
+++ httpd/mod_mbox/trunk/scripts/site-sitemap.py Fri Dec 26 23:22:20 2008
@@ -1,10 +1,25 @@
 #!/usr/local/bin/python
 import os
+from os.path import join as pjoin
 import sys
+import subprocess
+
+def get_output(cmd):
+    s = subprocess.Popen(cmd, stdout=subprocess.PIPE)
+    out = s.communicate()[0]
+    s.wait()
+    return out.strip()
+
+# you could use os.path.walk to calculate this... or you could use du(1).
+def duhack(path):
+    cmd = ['du', '-k', path]
+    out = get_output(cmd).split()
+    return int(out[0]) * 1024
 
 ROOT="/x1/mail-archives/mod_mbox"
 HOSTNAME="http://mail-archives.apache.org/mod_mbox/"
+PARITION_SIZE=100 * 1024 * 1024
 
 tlps={}
 for files in os.listdir(ROOT):
     path = files
@@ -17,7 +32,7 @@
         tlp = "asf"
     if not tlps.has_key(tlp):
         tlps[tlp] = {}
-    tlps[tlp][list] = path
+    tlps[tlp][list] = [path, duhack(pjoin(ROOT, path))]
 
 keys = tlps.keys()
 keys.sort()
@@ -36,7 +51,14 @@
     klist = tlps[tlp].keys()
     klist.sort()
     for list in klist:
-        print " %s%s/?format=sitemap" % (HOSTNAME, tlps[tlp][list])
+        name = tlps[tlp][list][0]
+        size = tlps[tlp][list][1]
+        if (size > PARITION_SIZE):
+            print " %s%s/?format=sitemap" % (HOSTNAME, name)
+        else:
+            part = int(size / PARITION_SIZE) + 1
+            for i in range(0, part):
+                print " %s%s/?format=sitemap&pmax=%d&part=%d" % (HOSTNAME, name, part, i)
 
 print """
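
Two things in the diff above look worth a second look. Read literally, the last hunk prints the single unpartitioned URL when `size > PARITION_SIZE` and the `pmax`/`part` URLs otherwise, which appears to be the reverse of what the log message describes. Separately, `du -k` without `-s` prints one line per subdirectory before the total, so `out[0]` is the directory total only when the list directory has no subdirectories. The following is a minimal sketch of the intended behaviour as the log message states it, not the committed code. It is written for Python 3 (the script in the diff is Python 2), the helper names `dir_size` and `sitemap_urls` are hypothetical, and `dir_size` sums apparent file sizes with `os.walk` rather than calling `du`, so its result can differ from disk usage.

```python
import os

# Same 100 MB threshold and base URL as in the commit; spelling corrected here.
PARTITION_SIZE = 100 * 1024 * 1024
HOSTNAME = "http://mail-archives.apache.org/mod_mbox/"


def dir_size(path):
    """Sum the sizes in bytes of regular files under path.

    Hypothetical pure-Python stand-in for the du(1) call in the diff.
    It counts apparent file sizes, not allocated disk blocks.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def sitemap_urls(name, size, partition_size=PARTITION_SIZE):
    """Return the sitemap URLs for one list.

    Small lists get a single URL; lists larger than partition_size get
    one URL per partition, using the same part-count formula as the diff.
    """
    base = "%s%s/?format=sitemap" % (HOSTNAME, name)
    if size <= partition_size:
        return [base]
    parts = size // partition_size + 1
    return ["%s&pmax=%d&part=%d" % (base, parts, i) for i in range(parts)]
```

For a hypothetical list directory named `httpd-dev` holding 250 MB of messages, `sitemap_urls("httpd-dev", 250 * 1024 * 1024)` yields three URLs with `pmax=3` and `part` running from 0 to 2, while a 10 MB list yields the single unpartitioned URL.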