From: Thomas Müller
To: users@jackrabbit.apache.org
Subject: Re: Efficient but simple hash for storing lot of unidentifiable contents
Date: Thu, 8 Jan 2009 10:15:43 +0100
Message-ID: <91f3b2650901080115s5b4a21c5s6393ec3cb5457ae7@mail.gmail.com>
In-Reply-To: <4964AD55.8030705@anyware-tech.com>

Hi,

What about:

1) Start with the node of your choice (let's say /data/)
2) Check how many nodes and child nodes are in that node
3) If more than 50: pick a child node randomly from the range 00-50 and continue at step 2)
4) Create a node with a random name, done

I would make sure there are no same name siblings (for multiple reasons). There are at least two solutions:

A) Disallow same name siblings, use node names n00-n50, and let the algorithm re-try if there is a clash. I'm not sure if that works well with clustering.

B) Use a name generated by a cryptographically secure pseudo random number generator (for example a random UUID) as the node name. This works well with clustering; the node names will be quite long, however.

Disadvantage:
- Maybe performance. To improve that, you could keep the information 'root is full' in memory and start with a random node /data/00 - /data/50 directly

Advantages:
- No counter required
- No synchronization required
- Holes are automatically filled

Regards,
Thomas
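
The four steps above, combined with option B for naming, can be sketched roughly as follows. This is only an illustration under some assumptions: `TreeNode` and `RandomBucketStore` are hypothetical names, the tree is a plain in-memory map standing in for real JCR nodes, and a bucket node (00-50) is created on demand the first time it is picked. The threshold counts all children of a node, buckets included.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.UUID;

/** In-memory stand-in for a repository node (hypothetical, not the JCR API). */
class TreeNode {
    final String name;
    final boolean bucket;   // true for the root and for 00-50 container nodes
    final Map<String, TreeNode> children = new HashMap<>();

    TreeNode(String name, boolean bucket) {
        this.name = name;
        this.bucket = bucket;
    }
}

class RandomBucketStore {
    static final int MAX_CHILDREN = 50;   // threshold from step 3
    static final int BUCKETS = 51;        // bucket names 00..50

    /** Adds one content node below root and returns its path relative to root. */
    static String add(TreeNode root, Random random) {
        TreeNode current = root;
        StringBuilder path = new StringBuilder();
        // Steps 2 and 3: while the node is full, descend into a random bucket.
        // Assumption: a bucket that does not exist yet is created on the fly.
        while (current.children.size() > MAX_CHILDREN) {
            String bucketName = String.format("%02d", random.nextInt(BUCKETS));
            current = current.children.computeIfAbsent(
                    bucketName, n -> new TreeNode(n, true));
            path.append('/').append(bucketName);
        }
        // Step 4 with option B: a random UUID as the node name, so a
        // same-name clash is practically impossible and no retry is needed.
        String name = UUID.randomUUID().toString();
        current.children.put(name, new TreeNode(name, false));
        return path.append('/').append(name).toString();
    }

    /** Counts content (non-bucket) nodes in the subtree below node. */
    static int countContent(TreeNode node) {
        int count = node.bucket ? 0 : 1;
        for (TreeNode child : node.children.values()) {
            count += countContent(child);
        }
        return count;
    }
}
```

Calling `RandomBucketStore.add(root, new Random())` repeatedly on a root created with `new TreeNode("data", true)` fills the root first and then spreads further nodes over the buckets. Because the decision depends only on the current child count, a node that drops back to 50 children or fewer after deletions accepts new entries again, which is how holes get filled without a counter.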