hadoop-zookeeper-user mailing list archives

From "Stu Hood" <stuh...@mailtrust.com>
Subject RE: Maximum number of children
Date Tue, 13 Jan 2009 01:39:25 GMT
To continue with your current design, you could create a trie based on shared hash prefixes.

/root/template/date/1a5e67/2b45dc
/root/template/date/1a5e67/3d4a1f
/root/template/date/3d4a1f/1a5e67
/root/template/date/3d4a1f/2b45dc
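
A minimal sketch of the prefix split shown above, in Python. The names sharded_path and prefix_len are hypothetical, and how many hash characters to spend on the bucket level is a tuning choice this thread does not specify.

```python
def sharded_path(base: str, email_hash: str, prefix_len: int = 2) -> str:
    """Build a two-level znode path: <base>/<hash prefix>/<rest of hash>.

    prefix_len is an assumed tuning knob: a shorter prefix means fewer
    buckets under the date node, each holding more children.
    """
    if not 0 < prefix_len < len(email_hash):
        raise ValueError("prefix_len must be between 1 and len(email_hash) - 1")
    prefix, rest = email_hash[:prefix_len], email_hash[prefix_len:]
    return f"{base.rstrip('/')}/{prefix}/{rest}"
```

With prefix_len=6 and the hash 1a5e672b45dc this gives /root/template/date/1a5e67/2b45dc, the first path listed above. The parent bucket node has to exist before the leaf is created, so the create call needs to handle a missing or already existing parent.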

Alternatively, you could use what the maildir mail storage format uses:
/root/template/date/eh/eharmony.com/jo/joshuatuberville

With the second approach, just check that all of the characters you support in email addresses
are also allowed in znode names.
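
A minimal Python sketch of the maildir-style layout together with a name check. The function names are hypothetical, lowercasing the address is an assumption, and the rejected character ranges follow the naming rules in the ZooKeeper Programmer's Guide as best understood here; verify them against the version you run.

```python
def is_safe_znode_name(name: str) -> bool:
    """Return True if name looks usable as a single znode path component."""
    if name in ("", ".", ".."):
        return False
    for ch in name:
        cp = ord(ch)
        if ch == "/" or cp == 0:  # path separator and null character
            return False
        if 0x01 <= cp <= 0x1F or 0x7F <= cp <= 0x9F:  # control characters
            return False
        if 0xD800 <= cp <= 0xF8FF or 0xFFF0 <= cp <= 0xFFFF:
            return False
    return True


def maildir_style_path(base: str, address: str) -> str:
    """Build <base>/<do>/<domain>/<lo>/<local> from an email address."""
    local, sep, domain = address.lower().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    parts = [domain[:2], domain, local[:2], local]
    for part in parts:
        if not is_safe_znode_name(part):
            raise ValueError(f"unsafe znode name component: {part!r}")
    return base.rstrip("/") + "/" + "/".join(parts)
```

Note that "/" is legal in the local part of an email address, so an address such as a/b@example.com is rejected by this check rather than silently creating an extra path level.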


-----Original Message-----
From: "Joshua Tuberville" <JoshuaTuberville@eharmony.com>
Sent: Monday, January 12, 2009 7:53pm
To: "'zookeeper-user@hadoop.apache.org'" <zookeeper-user@hadoop.apache.org>
Subject: Maximum number of children


We are attempting to use ZooKeeper to coordinate daily email thresholds.  To do this we created
a node hierarchy of


The idea is that we send a given template to an email address only once per day.  This is intended
to support millions of email hashes per day.  From the ZooKeeper perspective, we just attempt
a create: if it succeeds we proceed, and if we get a node exists exception we stop processing.
This seems to operate fine for over 2 million email hashes so far in testing.  However, we
also want to prune all previous days' nodes to conserve memory.  We have run into a hard limit
while using the getChildren method for a given /root/template/date.  If the List of children
exceeds the hardcoded 4,194,304 byte limit, ClientCnxn$SendThread.readLength() throws an exception
on line 490.  So we have a problem: we cannot delete a node that has children, nor is it
possible to delete a node whose children's names total more than 4 MB, because we cannot list them.
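
A rough back-of-the-envelope check of the limit described above, as a Python sketch. The 4-byte overhead per name is an assumption about how the child list is serialized, the 32-character name length in the usage note is only an example, and the function names are hypothetical.

```python
GET_CHILDREN_LIMIT = 4_194_304  # byte limit quoted in the message above


def max_children(name_len: int, per_name_overhead: int = 4,
                 limit: int = GET_CHILDREN_LIMIT) -> int:
    """Estimate how many child names of name_len bytes fit in one response.

    per_name_overhead is an assumed per-string length prefix, not a value
    taken from this thread.
    """
    return limit // (name_len + per_name_overhead)


def buckets_needed(total_children: int, name_len: int,
                   per_name_overhead: int = 4) -> int:
    """Estimate the minimum number of parent nodes needed so that each
    parent's child list stays under the limit (ceiling division)."""
    per_bucket = max_children(name_len, per_name_overhead)
    return -(-total_children // per_bucket)
```

For example, buckets_needed(2_000_000, 32) estimates how many intermediate nodes 2 million 32-character names would need at a minimum. Real hashes do not spread perfectly evenly, so it is safer to provision well above that figure.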

Any feedback or guidance is appreciated.

Joshua Tuberville
