hadoop-zookeeper-user mailing list archives

From Mahadev Konar <maha...@yahoo-inc.com>
Subject Re: Maximum number of children
Date Tue, 13 Jan 2009 18:48:55 GMT
Thanks Joshua. 

mahadev


On 1/13/09 10:43 AM, "Joshua Tuberville" <JoshuaTuberville@eharmony.com>
wrote:

> Thanks to everyone for the proposed schemes. I created ZOOKEEPER-272 per your
> request, Mahadev.
> 
> Joshua
> 
> 
> -----Original Message-----
> From: Mahadev Konar [mailto:mahadev@yahoo-inc.com]
> Sent: Monday, January 12, 2009 7:04 PM
> To: zookeeper-user@hadoop.apache.org
> Subject: Re: Maximum number of children
> 
> I was going to suggest bucketing with predefined hashes.
> /root/template/data/hashbucket/hash
> 
> Regarding the issue Joshua raised about the length of the output from the
> server: this is a bug. We seem to allow any number of children (< int) of a
> node, and the getChildren call then fails to return them. This leads to a
> chicken-and-egg problem: how do you get rid of the nodes if you cannot list
> them?
> 
> Here we aren't saving anything, since the server has already processed the
> request and sent us the data. We should get rid of this hard-coded limit. I
> am not sure why we had this limit.
> 
> Can you open a JIRA for this, Joshua?
> 
> thanks
> mahadev
> 
> 
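The bucketing idea in the message above (an extra hash-bucket level between the date node and the per-email node) could be sketched roughly as follows. This is only an illustration under stated assumptions: the thread does not say which hash function or how many buckets to use, so SHA-1, the count of 256, and the helper names email_hash, bucket_for, and bucketed_path are all hypothetical.

```python
import hashlib

# Assumption: a fixed, predefined number of buckets; the thread gives no value.
NUM_BUCKETS = 256


def email_hash(email: str) -> str:
    """Hex digest of a normalized address (SHA-1 is an assumption)."""
    return hashlib.sha1(email.strip().lower().encode("utf-8")).hexdigest()


def bucket_for(hex_hash: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Map a hex hash onto one of the predefined bucket names."""
    # Zero-pad so every bucket name has the same width.
    width = len(format(num_buckets - 1, "x"))
    return format(int(hex_hash, 16) % num_buckets, "0{}x".format(width))


def bucketed_path(template: str, date: str, email: str) -> str:
    """Build /root/<template>/<date>/<bucket>/<hash> for one address."""
    h = email_hash(email)
    return "/root/{}/{}/{}/{}".format(template, date, bucket_for(h), h)


if __name__ == "__main__":
    print(bucketed_path("welcome", "2009-01-12", "user@example.com"))
```

Because the bucket names are known in advance, pruning a day does not need to list the date node at all: a client can walk the fixed set of bucket names and call getChildren on each bucket, where the child list is much smaller.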
> On 1/12/09 5:39 PM, "Stu Hood" <stuhood@mailtrust.com> wrote:
> 
>> To continue with your current design, you could create a trie based on shared
>> hash prefixes.
>> 
>> /root/template/date/1a5e67/2b45dc
>> /root/template/date/1a5e67/3d4a1f
>> /root/template/date/3d4a1f/1a5e67
>> /root/template/date/3d4a1f/2b45dc
>> 
>> Alternatively, you could use what the maildir mail storage format uses:
>> /root/template/date/eh/eharmony.com/jo/joshuatuberville
>> 
>> With the second scheme, just check that all of the characters you support in
>> email addresses are also allowed in znode names.
>> 
>> Thanks,
>> Stu
>> 
>> 
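The two layouts in the message above can be illustrated with a short sketch. The segment length of 6 is taken from the example paths; the function names trie_path and maildir_path, and the choice to lowercase the address, are assumptions made for illustration only.

```python
def trie_path(base: str, hex_hash: str, segment_len: int = 6) -> str:
    """Split a hash into fixed-width segments, one znode level per segment."""
    segments = [hex_hash[i:i + segment_len]
                for i in range(0, len(hex_hash), segment_len)]
    return base + "/" + "/".join(segments)


def maildir_path(base: str, email: str) -> str:
    """Maildir-style layout: <2-char domain prefix>/<domain>/<2-char local prefix>/<local>."""
    local, sep, domain = email.lower().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not an email address: {!r}".format(email))
    return "/".join([base, domain[:2], domain, local[:2], local])


if __name__ == "__main__":
    print(trie_path("/root/template/date", "1a5e672b45dc"))
    print(maildir_path("/root/template/date", "JoshuaTuberville@eharmony.com"))
```

Neither function checks that the resulting names are legal znode names, so the caveat about supported characters still applies to the maildir variant.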
>> -----Original Message-----
>> From: "Joshua Tuberville" <JoshuaTuberville@eharmony.com>
>> Sent: Monday, January 12, 2009 7:53pm
>> To: "'zookeeper-user@hadoop.apache.org'" <zookeeper-user@hadoop.apache.org>
>> Subject: Maximum number of children
>> 
>> Hello,
>> 
>> We are attempting to use ZooKeeper to coordinate daily email thresholds.  To
>> do this we created a node hierarchy of
>> 
>> /root/template/date/email_hash
>> 
>> The idea being that we only send the template to an email address once per
>> day.  This is intended to support millions of email hashes per day. From the
>> ZooKeeper perspective we just attempt a create and if it succeeds we proceed
>> and if we get a node exists exception we stop processing.  This seems to
>> operate fine for over 2 million email hashes so far in testing.  However, we
>> also want to prune all previous days' nodes to conserve memory.  We have run
>> into a hard limit while using the getChildren method for a given
>> /root/template/date.  If the List of children exceeds the hardcoded 4,194,304
>> byte limit, ClientCnxn$SendThread.readLength() throws an exception on line 490.
>> So we cannot delete a node that has children, nor can we list those children
>> to delete them first once their names total more than 4 MB.
>> 
>> Any feedback or guidance is appreciated.
>> 
>> Joshua Tuberville
>> 
>> 
> 
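A back-of-the-envelope estimate shows why a flat /root/template/date node runs into the 4,194,304-byte limit long before 2 million children, and why bucketing helps. The figures below are assumptions, not measurements: a 40-character name (as a hex SHA-1 would give) and about 4 bytes of per-name overhead in the response. The real wire format may differ, and the function names are hypothetical.

```python
LIMIT_BYTES = 4 * 1024 * 1024  # 4,194,304, the limit quoted in the thread


def approx_response_bytes(num_children: int, name_len: int,
                          per_name_overhead: int = 4) -> int:
    """Rough size of a getChildren reply (assumed: name bytes plus fixed overhead)."""
    return num_children * (name_len + per_name_overhead)


def max_children_under_limit(name_len: int, per_name_overhead: int = 4,
                             limit: int = LIMIT_BYTES) -> int:
    """Largest child count whose estimated reply still fits under the limit."""
    return limit // (name_len + per_name_overhead)


if __name__ == "__main__":
    # Flat layout: 2 million names of 40 characters each.
    print(approx_response_bytes(2000000, 40))   # 88000000, far over the limit
    print(max_children_under_limit(40))         # 95325
    # Assumed 256 buckets: about 7,813 children per bucket.
    print(approx_response_bytes(7813, 40))      # 343772, well under the limit
```

Under these assumptions a single parent can hold somewhat under 100,000 such names before listing it fails, so any layout that keeps each parent below that size avoids the problem on the client side.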

