Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org
From: Shawn Heisey <apache@elyograg.org>
Subject: Prevent a znode from exceeding jute.maxbuffer
Message-ID: <560DCB9E.4090307@elyograg.org>
Date: Thu, 1 Oct 2015 18:11:10 -0600

I was going to open an issue in Jira for this, but I figured I should
discuss it here before I do that, to make sure that's a reasonable
course of action.

I was thinking about a problem that we encounter with SolrCloud, where
our overseer queue (stored in ZooKeeper) can grow far beyond the default
jute.maxbuffer size.  I encountered this personally while researching
something for a Solr issue:

https://issues.apache.org/jira/browse/SOLR-7191?focusedCommentId=14347834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14347834

It seems silly that a znode could get to 14 times the allowed size
without notifying the code *inserting* the data.  The structure of our
queue is such that entries in the queue are children of the znode.  This
means that the data stored directly in the znode (which is pretty much
nonexistent in this case) is not the problem; the problem is the number
of children.

It seems like it would be a good idea to reject the creation of new
children if that would push the parent znode's child list past
jute.maxbuffer.
This moves the required error handling to the code that *updates* ZK,
rather than the code that is watching and/or reading ZK, which seems
more appropriate to me.
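
To make the arithmetic concrete, here is a rough sketch of the kind of
check I have in mind, written as plain Java with no ZooKeeper dependency.
It is only an illustration, not existing ZooKeeper code.  The assumptions:
the child list is serialized as a 4-byte count followed by, for each
name, a 4-byte length plus the UTF-8 bytes; the limit is the documented
jute.maxbuffer default of 0xfffff; and the names look like
"qn-0000000001".  The class and method names are made up for this
example, and reply header overhead is ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper: estimates how large the
// reply to a "list children" request would be for a given set of names.
final class ChildListBudget {

    // Documented default for jute.maxbuffer (assumed to be unchanged).
    static final int DEFAULT_MAX_BUFFER = 0xfffff;

    // Assumed wire cost of one child name: 4-byte length + UTF-8 bytes.
    static long serializedSize(String childName) {
        return 4L + childName.getBytes(StandardCharsets.UTF_8).length;
    }

    // Assumed wire cost of the whole list: 4-byte count + each name.
    static long estimateChildListSize(List<String> children) {
        long total = 4L;
        for (String child : children) {
            total += serializedSize(child);
        }
        return total;
    }

    // True if adding newChild would push the list past maxBuffer bytes.
    static boolean wouldExceed(long currentListSize, String newChild, int maxBuffer) {
        return currentListSize + serializedSize(newChild) > maxBuffer;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("qn-0000000001", "qn-0000000002");
        long size = estimateChildListSize(children);
        System.out.println("estimated list size: " + size + " bytes");
        System.out.println("next create would exceed limit: "
                + wouldExceed(size, "qn-0000000003", DEFAULT_MAX_BUFFER));
    }
}
```

Under those assumptions each 13-character name costs 17 bytes, so the
list hits the default limit at somewhere around 61,000 children.  A
server-side version of this check would let the create call fail instead
of the later read.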

Alternatively, the mechanisms involved could be changed so that the client
can handle accessing a znode with millions of children, without
complaining about the packet length.

Thoughts?

Thanks,
Shawn