X-Original-To: apmail-accumulo-notifications-archive@minotaur.apache.org
Delivered-To: apmail-accumulo-notifications-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 05E25104EA for ; Fri, 19 Jul 2013 06:08:58 +0000 (UTC)
Received: (qmail 28823 invoked by uid 500); 19 Jul 2013 06:08:57 -0000
Delivered-To: apmail-accumulo-notifications-archive@accumulo.apache.org
Received: (qmail 28732 invoked by uid 500); 19 Jul 2013 06:08:54 -0000
Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jira@apache.org
Delivered-To: mailing list notifications@accumulo.apache.org
Received: (qmail 28669 invoked by uid 99); 19 Jul 2013 06:08:48 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 19 Jul 2013 06:08:48 +0000
Date: Fri, 19 Jul 2013 06:08:48 +0000 (UTC)
From: "Basit Mustafa (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Commented] (ACCUMULO-1585) Provide option for FQDN/verbatim data from config files of servers to be stored in ZooKeeper rather than resolved IP
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

[ https://issues.apache.org/jira/browse/ACCUMULO-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713364#comment-13713364 ]

Basit Mustafa commented on ACCUMULO-1585:
-----------------------------------------

A quick, basic examination of what's going on at startup has me believing that a basic fix (one without optional/conditional behavior behind a config flag) would go something like this:

In org.apache.accumulo.server.tabletserver.TabletServer:

1) Add an ivar to store the hostname String exactly as
passed to the config(String hostname) method (from the output of this method's first log statement, it appears the value is not yet resolved at that point but is still exactly as typed in the config, which is a good thing).

2) From here, a few paths are possible:

A. One COULD just modify getClientAddressString() to not return a resolved address. That assumes this method's contract does not guarantee an IP:PORT String and that all callers are safe using an FQDN or whatever the config file had verbatim. The documentation/comment does not state a specific contract, but the lack of strong typing of the return value to an IP:PORT type (e.g. InetSocketAddress or something) makes me hopeful this would work (although I could see this blowing up in all kinds of ways, too, if callers of getClientAddressString() expect this String return value to be IP:PORT).

B. If this doesn't work, or we know we don't want to change the nature of this method because it would violate its unwritten contract/caller expectation that it return IP:PORT, we could write the FQDN/hostname as passed verbatim into config(String hostname) (now stored in the ivar from #1) only to ZooKeeper and keep all Accumulo internals as-is. This works, IMHO, because the internals past this point all run in the same JVM: as long as we write FQDNs to ZK, we won't have the aforementioned schizophrenia, since resolution within the same JVM should be the same barring DNS round-robin/load balancing [uh, just don't do this between nodes in an Accumulo cluster :)]. Then, we're on the hook to discover where /accumulo//tservers/XXXXXXXXX are read on the client and ensure that the read resolves the retrieved FQDN/string, or at least runs it through AddressUtils.toString().

Obviously, B involves the fewest changes to Accumulo code, and it seems pretty straightforward since reads/writes to ZK are pretty obvious/unified in a single set of classes.
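To make the client-side half of option B concrete, here is a minimal sketch of resolving a verbatim entry in the reading JVM rather than in the server's. It is not Accumulo code: the class name TserverAddressSketch, the method resolveLocally, and the assumption that the stored value looks like "host:port" are all hypothetical, and the port number is only illustrative. Only java.net.InetSocketAddress from the JDK is used.

```java
import java.net.InetSocketAddress;

// Hypothetical sketch, not part of Accumulo: shows client-side resolution
// of a verbatim "host:port" string such as one read back from ZooKeeper.
final class TserverAddressSketch {

    // Splits an assumed "host:port" entry and lets THIS JVM resolve the host,
    // so an external client gets whatever address its own DNS view returns.
    static InetSocketAddress resolveLocally(String entry) {
        int idx = entry.lastIndexOf(':');
        if (idx <= 0 || idx == entry.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got: " + entry);
        }
        String host = entry.substring(0, idx);
        int port = Integer.parseInt(entry.substring(idx + 1));
        // This constructor attempts name resolution locally; if the lookup
        // fails, the returned address is flagged as unresolved.
        return new InetSocketAddress(host, port);
    }

    public static void main(String[] args) {
        // An IP literal still works, so entries written in the old format stay readable.
        InetSocketAddress addr = resolveLocally("127.0.0.1:9997");
        System.out.println(addr.getPort() + " unresolved=" + addr.isUnresolved());
    }
}
```

Because the same parsing path accepts both hostnames and IP literals, a reader written this way would not care which form the server stored, which is the property option B relies on.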
A makes some large assumptions/leaps about the safety of changing the format of that String output; I'd feel better about it knowing what its author (and its callers) intended. I haven't done a "who calls this" analysis to see; I guess I could smoke test it, too, of course. But B just seems like the path of much less resistance, assuming we're only reading the value from ZK in one or a few places.

Thoughts? Opinions? Does anyone have experience with, or know the code better than I do, to help shed light on these assumptions or come up with C/D/E/F options that would be better?

Thanks!

> Provide option for FQDN/verbatim data from config files of servers to be stored in ZooKeeper rather than resolved IP
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-1585
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1585
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>         Environment: All
>            Reporter: Basit Mustafa
>            Assignee: Eric Newton
>            Priority: Minor
>             Fix For: 1.6.0
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> There are some situations (esp. in virtualized/cloud environments) where "hardwiring" the IP into ZooKeeper can create reachability issues and an FQDN (or, better/also, the verbatim string/line from the concerned config file) would fix this problem.
> For example, hostname node1.company.com specified in configuration files resolves to an Amazon EC2 *internal* IP of 10.2.3.4 (internal on the virtualized network). Externally (e.g. from your dev machine, your offsite/non-VPN/non-VPCed data center, other client machines on different networks/clouds), node1.company.com will resolve to a public IP (e.g. Amazon Elastic IP, etc.) that is more routable, like 54.55.56.57.
> Accumulo currently stores 10.2.3.4 in ZooKeeper based on this resolution. But if you try to connect to Accumulo from outside these machines (i.e. from outside the same cloud/virtualized/non-routable network), where the same FQDN (node1.company.com) now resolves to the public address (54.55.56.57), you will not be able to connect, because the Accumulo client will have pulled the resolved IP of 10.2.3.4, which is unreachable from there.
> Using the FQDN (or in some other way allowing for client-side name resolution/address translation, although this seems kludgy) would fix this issue in a relatively standard way. Ideally, this would not incur a performance issue beyond the first resolution, assuming the TCP/IP stack is doing its job and caching stuff effectively (I assume).
> This doesn't really hurt/break things if you give an option in some config, and, really, taking the literal from the file allows you to use whatever you want, the ultimate in flexibility.
> See discussion http://mail-archives.apache.org/mod_mbox/accumulo-user/201307.mbox/%3CCAGFNOZTMVz0R2e0meDj%3DKqPPPJP6f5baaMqh8%3D07V7NZ8vToJg%40mail.gmail.com%3E for more details and others having the same issue.
> I will look into creating a patch for this as soon as I have some time to find/look at the relevant code portions (I need to find where Accumulo is making these writes to ZK, and whether the FQDNs read back would need any resolution, i.e. whether their use further down the line expects strictly an IP or goes through host-or-IP-safe API calls, etc.). Any suggestions on where I can begin are always appreciated. Otherwise, I'll try to submit a patch when I can.
> I figured I'd open this issue to at least provide a discussion on what more experienced Accumulo devs and users think and what a solution based on the style/patterns accepted for Accumulo development/configuration would be. I can read the guidelines myself, of course, and will, but someone suggested opening an issue, so I am...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira