Return-Path: X-Original-To: apmail-hive-dev-archive@www.apache.org Delivered-To: apmail-hive-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 57EA210FCF for ; Thu, 6 Mar 2014 18:39:09 +0000 (UTC) Received: (qmail 31548 invoked by uid 500); 6 Mar 2014 18:39:07 -0000 Delivered-To: apmail-hive-dev-archive@hive.apache.org Received: (qmail 30998 invoked by uid 500); 6 Mar 2014 18:39:05 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 30984 invoked by uid 99); 6 Mar 2014 18:39:04 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 06 Mar 2014 18:39:04 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of brock@cloudera.com designates 209.85.216.175 as permitted sender) Received: from [209.85.216.175] (HELO mail-qc0-f175.google.com) (209.85.216.175) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 06 Mar 2014 18:39:00 +0000 Received: by mail-qc0-f175.google.com with SMTP id e16so3352530qcx.20 for ; Thu, 06 Mar 2014 10:38:39 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:content-type; bh=vfjwKvtD7kVB6mm4vqDIvpEThTsKFz2QCYW/zrYxx9Y=; b=mKJpudbZ+qaqYRRyyZZHQtqwRLYj/WNptPQxrxQv3RpSRwAKGbCxOh3ml5rmxSAqf5 0HxOkNjz9Jh7WBlu+iGpXuhrVoOR1Q5Kv5ENbI2edxXtM0l5TaZnW3AqxF5Jx4I7jT/Y sT2bLc/suKNlv8IJeKaWwGaTASVIBIGottwNiqHf2UlvZTgKLM53dBOLk2qnJt4Ma5r1 auRMMzHIDFWB4f1cE2z2bgxGjIVflJxbdqn86hN7OP9xCPL1hVgGtX8nh2wixm1nAKLO kQkCyaYyv3IsukBgS8CxFWOQhnMoupfNJeAdqXOYCzY8s9IkuvaoJeGEuHxkMz9dvk8q zrQg== X-Gm-Message-State: ALoCoQnS84qy5EFtRj9azJVhGneJ7eXCEDGiK0LXS1UF+deHLQ73Q9/4DSoawMvhEgIWoX3rQIPd X-Received: by 10.140.25.142 with SMTP id 14mr14996127qgt.83.1394131119214; Thu, 06 Mar 2014 10:38:39 -0800 (PST) MIME-Version: 1.0 Received: by 10.140.46.33 with HTTP; Thu, 6 Mar 2014 10:37:59 -0800 (PST) In-Reply-To: References: From: Brock Noland Date: Thu, 6 Mar 2014 12:37:59 -0600 Message-ID: Subject: Re: Thoughts on new metastore APIs To: "dev@hive.apache.org" Content-Type: text/plain; charset=ISO-8859-1 X-Virus-Checked: Checked by ClamAV on apache.org On Thu, Mar 6, 2014 at 12:13 PM, Thejas Nair wrote: > Thanks for discussing this, Brock. I agree that is important to > consider while writing a new metastore api call. > But I think this (single input/output struct) should be a guideline, > I am not sure if this should be used in every case. As with any rule, there are always exceptions. However, looking at the new API's I don't see an instance where it would would have been harmful to use this model. Exceptions should be extremely rare since thousands of RPC implementations successfully use the request/response model. However, as always, it would be up to the developers working on the change to make this call. They'd do so knowing they are going against the guideline and could be asked to justify why they are doing so. > What you are saying shows that there is a tradeoff between ending up > with more functions vs ending up with more input/output > structs/classes. I am not sure if having more input/output structs is > any better. > > Take the case of create_table/create_table_with_environment_context > that you mentioned. Even though create_table had a single input > argument Table, instead of adding EnvironmentContext contents to > Table struct, the authors decided to create a new function with > additional EnvironmentContext argument. This makes sense because the > Table struct is used by other functions as well such as get_table, and > EnvironmentContext fields don't make sense for those cases as it is > not persisted as part of table. > > Which means that the only way to prevent creation of > create_table_with_environment_context method would have been to have a > CreateTableArg struct as input argument instead of Table as the > argument. ie, creating a different struct for the single input/output > every function is only way you can be sure that you don't need more > functions. RPC methods are "special". They are published to the world and therefore cannot be easily modified or refactored. Once we create a new RPC method, we are stuck with it for a very long time. In this way, Thrift is rather strange in that it allows you to exposed the api signatures. The request/response model is far more common. Therefore, if we were creating create_table from scratch, I would suggest we use: CreateTableRequest/CreateTableResponse That way, we can add optional arguments such as "environment context" very easily. Although, I'd love to take credit for this idea, I didn't just come up with this myself. This is a standard way of handling RPC. > This approach of reducing the number of functions also means that you > would start encoding different types of actions within the single > input argument. > > Consider the case of get_partition vs get_partition_by_name. It would > need a single struct with an enum that tells if it to lookup based on > the partition key-values or the name, and based on the enum it would > use different fields in the struct. I feel having different functions > is more readable for this case. There will be cases where we'll need similar methods. This point where is with the request/response model, adding a single parameter doesn't require an entirely new method. The developers working on the change would have to make the call as to whether a new functionality requires a new API or can be handled within the current API. > For example, the api in HIVE-5931 would need to change from > ==================================== > list get_principals_in_role(1:string role_name) > ==================================== > to > > ==================================== > struct GetPrincipalInRoleOutput{ > 1:list rolePrincList; > } > > struct GetPrincipalInRoleInput{ > 1:string role_name; > } > GetPrincipalInRoleOutput get_principals_in_role(1: > GetPrincipalInRoleInput input); > ==================================== > > I am not sure if the insurance costs in terms of readability is low > here. I think we should consider the risk in each case of function > proliferation and pay the costs accordingly. > Let me know if I have misunderstood what you are proposing here. The "Output" and "Input" names do feel odd. As I said above. Request/Response are standard names for these kinds of objects. Today, for a method like the one above, it may feel like overheard or extra work. However, in the future when you want to add another parameter such as "isFilter" or "encoding" etc, then the insurance pays off big time. Brock