Date: Thu, 30 Jul 2015 12:27:47 -0500
Subject: Semantic Analysis Run Through
From: Raajay
To: user@hive.apache.org

Hello,

I am currently playing around with the Hive semantic analysis code to understand how DAGs or MapReduce plans are generated from abstract syntax trees. The idea is to explore various possible DAGs and compare their performance based on execution run time.

The function "analyzeInternal" seems to handle the entire plan generation process. The steps (at a high level), as described in its comments, are:

1. Get Resolved Parse Tree from Syntax Tree
2. Get OP tree (Operator tree?) from Resolved parse tree
3. Deduce Result Set schema
4. Generate Parse Context
5. Do View creation
6. Collect Table Access stats
7. Perform Logical Optimization
8. Get Column Access Stats
9. Optimize Physical OP tree
10. Translate to target execution engine
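
The ten steps above can be pictured as an ordered pipeline. The sketch below is a toy model, not Hive code: the stage labels are paraphrased from the list, while `run_pipeline`, `overrides`, and `skip` are hypothetical names, and the "plan" is only a list of strings recording which stages ran.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

# Toy stand-in for a query plan: the labels of the stages applied so far.
Plan = List[str]

# Stage numbers and labels follow the step list above.
STAGES = [
    (1, "resolved parse tree"),
    (2, "operator tree"),
    (3, "result set schema"),
    (4, "parse context"),
    (5, "view creation"),
    (6, "table access stats"),
    (7, "logical optimization"),
    (8, "column access stats"),
    (9, "physical optimization"),
    (10, "translate to execution engine"),
]


def run_pipeline(
    overrides: Optional[Dict[int, Callable[[Plan], Plan]]] = None,
    skip: FrozenSet[int] = frozenset(),
) -> Plan:
    """Run the stages in order; a stage can be skipped or replaced by number."""
    overrides = overrides or {}
    plan: Plan = []
    for number, label in STAGES:
        if number in skip:
            continue
        stage = overrides.get(number)
        # Default behaviour: record that the stage ran.
        plan = plan + [label] if stage is None else stage(plan)
    return plan


if __name__ == "__main__":
    print(run_pipeline())
    print(run_pipeline(skip=frozenset({7})))
```

Replacing or skipping a single stage number in this model corresponds to editing the code for that step and comparing the plan that comes out at the end.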


I understand that step 7 (Logical Optimization) applies multiple transforms (e.g. join reordering, constant propagation, predicate pushdown) to alter the AST, and thus different DAGs can be obtained by choosing whether or not to apply certain transformations.
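
One way to explore this without touching the compiler is to enumerate every on/off combination of the transforms and run the same query under each. The sketch below is a minimal illustration under assumptions: the mapping from transforms to Hive session properties is not stated in this message and should be verified against `HiveConf` for the Hive version in use (in particular, `hive.cbo.enable` controls more than join reordering), and the function names are hypothetical.

```python
from itertools import product

# ASSUMPTION: property names believed to toggle the transforms named above.
# Check each one against HiveConf before relying on it.
TRANSFORM_FLAGS = {
    "join_reordering": "hive.cbo.enable",
    "constant_propagation": "hive.optimize.constant.propagation",
    "predicate_pushdown": "hive.optimize.ppd",
}


def candidate_settings(flags=TRANSFORM_FLAGS):
    """Yield one {property: 'true' | 'false'} dict per on/off combination."""
    props = list(flags.values())
    for choice in product((True, False), repeat=len(props)):
        yield {prop: str(on).lower() for prop, on in zip(props, choice)}


def as_set_statements(settings):
    """Render a settings dict as Hive SET statements, one per line."""
    return "\n".join(f"SET {key}={value};" for key, value in settings.items())


if __name__ == "__main__":
    for index, settings in enumerate(candidate_settings()):
        print(f"-- candidate {index}")
        print(as_set_statements(settings))
```

Each printed block can be pasted ahead of the query (or an `EXPLAIN` of it) in a Hive session, which gives up to eight candidate plans for three transforms.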

Can changes to the code in steps 1-2 and 9 also affect the resulting DAGs? How does the AST get affected in these steps? Any pointers or explanations would be helpful.

Thanks,
Raajay