Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id DFD52200B57 for ; Sat, 23 Jul 2016 07:10:00 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id C7055160A8E; Sat, 23 Jul 2016 05:10:00 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id EC2E5160A6D for ; Sat, 23 Jul 2016 07:09:59 +0200 (CEST) Received: (qmail 19256 invoked by uid 500); 23 Jul 2016 05:09:57 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 19244 invoked by uid 99); 23 Jul 2016 05:09:57 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 23 Jul 2016 05:09:57 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 90522C0B7B for ; Sat, 23 Jul 2016 05:09:56 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.179 X-Spam-Level: * X-Spam-Status: No, score=1.179 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=flipkart.com Received: from mx2-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id K_LiUFTfwQIB for ; Sat, 23 Jul 2016 05:09:54 +0000 (UTC) Received: from mail-qk0-f180.google.com (mail-qk0-f180.google.com [209.85.220.180]) by mx2-lw-eu.apache.org (ASF Mail Server at mx2-lw-eu.apache.org) with ESMTPS id 950335FADC for ; Sat, 23 Jul 2016 05:09:53 +0000 (UTC) Received: by mail-qk0-f180.google.com with SMTP id x1so118117522qkb.3 for ; Fri, 22 Jul 2016 22:09:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=flipkart.com; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=bxRdm7YQx1HGIziiOonWVt822ji/Y9WmgE8wECoBaFg=; b=CqFmGYFCRC0olY/Tay2KlOa0TOorIKwf5x9z32yXzCNqfbxzWWUv4Fq3bmkLqXVafb Q6FasD4dMbgMkVylpETMmgp2amQGCD8P2o3Sce4jZrWTveznmytiuHpaDAru6g8c9+dj Ot9zvdOXa0f151Wkx3lKgW1X/7bv1AgwXAg7gZIGJvLzygv//c8DaPxQYtJTbzv0EALC KhTuxa0BARlu1Uzn2hwUGr5TIG5wA7ML3H4ec9njTVDMNpamZ7z/4JeqABPkNRHOHJOY /KJVx6wJtczVuGz/NYm0qDb+q2UeOJVkTVZyx7mP4hk2gxQ3oeQQ4vho48JbrwvH6r74 zUug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=bxRdm7YQx1HGIziiOonWVt822ji/Y9WmgE8wECoBaFg=; b=CYsydyTGKbyeKTVKvyRJ5hiem4GYKVu9RyZ3Z+lFHzc19sy5jpCzmi7puD0Rjvgeai oQHyLQCKByJvZ/vXtE6VtEysLB1e3KAaFM+HLh7aFLBiM03CD6dTa48dpUlN/wSSyClC 5NbzFw/tOuvMEjedtjhgi1/OsfjTBCch/A/hHVLs/cZUDvAsPibCTXZuwTcQRpPm8qn3 TTsRlUAYoQDhX4c2XKEgpL0ZVk6985Mpqd9khtG7fvdBNNOzkKIW5FbjoC+om30JaJVV C35YB3y7g5/djrJAA2MF2Qz3uYAwRyO5OF1omlZerV4Saeqv2/MaqEQXqc/EvEuSjRhP FEUA== X-Gm-Message-State: AEkooutyT3T8K/cYSdR6J15nhNciLGYDwjCgplckhOtTz09k8XEFx7MO/PZXO5FNd9nlOz6y1BDmRuSR/5nJhc+v X-Received: by 10.233.232.23 with SMTP id a23mr9082417qkg.25.1469250587033; Fri, 22 Jul 2016 22:09:47 -0700 (PDT) MIME-Version: 1.0 Received: by 10.140.89.105 with HTTP; Fri, 22 Jul 2016 22:09:27 -0700 (PDT) In-Reply-To: References: From: Umesh Prasad Date: Sat, 23 Jul 2016 10:39:27 +0530 Message-ID: Subject: Re: Question : Is there a Automated Hive Database/Schema Designer To: user@hive.apache.org Content-Type: multipart/alternative; boundary=94eb2c03445253c5320538468fef archived-at: Sat, 23 Jul 2016 05:10:01 -0000 --94eb2c03445253c5320538468fef Content-Type: text/plain; charset=UTF-8 Reposting .. Thanks & Regards Umesh Prasad On Thu, Jul 21, 2016 at 8:04 AM, Umesh Prasad wrote: > Hi All, > Does hive a Automated Database Desginer or has anyone tried building it > ? Something which is equivalent to Vertica's DDB and Microsoft SQL > server's Automated Partitioning Design in Parallel Databases. > > References are : > 1. Automated Partitioning Design in Parallel Database Systems ( > https://cs.brown.edu/courses/cs227/archives/2012/papers/partitioning/p1137-nehme.pdf > ) > > 2. DBDesigner: A Customizable Physical Design Tool for Vertica Analytic > Database > (http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6816725) > > Hive tuning tips mention need for pre-sorting tables on filter columns(for > better predicate push down and joins), partitioning/clustering on > join/group by columns, having a higher replication factor for dimension > tables etc. However, I couldn't find any tool/library which suggests a > physical layout given set of hive queries. > > Manually designing the physical layout doesn't scale specially the > producers and consumers of tables (Data) are multiple different teams. > There are conflicting requirements for optimizing different queries and > globally optimal design can be very different from locally optimal design. > > If someone in community has worked on this or can give pointers, then it > would be extremely useful for us. > > > Thanks & Regards > Umesh Prasad > > Team Lead, Flipkart > > > --94eb2c03445253c5320538468fef Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Reposting .. =C2=A0

Thanks & Regards
Umesh Prasad


On Thu, Jul 21, 2016 at 8:04 AM, Umesh Prasa= d <umesh.prasad@flipkart.com> wrote:
Hi All,
=C2=A0 =C2=A0Does hive a Au= tomated Database Desginer or has anyone tried building it ?=C2=A0 Something= which is equivalent to Vertica's DDB and Microsoft SQL server's Au= tomated Partitioning Design in Parallel Databases.
=C2=A0=C2=A0
References are :=C2=A0
1.=C2=A0Automated Partitioning De= sign in Parallel Database Systems (= https://cs.brown.edu/courses/cs227/archives/2012/papers/partitioning/p1137-= nehme.pdf)

2.=C2=A0DBDesigner: A Customizable = Physical Design Tool for Vertica Analytic Database

Hive tuning tips mention need for p= re-sorting tables on filter columns(for better predicate push down and join= s), partitioning/clustering on join/group by columns, having a higher repli= cation factor for dimension tables etc. However, I couldn't find any to= ol/library which suggests a physical layout given set of hive queries.=C2= =A0

Manually designing the physical layout doesn&#= 39;t scale specially the producers and consumers of tables (Data) are multi= ple different teams. There are conflicting requirements for optimizing diff= erent queries and globally optimal design can be very different from locall= y optimal design.

If someone in community has work= ed on this or can give pointers, then it would be extremely useful for us.<= /div>


Thanks & Regards
Umesh Prasad

Team Lead, Flipkart=C2=A0



--94eb2c03445253c5320538468fef--