From: Thai Bui
Date: Mon, 21 Aug 2017 20:22:16 -0500
Subject: Hive index + Tez engine = no performance gain?!
To: user@hive.apache.org

This may seem out of the blue, but my initial benchmarks show no performance gain when a Hive index is used with the Tez engine. I'm not sure why, but several posts online suggest that the Tez engine does not support Hive indexes (bitmap, compact). Is that true? If so, that is sad.

I understand that the ORC format is a much better alternative if you manage your own tables. However, at my company, several teams each pick their own technology, and most of them use Parquet because it integrates easily with various external systems.

Nonetheless, we still want fast ad hoc queries via Hive LLAP / Tez. I think an index is a perfect solution for non-ORC file formats, since you can selectively build an index table and let Tez look only at the blocks and/or files that need to be scanned.

Thanks for any input,
Thai