Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
Delivered-To: mailing list user@hive.apache.org
MIME-Version: 1.0
In-Reply-To: <9b3ad081-557c-8d6c-22fd-ee0ee16a4011@mixmax.com>
References: <9b3ad081-557c-8d6c-22fd-ee0ee16a4011@mixmax.com>
From: David Capwell
Date: Fri, 8 Dec 2017 19:08:58 -0800
Subject: Re: Roaring Bitmap UDFs
To: user@hive.apache.org
Content-Type: multipart/alternative; boundary="089e082212c84d7470055fdf9f02"
archived-at: Sat, 09 Dec 2017 03:09:04 -0000

--089e082212c84d7470055fdf9f02
Content-Type: text/plain; charset="UTF-8"

Think of a bloom filter that's more dynamic. It works well when cardinality is low, but it grows quickly and ends up costing more than a bloom filter as cardinality grows.
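
A rough way to see that size trade-off is to estimate both footprints side by side. The sketch below is only an approximation under stated assumptions: it models a roaring-style layout as 2 bytes per value in chunks holding up to 4096 values and a fixed 8192-byte bitmap above that, ignores headers and run containers, and uses the textbook bloom filter sizing formula. The function names and the sample IDs are hypothetical and do not come from any library.

```python
import math

ARRAY_LIMIT = 4096     # assumed cutoff: chunks above this many values become bitmaps
BITMAP_BYTES = 8192    # one 2**16-bit bitmap container


def roaring_payload_bytes(values):
    """Approximate container payload for 32-bit values (no headers, no run containers)."""
    counts = {}
    for v in set(values):
        key = v >> 16                      # high 16 bits select the chunk
        counts[key] = counts.get(key, 0) + 1
    # small chunks cost 2 bytes per value, large chunks a fixed-size bitmap
    return sum(2 * n if n <= ARRAY_LIMIT else BITMAP_BYTES for n in counts.values())


def bloom_bytes(capacity, fp_rate):
    """Textbook bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -capacity * math.log(fp_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


fixed_bloom = bloom_bytes(10_000, 0.01)    # sized once up front, never grows
for n in (1_000, 10_000, 100_000):
    scattered = range(0, n * 1000, 1000)   # hypothetical sparse IDs
    print(n, roaring_payload_bytes(scattered), fixed_bloom)
```

With scattered values the exact structure keeps growing with the number of distinct values, while the bloom filter stays at the size it was given (its false-positive rate rises instead once it is over capacity). Dense, clustered values behave differently, so treat this as an illustration of the low-cardinality versus high-cardinality point rather than a benchmark.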

This data structure supports existence queries, but your email sounds like you want a count. If so, it's not really the best fit.
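
To make the difference between an existence query and a count concrete, here is a minimal sketch. It assumes a toy bloom filter (`TinyBloom`, a hypothetical helper written for this example) and uses a plain Python integer as an uncompressed stand-in for an exact bitmap; neither is the Roaring library or a Hive UDF, and the sample IDs are made up.

```python
import hashlib


class TinyBloom:
    """Toy bloom filter (hypothetical helper): it can only say 'possibly present'."""

    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.field = bits, hashes, 0

    def _positions(self, item):
        # derive `hashes` bit positions from salted SHA-256 digests
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.field |= 1 << p

    def might_contain(self, item):
        return all((self.field >> p) & 1 for p in self._positions(item))


user_ids = [3, 7, 7, 42, 3, 1000]          # made-up sample with duplicates

bloom = TinyBloom()
exact = 0                                  # Python int used as an uncompressed bitmap
for uid in user_ids:
    bloom.add(uid)
    exact |= 1 << uid

print(bloom.might_contain(42))             # existence query against the bloom filter
print(bool((exact >> 42) & 1))             # existence query against the exact bitmap
print(bin(exact).count("1"))               # distinct count: number of set bits
```

The exact bitmap can report a distinct count, but only because it keeps one bit for every distinct value, which is where the growth with cardinality comes from. A sketch such as HyperLogLog, mentioned in the quoted message, gives up exactness to keep a fixed footprint.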

On Dec 8, 2017 5:00 PM, "Nitin Vijayvargiya" <nitinvijay94@gmail.com> wrote:
Hi all,

I'm working on speeding up distinct count calculations, and it looks like roaring bitmaps (RB) are the newest and meanest way to do set operations. Does anyone here have experience with them? How was the performance compared to HyperLogLog and EWAH? A quick Google search showed me that it's easier to find UDF implementations of HyperLogLog in Presto and Hive, but if the hype is real, it might be worth spending the time to incorporate RB. Also, if anyone can point me to reliable implementations of UDFs using RB, I would love to check them out and test them myself =)

Happy Holidays!

Nitin

--089e082212c84d7470055fdf9f02--