From: Benjamin Kim
Date: Wed, 30 Aug 2017 14:57:24 +0000
Subject: DMP/CDP Profile Store
To: "user@kudu.apache.org"

I was wondering whether anyone has worked on a DMP/CDP that stores user and customer profiles in Kudu. Each user would have their base IDs (the identity graph), statistics derived from their attributes, and tables for those attributes grouped by category.

Please let me know what you think of my thoughts.

I was thinking of creating a base profile table to store the IDs and statistics, along with unchanging or rarely changing attributes, such as name, that do not need to be tracked. Next, I would create tables for categorized groups of attributes, such as user information, behaviors, geolocation, devices, etc. These attribute tables would have a column for each attribute and would track changes by only inserting new rows, each with a timestamp column recording when it was entered. Essentially, I would follow the type 2 slowly changing dimension approach used in data warehouses. For attributes that expire, we would partition by a time range so that we can drop expired data. For attributes where we only need the latest value, we would add an active column so that, after inactivating older versions, the current one is easy to flag and query.
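
To make the attribute-table idea concrete, here is a minimal in-memory sketch in plain Python. It is not Kudu client code and makes no claim about how Kudu would store or partition the data. The names `AttributeRow`, `AttributeTable`, `record`, `latest`, `history` and `drop_expired` are hypothetical and exist only for illustration. For brevity it puts the active flag and the time-based expiry on one table, although in the design above they would apply to different kinds of attributes, and it assumes versions arrive in timestamp order.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class AttributeRow:
    user_id: str
    entered_at: datetime      # timestamp column: when this version was inserted
    values: Dict[str, Any]    # one entry per attribute column
    active: bool = True       # flags the latest version for this user


@dataclass
class AttributeTable:
    """Hypothetical stand-in for one attribute-category table (e.g. devices)."""
    rows: List[AttributeRow] = field(default_factory=list)

    def record(self, user_id: str, values: Dict[str, Any], entered_at: datetime) -> None:
        # Type 2 style: never overwrite attribute values. Inactivate the
        # previous version, then insert the new one as a separate row.
        # Assumes versions are recorded in timestamp order.
        for row in self.rows:
            if row.user_id == user_id and row.active:
                row.active = False
        self.rows.append(AttributeRow(user_id, entered_at, dict(values)))

    def latest(self, user_id: str) -> Optional[AttributeRow]:
        # Query by the active flag instead of scanning for the max timestamp.
        for row in self.rows:
            if row.user_id == user_id and row.active:
                return row
        return None

    def history(self, user_id: str) -> List[AttributeRow]:
        # All versions for a user, oldest first.
        matching = [r for r in self.rows if r.user_id == user_id]
        return sorted(matching, key=lambda r: r.entered_at)

    def drop_expired(self, cutoff: datetime) -> int:
        # Mimics dropping an old time-range partition: discard every row
        # entered before the cutoff and report how many were removed.
        before = len(self.rows)
        self.rows = [r for r in self.rows if r.entered_at >= cutoff]
        return before - len(self.rows)


devices = AttributeTable()
devices.record("u1", {"device": "phone"}, datetime(2017, 8, 1))
devices.record("u1", {"device": "tablet"}, datetime(2017, 8, 30))
current = devices.latest("u1")  # the row holding {"device": "tablet"}
```

In this sketch the older "phone" row stays in the history with `active` set to False until `drop_expired` is called with a cutoff later than its timestamp.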

Any comments or advice would be truly appreciated.

Cheers,
Ben