From: Yang
Date: Thu, 24 Jul 2014 16:06:15 -0700
Subject: doing upsert possible?
To: hive-user@hadoop.apache.org

If we have a huge table and only 1% of it is updated every hour, it would be a huge waste to read the whole table through an MR job and write out a new table.

Instead, if we store this table in HBase and use the current HBase+Hive integration, then as long as we can do an upsert, we only need to touch that 1% of entries, and the result can be very fast.
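
To make the cost difference concrete, here is a minimal sketch of the two approaches. It uses plain Python dicts as a stand-in for a table keyed by row key; it is not the HBase or Hive API, and the names `full_rewrite`, `upsert`, `table`, and `delta` are made up for illustration. It assumes an upsert means "insert the row if the key is new, overwrite it if the key exists".

```python
# Illustration only: a dict keyed by row key stands in for the table.
# Nothing here calls HBase or Hive; all names are hypothetical.

def full_rewrite(table, delta):
    """Rebuild the whole table with the delta merged in.

    Returns the new table and the number of rows written,
    which is every row in the result.
    """
    new_table = {}
    for key, row in table.items():
        # Take the updated row if there is one, else copy the old row.
        new_table[key] = delta.get(key, row)
    for key, row in delta.items():
        # Rows whose keys did not exist before are inserted.
        if key not in new_table:
            new_table[key] = row
    return new_table, len(new_table)


def upsert(table, delta):
    """Apply the delta in place: overwrite existing keys, insert new ones.

    Returns the number of rows written, which is only the delta size.
    """
    for key, row in delta.items():
        table[key] = row
    return len(delta)


# 1000 existing rows; 10 of them (1%) change, and 1 new row arrives.
table = {f"row{i:04d}": {"value": i} for i in range(1000)}
delta = {f"row{i:04d}": {"value": -i} for i in range(10)}
delta["row1000"] = {"value": 1000}

rewritten, rewrite_cost = full_rewrite(table, delta)
upsert_cost = upsert(table, delta)  # mutates `table` in place

print("rows written by full rewrite:", rewrite_cost)
print("rows written by upsert:", upsert_cost)
print("same final contents:", rewritten == table)
```

Both paths end with the same table contents; the difference is that the rewrite touches every row while the upsert touches only the rows in the delta, which is the saving described above. Whether a given Hive storage handler actually behaves this way on insert is something to confirm against its documentation.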