Return-Path: Delivered-To: apmail-hive-user-archive@www.apache.org Received: (qmail 28179 invoked from network); 29 Mar 2011 14:37:15 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 29 Mar 2011 14:37:15 -0000 Received: (qmail 53348 invoked by uid 500); 29 Mar 2011 14:37:14 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 53317 invoked by uid 500); 29 Mar 2011 14:37:14 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 53309 invoked by uid 99); 29 Mar 2011 14:37:14 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 29 Mar 2011 14:37:14 +0000 X-ASF-Spam-Status: No, hits=3.8 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS,SUBJ_ALL_CAPS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of luke.forehand@networkedinsights.com designates 66.46.182.53 as permitted sender) Received: from [66.46.182.53] (HELO relay.ihostexchange.net) (66.46.182.53) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 29 Mar 2011 14:37:09 +0000 Received: from VMBX114.ihostexchange.net ([192.168.30.14]) by HUB103.ihostexchange.net ([66.46.182.53]) with mapi; Tue, 29 Mar 2011 10:36:48 -0400 From: Luke Forehand To: "user@hive.apache.org" Date: Tue, 29 Mar 2011 10:36:46 -0400 Subject: LOAD DATA INPATH + TABLESAMPLE Thread-Topic: LOAD DATA INPATH + TABLESAMPLE Thread-Index: AcvuHrvXUBFBURDORQaAet+1uXRzSA== Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: user-agent: Microsoft-MacOutlook/14.0.0.100825 acceptlanguage: en-US Content-Type: multipart/alternative; boundary="_000_C9B758AE6C48lukeforehandnetworkedinsightscom_" MIME-Version: 1.0 --_000_C9B758AE6C48lukeforehandnetworkedinsightscom_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Hello! Our hive table import process uses a dynamic partition insert into a tempor= ary table, then the resulting sequence files are loaded into the master tab= le using LOAD DATA INPATH because we want the data online immediately for q= uerying. The data that is loaded does not overwrite files already existing= in the partitions so we are essentially doing an "append" to the partition= s. Our question is, is this a bad practice, and how does this affect table= sampling? It seems that the table sample mechanism expects as many files = in the partition folder as are partition buckets. Doing a "compaction" of = the table using INSERT OVERWRITE to re-write the partitions fixes the table= sampling problem, but we would like to avoid the expensive write. Are the= re better ways to accomplish our goal of putting data online quickly, and p= reserve the ability to table sample? Thanks, Luke Forehand http://www.networkedinsights.com --_000_C9B758AE6C48lukeforehandnetworkedinsightscom_ Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable
Hello!

Our hive table import process uses a dynamic partiti= on insert into a temporary table, then the resulting sequence files are loa= ded into the master table using LOAD DATA INPATH because we want the data o= nline immediately for querying.  The data that is loaded does not over= write files already existing in the partitions so we are essentially doing = an "append" to the partitions.  Our question is, is this a bad practic= e, and how does this affect table sampling?  It seems that the table s= ample mechanism expects as many files in the partition folder as are partit= ion buckets.  Doing a "compaction" of the table using INSERT OVERWRITE= to re-write the partitions fixes the table sampling problem, but we would = like to avoid the expensive write.  Are there better ways to accomplis= h our goal of putting data online quickly, and preserve the ability to tabl= e sample?

Thanks,
Luke Forehand
http://www.networkedinsights.com

--_000_C9B758AE6C48lukeforehandnetworkedinsightscom_--