Date: Tue, 14 Jun 2011 11:57:04 -0400
Subject: Re: load data unit of work
From: W S Chung
To: user@hive.apache.org

My question is a "what if" question, not a production issue. It seems natural, when replacing a traditional database with Hive, to ask how much robustness is sacrificed for scalability. My concern is that if a file is partially loaded, there might not be an easy way to clean up the already loaded data before reloading it. The lack of a unique index also makes it hard to avoid duplicate data, although duplicates can perhaps be deleted after the load.

On Mon, Jun 13, 2011 at 7:12 PM, Martin Konicek wrote:

> Hi,
>
> I think this is a problem with open source in general and sometimes it can
> be very frustrating.
> However, your question is more of a "what if" question - you're not in the
> trouble of finding a horrible bug after you deployed to production, am I
> right?
>
> Regarding your question, I would guess that if LOAD DATA INPATH crashes
> while moving files into the Hive warehouse, the data which was moved will
> appear as legitimate loaded data. Or the files will be moved but the
> metadata will not be updated. In any case, you should detect the crash and
> redo the operation. The easiest answer might actually be to look into the
> source code - sometimes it can be easier to find than one would expect.
>
> Not a complete answer, but hope this helps a bit.
>
> Martin
>
>
> On 14/06/2011 00:47, W S Chung wrote:
>
>> I submitted a question like this before, but somehow that question was
>> never delivered. I can even find my question in Google. Since I cannot
>> find any admin e-mail or feedback form on the Hive website where I can
>> ask why the last question was not delivered, there is not much option
>> other than to post the question again and hope that it gets through this
>> time. Sorry for the double posting if you have seen my last e-mail.
>>
>> What is the behaviour if a client of Hive crashes in the middle of
>> running a "load data inpath" for either a local file or a file on HDFS?
>> Will the file be partially loaded in the db? Thanks.
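
The concern raised in this thread, cleaning up a partial load before redoing it, can be illustrated with a minimal Python sketch. It shows a generic stage-then-rename pattern on a local filesystem and is not a description of what Hive's LOAD DATA does internally. The names `load_batch`, `warehouse_dir`, `batch_id`, and the `_staging_` prefix are all hypothetical and chosen only for illustration.

```python
import os
import shutil


def load_batch(src_files, warehouse_dir, batch_id):
    """Copy src_files into warehouse_dir/batch_id so a rerun after a crash is safe.

    Hypothetical helper: illustrates the pattern only, not Hive's behaviour.
    Returns True if the batch was loaded, False if it was already present.
    """
    final_dir = os.path.join(warehouse_dir, batch_id)
    staging_dir = os.path.join(warehouse_dir, "_staging_" + batch_id)

    # A batch that was already published is skipped, so redoing the
    # operation does not create duplicate data.
    if os.path.isdir(final_dir):
        return False

    # Anything left over from a crashed attempt is discarded first.
    shutil.rmtree(staging_dir, ignore_errors=True)
    os.makedirs(staging_dir)

    # A crash during this loop leaves files only in the staging directory.
    for path in src_files:
        shutil.copy(path, staging_dir)

    # One rename makes the whole batch visible under its final name.
    os.rename(staging_dir, final_dir)
    return True
```

With this layout a reader of the warehouse directory would ignore anything whose name starts with `_staging_`, and the client simply calls `load_batch` again after a crash. Whether the same guarantee holds on a given distributed filesystem depends on that filesystem's rename semantics, which this sketch does not address.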