From: Alan Gates
Date: Wed, 22 Apr 2015 13:06:33 -0700
To: user@hive.apache.org
Subject: Re: Transactional table read lifecycle

Whether you obtain a read lock depends on the guarantees you want to make to your readers. Obtaining the lock will do a couple of things your users might want:
1) It will prevent DDL statements such as DROP TABLE from removing the data while they are reading it.
2) It will prevent the compactor from removing the versions of the delta files they are reading.

The other step you'll want is to heartbeat the lock. To keep dead clients from holding locks forever, the DbLockManager times locks out after 300 seconds (the default; it's configurable). To keep your own lock from timing out, you'll need to call IMetaStoreClient.heartbeat on a regular basis.
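
The heartbeat can be driven from a background scheduler. What follows is only a minimal sketch: `Heartbeater` is a hypothetical stand-in for the IMetaStoreClient.heartbeat call so that the example compiles without Hive on the classpath, and the class name, thread name and interval are illustrative assumptions. In practice the interval would need to be comfortably shorter than the lock timeout.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Sketch: periodically heartbeats a lock until closed. Not part of Hive. */
final class LockHeartbeat implements AutoCloseable {

    /** Hypothetical stand-in for the metastore heartbeat call. */
    interface Heartbeater {
        void heartbeat(long txnId, long lockId) throws Exception;
    }

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lock-heartbeat");
                t.setDaemon(true); // heartbeats should not keep the JVM alive
                return t;
            });
    private final ScheduledFuture<?> task;

    LockHeartbeat(Heartbeater client, long txnId, long lockId,
                  long interval, TimeUnit unit) {
        task = scheduler.scheduleAtFixedRate(() -> {
            try {
                client.heartbeat(txnId, lockId);
            } catch (Exception e) {
                // Swallowed so the schedule keeps running; a real reader
                // would likely stop reading, since the lock may be gone.
                System.err.println("heartbeat failed: " + e);
            }
        }, 0, interval, unit);
    }

    @Override
    public void close() {
        task.cancel(false);      // stop scheduling further heartbeats
        scheduler.shutdownNow(); // release the background thread
    }
}
```

Because the class is AutoCloseable, it can wrap the read in a try-with-resources block so the heartbeat stops when the read finishes or fails.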

Alan.

On April 17, 2015 at 8:05, Elliot West wrote:
Hi, I'm working on a Cascading Tap that reads the data that backs a transactional Hive table. I've successfully utilised the in-built OrcInputFormat functionality to read and merge the deltas with the base and optionally pull in the RecordIdentifiers. However, I'm now considering what other steps I may need to take to collaborate with an active Hive instance that could be writing to or compacting the table as I'm trying to read it.

I recently became aware of the need to obtain a list of valid transaction IDs but now wonder if I must also acquire a read lock for the table? I'm thinking that the set of interactions for reading this data may look something like:

  1. Obtain ValidTxnList from the meta store:
    org.apache.hadoop.hive.metastore.IMetaStoreClient.getValidTxns()

  2. Set the ValidTxnList in the Configuration:
    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.toString());

  3. Acquire a read lock:
    org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)

  4. Use OrcInputFormat to read the data

  5. Finally, release the lock:
    org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)
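
The five steps above could be wired together roughly as follows. This is a sketch under stated assumptions rather than working Hive code: `MetaStore` and `Reader` are hypothetical stand-ins for the IMetaStoreClient calls and the OrcInputFormat read, the configuration is modelled as a plain map, and `VALID_TXNS_KEY` is a placeholder for ValidTxnList.VALID_TXNS_KEY. The real lock call takes a LockRequest and returns a response whose state may need checking before the read starts.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the read lifecycle with stand-in types. Not part of Hive. */
final class TransactionalRead {

    /** Placeholder for ValidTxnList.VALID_TXNS_KEY. */
    static final String VALID_TXNS_KEY = "valid.txns.placeholder";

    /** Hypothetical stand-in for the metastore calls in steps 1, 3 and 5. */
    interface MetaStore {
        String getValidTxns();              // real call returns a ValidTxnList
        long lock(String db, String table); // real call takes a LockRequest
        void unlock(long lockId);
    }

    /** Hypothetical stand-in for the OrcInputFormat-based read in step 4. */
    interface Reader<T> {
        T read(Map<String, String> conf);
    }

    static <T> T readWithLock(MetaStore client, String db, String table,
                              Reader<T> reader) {
        Map<String, String> conf = new HashMap<>();
        conf.put(VALID_TXNS_KEY, client.getValidTxns()); // steps 1 and 2
        long lockId = client.lock(db, table);            // step 3
        try {
            return reader.read(conf);                    // step 4
        } finally {
            client.unlock(lockId);                       // step 5, also on failure
        }
    }
}
```

Releasing the lock in a `finally` block means a failed read does not leave the lock held until it times out.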

Can you advise on whether the lock is needed, whether this is the correct way of managing the lock, and whether there are any other steps I need to take to appropriately interact with the data underpinning a 'live' transactional table?

Thanks - Elliot.
