From: Ashish Thusoo
To: "hive-user@hadoop.apache.org"
Date: Wed, 26 May 2010 10:17:07 -0700
Subject: RE: Garbage data in metadata store?

Do you have partitions in the table? Storage descriptors can also be associated with partitions.

Ashish

________________________________
From: Ted Xu [mailto:ted.xu.ml@gmail.com]
Sent: Wednesday, May 26, 2010 5:26 AM
To: hive-user@hadoop.apache.org
Subject: Garbage data in metadata store?

Hi all,

I want to replicate the Hive metadata to another place, but I found that a large portion of it looks like garbage.

As I understand it, the Hive metadata store uses a 'Storage Descriptor' to keep the relationship between tables and columns. But the distinct 'SD_ID' counts in tables 'TBLS' and 'COLUMNS' do not match, as shown below:

mysql> select count(distinct SD_ID) from tbls;
+-----------------------+
| count(distinct SD_ID) |
+-----------------------+
|                   764 |
+-----------------------+
1 row in set (0.00 sec)

mysql> select count(distinct SD_ID) from columns;
+-----------------------+
| count(distinct SD_ID) |
+-----------------------+
|                  5219 |
+-----------------------+
1 row in set (0.05 sec)

Does that mean the 'COLUMNS' table contains garbage data? If so, how was it generated?

--
Best Regards,
Ted Xu
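
One way to check Ashish's suggestion is to classify each distinct SD_ID in COLUMNS by whether a table or a partition refers to it. The sketch below is an illustration only, not the real metastore schema: it builds a toy in-memory SQLite database from Python and assumes that PARTITIONS carries an SD_ID column just as TBLS does. The simplified table layouts, the sample rows, and the helper names build_demo_metastore and classify_column_sds are all hypothetical.

```python
import sqlite3


def build_demo_metastore():
    """Create a toy metastore (hypothetical, heavily simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TBLS (TBL_ID INTEGER, SD_ID INTEGER);
        CREATE TABLE PARTITIONS (PART_ID INTEGER, TBL_ID INTEGER, SD_ID INTEGER);
        CREATE TABLE COLUMNS (SD_ID INTEGER, COLUMN_NAME TEXT);
    """)
    # Two tables, each with its own storage descriptor.
    conn.executemany("INSERT INTO TBLS VALUES (?, ?)", [(1, 1), (2, 2)])
    # Three partitions of table 1, each with its own storage descriptor.
    conn.executemany(
        "INSERT INTO PARTITIONS VALUES (?, ?, ?)",
        [(10, 1, 3), (11, 1, 4), (12, 1, 5)],
    )
    # Column rows for every storage descriptor above ...
    column_rows = [(sd_id, name) for sd_id in range(1, 6) for name in ("id", "value")]
    # ... plus one row whose SD_ID nothing refers to (what real garbage would look like).
    column_rows.append((99, "stale"))
    conn.executemany("INSERT INTO COLUMNS VALUES (?, ?)", column_rows)
    return conn


def classify_column_sds(conn):
    """Count distinct COLUMNS.SD_ID values by what refers to them."""
    query = """
        SELECT CASE
                 WHEN t.SD_ID IS NOT NULL THEN 'table'
                 WHEN p.SD_ID IS NOT NULL THEN 'partition'
                 ELSE 'orphan'
               END AS sd_owner,
               COUNT(*) AS n
        FROM (SELECT DISTINCT SD_ID FROM COLUMNS) c
        LEFT JOIN (SELECT DISTINCT SD_ID FROM TBLS) t ON t.SD_ID = c.SD_ID
        LEFT JOIN (SELECT DISTINCT SD_ID FROM PARTITIONS) p ON p.SD_ID = c.SD_ID
        GROUP BY sd_owner
    """
    return dict(conn.execute(query).fetchall())


conn = build_demo_metastore()
counts = classify_column_sds(conn)
print(counts)
```

In the toy data, the distinct SD_ID count in COLUMNS (6) exceeds the count in TBLS (2) mostly because of partitions, which is the situation Ashish describes; only the 'orphan' bucket would point to actual garbage. If the real metastore matches the assumed column names, the same SELECT could probably be run directly in the mysql client, but that should be verified against the actual schema first.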