hive-user mailing list archives

From Prasad Chakka
Subject Re: Garbage data in metadata store?
Date Thu, 27 May 2010 04:04:01 GMT
The idea is to allow schema evolution. Old partitions retain the old schema, but new partitions
can change schema (including INPUT/OUTPUT format, serde, etc.). I think some of this is already

On May 26, 2010, at 8:43 PM, Ted Xu wrote:

Hi Ashish,

Thank you for your reply, that explains my problem.

I also find that the columns related to a given partition are identical to the columns related
to other partitions in the same table. So what is the benefit of such a redundant design?

2010/5/27 Ashish Thusoo
Do you have partitions in the table? Storage descriptors can also be associated with partitions.
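
One way to check this against the metastore is sketched below. It is only a sketch: it assumes the metastore has a 'partitions' table with its own SD_ID column, as the reply above implies, and it reuses the lowercase table names from the original queries. Verify the names against your own schema before running it.

```sql
-- Assumption: 'partitions' carries an SD_ID column, like 'tbls' does.
-- Count the storage descriptors that belong to partitions.
select count(distinct SD_ID) from partitions;

-- Count SD_IDs in 'columns' that are referenced by neither a table
-- nor a partition. Only these would be candidates for real garbage.
select count(distinct c.SD_ID)
from columns c
left join tbls t on t.SD_ID = c.SD_ID
left join partitions p on p.SD_ID = c.SD_ID
where t.SD_ID is null
  and p.SD_ID is null;
```

If the second query returns 0, every storage descriptor in 'columns' is accounted for by a table or a partition, and the difference between the two counts comes from per-partition storage descriptors.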


From: Ted Xu
Sent: Wednesday, May 26, 2010 5:26 AM
Subject: Garbage data in metadata store?

Hi all,

I want to replicate the Hive metadata to another place, but I found that my Hive metadata
contains a large portion of data that looks like garbage.

As I understand it, the Hive metadata store uses a 'Storage Descriptor' to keep the relationship
between tables and columns. But the counts of distinct 'SD_ID' values in tables 'TBLS' and
'COLUMNS' do not match, as shown below:

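mysql> select count(distinct SD_ID) from tbls;
+-----------------------+
| count(distinct SD_ID) |
+-----------------------+
|                   764 |
+-----------------------+
1 row in set (0.00 sec)

mysql> select count(distinct SD_ID) from columns;
+-----------------------+
| count(distinct SD_ID) |
+-----------------------+
|                  5219 |
+-----------------------+
1 row in set (0.05 sec)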

Does that mean table 'columns' contains garbage data? If so, how is it generated?

Best Regards,
Ted Xu