hive-user mailing list archives

From Ted Xu <ted.xu...@gmail.com>
Subject Re: Garbage data in metadata store?
Date Thu, 27 May 2010 03:43:44 GMT
Hi Ashish,

Thank you for your reply; that explains my problem.
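
In case it helps anyone else, here is a small sketch of the check I have in mind. It is a toy model built in Python with an in-memory SQLite database, not my real metastore. It assumes that 'PARTITIONS' carries an 'SD_ID' column the same way 'TBLS' does; the column lists are trimmed and the sample rows are made up for illustration.

```python
import sqlite3

# Toy model of the metastore tables discussed in this thread. Column sets are
# trimmed to what the check needs; PARTITIONS.SD_ID is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, SD_ID INTEGER);
    CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, SD_ID INTEGER);
    CREATE TABLE COLUMNS (SD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT);
""")

# One table (SD 1) with two partitions (SD 2 and SD 3); each storage
# descriptor carries its own copy of the same two columns.
conn.execute("INSERT INTO TBLS VALUES (1, 1)")
conn.executemany("INSERT INTO PARTITIONS VALUES (?, ?, ?)", [(1, 1, 2), (2, 1, 3)])
for sd_id in (1, 2, 3):
    conn.executemany(
        "INSERT INTO COLUMNS VALUES (?, ?, ?)",
        [(sd_id, "id", "int"), (sd_id, "name", "string")],
    )


def count(sql):
    """Run a single-value COUNT query and return the number."""
    return conn.execute(sql).fetchone()[0]


table_sds = count("SELECT COUNT(DISTINCT SD_ID) FROM TBLS")
column_sds = count("SELECT COUNT(DISTINCT SD_ID) FROM COLUMNS")

# Storage descriptors referenced by COLUMNS but owned by neither a table
# nor a partition; only these would be real garbage.
orphan_sds = count("""
    SELECT COUNT(DISTINCT c.SD_ID) FROM COLUMNS c
    WHERE c.SD_ID NOT IN (SELECT SD_ID FROM TBLS
                          UNION SELECT SD_ID FROM PARTITIONS)
""")

print(table_sds, column_sds, orphan_sds)
```

In this toy data the two distinct counts differ in the same way as in my original mail, yet no storage descriptor is orphaned once partitions are taken into account. The last query could probably be run against a real metastore to look for genuine garbage, provided the table and column names match that schema version.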

I also found that the columns recorded for one partition are identical to
the columns recorded for the other partitions of the same table. So what is
the benefit of such a redundant design?

2010/5/27 Ashish Thusoo <athusoo@facebook.com>

>  Do you have partitions in the table? Storage descriptors can also be
> associated with partitions.
>
> Ashish
>
>  ------------------------------
> *From:* Ted Xu [mailto:ted.xu.ml@gmail.com]
> *Sent:* Wednesday, May 26, 2010 5:26 AM
> *To:* hive-user@hadoop.apache.org
> *Subject:* Garbage data in metadata store?
>
> Hi all,
>
> I want to replicate the Hive metadata to another location, but I found that
> a large portion of it looks like garbage.
>
> As I understand it, the Hive metadata store uses a 'Storage Descriptor' to
> keep the relationship between tables and columns. But the distinct 'SD_ID'
> counts in the 'TBLS' and 'COLUMNS' tables do not match, as shown below:
>
> mysql> select count(distinct SD_ID) from tbls;
> +-----------------------+
> | count(distinct SD_ID) |
> +-----------------------+
> |                   764 |
> +-----------------------+
> 1 row in set (0.00 sec)
>
> mysql> select count(distinct SD_ID) from columns;
> +-----------------------+
> | count(distinct SD_ID) |
> +-----------------------+
> |                  5219 |
> +-----------------------+
> 1 row in set (0.05 sec)
>
> Does that mean the 'columns' table contains garbage data? If so, how was it
> generated?
>
> --
> Best Regards,
> Ted Xu
>



-- 
Best Regards,
Ted Xu
