hive-dev mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Looking at the columns table
Date Wed, 11 Apr 2012 14:53:04 GMT
Hey all. Our metastore in MySQL is fairly large, over 12GB. Essentially
all of that storage is in the columns table. It seems that each column
is stored once per partition/storage descriptor, as a one-to-many
relationship.

In our case all the partitions have the same column definitions. My
thinking: should the relationship between columns and partition/storage
descriptors be many-to-many instead? That way we store each column
definition only once, and the current columns table references the
primary key of that definition. This should bring the size of the
table down drastically.
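
To make the idea concrete, here is a rough sketch using Python's
sqlite3 module. It is not the actual metastore schema: the table names
(column_defs, sd_columns), the helper functions, and the sample columns
are all made up for illustration. Each distinct (name, type) pair is
stored once, and a narrow link table maps storage descriptors to it.

```python
import sqlite3


def build_schema(conn):
    # Hypothetical tables, not the real metastore layout.
    conn.executescript("""
        CREATE TABLE column_defs (
            col_id INTEGER PRIMARY KEY,
            name   TEXT NOT NULL,
            type   TEXT NOT NULL,
            UNIQUE (name, type)
        );
        CREATE TABLE sd_columns (
            sd_id    INTEGER NOT NULL,
            position INTEGER NOT NULL,
            col_id   INTEGER NOT NULL REFERENCES column_defs (col_id),
            PRIMARY KEY (sd_id, position)
        );
    """)


def add_storage_descriptor(conn, sd_id, columns):
    """Attach an ordered list of (name, type) columns to one storage descriptor."""
    for position, (name, col_type) in enumerate(columns):
        # Store the definition only if this (name, type) pair is new.
        conn.execute(
            "INSERT OR IGNORE INTO column_defs (name, type) VALUES (?, ?)",
            (name, col_type),
        )
        (col_id,) = conn.execute(
            "SELECT col_id FROM column_defs WHERE name = ? AND type = ?",
            (name, col_type),
        ).fetchone()
        # The link row holds only integers, so it stays small.
        conn.execute(
            "INSERT INTO sd_columns (sd_id, position, col_id) VALUES (?, ?, ?)",
            (sd_id, position, col_id),
        )


def columns_for(conn, sd_id):
    """Return the ordered (name, type) list for one storage descriptor."""
    return conn.execute(
        "SELECT d.name, d.type FROM sd_columns s "
        "JOIN column_defs d ON d.col_id = s.col_id "
        "WHERE s.sd_id = ? ORDER BY s.position",
        (sd_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_schema(conn)

# Made-up schema shared by every partition, as in our case.
schema = [("user_id", "bigint"), ("event", "string"), ("ts", "string")]
for sd_id in range(1000):
    add_storage_descriptor(conn, sd_id, schema)

definition_rows = conn.execute("SELECT COUNT(*) FROM column_defs").fetchone()[0]
link_rows = conn.execute("SELECT COUNT(*) FROM sd_columns").fetchone()[0]
print(definition_rows, link_rows)  # 3 3000
```

Note that the link table still has one row per partition per column, so
the saving in this sketch comes from those rows carrying only integer
keys instead of repeated names and type strings. Pointing each storage
descriptor at a shared column set would cut the row count as well, but
that goes a step beyond what I described above.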

Since every other table in the metastore is so small, this huge columns
table looks like the only scalability choke point we have.

Edward
