hive-user mailing list archives

From David Beveridge <dbeveri...@cylance.com>
Subject Altering the Metastore on EC2
Date Tue, 12 Aug 2014 00:52:52 GMT
We are creating a Hive schema for reading massive JSON files. Our JSON schema is rather large,
and we have found that the default metastore schema for Hive cannot work for us as-is.

To be specific, one field in our schema contains about 17KB of nested structs. Unfortunately,
it appears that Hive limits the metastore field that stores the resulting type definition to
varchar(4000):

    CREATE TABLE "COLUMNS_V2" (
    "CD_ID" bigint NOT NULL,
    "COMMENT" varchar(4000),
    "COLUMN_NAME" varchar(128) NOT NULL,
    "TYPE_NAME" varchar(4000),
    "INTEGER_IDX" INTEGER NOT NULL,
    PRIMARY KEY ("CD_ID", "COLUMN_NAME")
    );
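For context, TYPE_NAME holds the complete Hive type string for a column, so a single deeply
nested column produces one long string that must fit in that varchar(4000). A simplified,
hypothetical sketch of the kind of column we mean (names are illustrative, not our real schema):

    -- Hypothetical table with one nested JSON-derived column.
    -- The column's full type string is what lands in COLUMNS_V2.TYPE_NAME.
    CREATE TABLE events (
      payload struct<
        user: struct<id: bigint, name: string>,
        tags: array<string>,
        meta: map<string, string>
      >
    );

Here the stored type string would be
struct<user:struct<id:bigint,name:string>,tags:array<string>,meta:map<string,string>>;
in our case the equivalent string is ~17KB long.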

We are running this on Amazon Elastic MapReduce (Hive 0.11 with the default Derby metastore).

So, our initial questions are:

* Is there a reason that TYPE_NAME is limited to 4000? (IIUC, a varchar in Derby can grow
to 32672, which would be sufficient for a long time.)

* Can we alter the metastore schema without hacking/reinstalling Hive? (If so, how? See the
sketch after this list.)

* If so, is there a proper way to update the schema on all nodes?
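
For reference, if altering the schema directly is viable, the change we imagine is something
like the untested Derby statement below (Derby does allow widening a VARCHAR in place):

    -- Untested sketch against the embedded Derby metastore:
    -- widen TYPE_NAME from varchar(4000) to Derby's varchar maximum.
    ALTER TABLE "COLUMNS_V2"
        ALTER COLUMN "TYPE_NAME" SET DATA TYPE VARCHAR(32672);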


Thanks in advance!
--DB
