hive-user mailing list archives

From Keith Wiley
Subject How to import extremely "wide" csv tables
Date Tue, 13 Mar 2012 17:03:28 GMT
Wrapping hive around existing csv files requires manually naming and typing every column
in the table creation command.  I have several csv tables, and some of them have a ton of columns.
I would love a way to create hive tables that automatically infers the column types by attempting
various type conversions or regex matches on the data (say, the first row).  What would be
even cooler is if the first row could be interpreted differently from the rest of
the file, as a set of string labels to name the columns, while the types are automatically
inferred from, say, the *second* row.  These csv files are currently in this format, with
the first row naming the columns.

Does this make sense?

Now, I'm sure that hive doesn't support this yet -- and I admit it is a somewhat esoteric
desire on my part -- but I'm curious how others would suggest approaching it.  I'm thinking
of writing a standalone program that reads the first two rows of a csv file and prints
the column names and types in the correct syntax for a hive external table creation
statement, which I would then copy/paste into hive.  I was just hoping for a simpler solution.
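
For reference, a minimal Python sketch of the kind of helper program described above: it reads
the first two rows of a csv file, takes column names from the first and guesses types from the
second, and prints an external table creation statement.  The function names, the type-guessing
rules, and the table name and location in the usage lines are all illustrative assumptions, not
anything hive provides.  It also assumes that a single data row is representative of each column
and that the header names are unique.

```python
import csv
import io
import re


def infer_hive_type(value):
    """Guess a Hive type from one sample value (assumes the row is representative)."""
    value = value.strip()
    if re.fullmatch(r"[+-]?\d+", value):
        return "BIGINT"
    if re.fullmatch(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?", value):
        return "DOUBLE"
    if value.lower() in ("true", "false"):
        return "BOOLEAN"
    # Anything unrecognized (including empty fields) falls back to STRING.
    return "STRING"


def sanitize(name):
    """Turn a header label into a lowercase identifier (hypothetical naming rule)."""
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_").lower()
    return cleaned or "col"


def create_table_statement(csv_file, table_name, location):
    """Build the DDL from the first row (names) and the second row (sample values)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    sample = next(reader)
    columns = ",\n".join(
        f"  {sanitize(name)} {infer_hive_type(value)}"
        for name, value in zip(header, sample)
    )
    return (
        f"CREATE EXTERNAL TABLE {table_name} (\n{columns}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


# Usage with an in-memory file; the table name and location are placeholders.
example = io.StringIO("id,Name,Unit Price,active\n1,widget,3.50,true\n")
print(create_table_statement(example, "items", "/data/items"))
```

Note that the header row is still part of the csv file, so it would need to be removed or
filtered out before querying.  Guessing types from one row is fragile (a column whose second-row
value happens to look like an integer may contain decimals later), so sampling more rows would be
a natural extension.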



Keith Wiley

"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies: He scratched the itch from the scratch that
itched but would never itch the scratch from the itch that scratched."
                                           --  Keith Wiley
