hive-user mailing list archives

From Varsha Raveendran <varsha.raveend...@gmail.com>
Subject Fwd: Custom Input Format for Sequence Files
Date Fri, 13 Feb 2015 16:58:45 GMT
Hello!

I have small CSV files that are moved to HDFS in the SequenceFile format.
The key is the file name and the contents of the file become the value.

Now I want to create an external table on these CSV files using Hive, but
when I do, I get only the first row of each CSV file.

For example,

Assume the CSV files contain three columns (Col1, Col2, Col3) and I have
three CSV files: File1, File2, File3.

File1
10,20,30
40,50,60
70,80,90

File2
100,110,120
130,140,150
160,170,180

File3
200,210,220
230,240,250
260,270,280


A sequence file is then created with one key-value pair per CSV file:
File1 <Contents of File1>
File2 <Contents of File2>
File3 <Contents of File3>

Now when I create an external table stored as SEQUENCEFILE and run a SELECT
ALL query in Hive, I get the following result:
10     20     30
100    110    120
200    210    220

I am aware that I need to write a custom InputFormat, a custom RecordReader
and a custom SerDe, and that a sequence file treats one key-value pair as
one row.
What I don't understand is how to split one row (corresponding to one value)
of a sequence file into multiple rows in a Hive table.
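
To make the goal concrete, here is a minimal sketch in plain Java of the
splitting step only. It uses no Hadoop or Hive classes, and the class and
method names (SequenceValueSplitter, splitValue) are hypothetical. It assumes
each value holds the full text of one CSV file, with rows separated by
newlines and columns by commas. A custom RecordReader would presumably need to
do something equivalent, handing back one line of the current value at a time
instead of the whole value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop or Hive: it only illustrates
// turning one sequence-file value into several table rows.
class SequenceValueSplitter {

    // Splits the contents of one CSV file (one sequence-file value)
    // into rows, and each row into its comma-separated columns.
    static List<String[]> splitValue(String value) {
        List<String[]> rows = new ArrayList<>();
        for (String line : value.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines, e.g. after a trailing newline
            }
            rows.add(line.split(","));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumed value stored under the key "File1".
        String file1 = "10,20,30\n40,50,60\n70,80,90";
        for (String[] row : splitValue(file1)) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

Run on the contents of File1, this yields three rows instead of one, which is
the behaviour wanted from the Hive table.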

Any suggestions on how to go about this?

Regards,
VR
