hive-user mailing list archives

From "Connell, Chuck"
Subject RE: Starting with Hive - writing custom SerDe
Date Thu, 29 Nov 2012 14:53:07 GMT
I meant PLAIN tab-separated text.

From: Connell, Chuck
Sent: Thursday, November 29, 2012 9:51 AM
Subject: RE: Starting with Hive - writing custom SerDe

You might save yourself a lot of work by pre-processing the data, before putting it into Hive.
A Python script should be able to find all the fields, and change the data to plan tab-separated
text. This will load directly into Hive, and removes the need to write a custom SerDe.
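A minimal sketch of such a pre-processing script follows. The actual log layout is not shown in this thread, so the separator rule here is an assumption: fields are taken to be separated by runs of whitespace, optionally mixed with standalone hyphens. The function name to_tsv_line and the sample line in the comment are hypothetical.

```python
import re
import sys

# Assumed separator: a run of whitespace, optionally followed by one or
# more standalone hyphen groups, each followed by more whitespace.
# Hyphens inside a token (e.g. 2012-11-29) are left untouched.
SEPARATOR = re.compile(r"\s+(?:-+\s+)*")


def to_tsv_line(line):
    """Convert one raw log line into a tab-separated line (no newline)."""
    fields = SEPARATOR.split(line.strip())
    return "\t".join(fields)


def main():
    # Read raw log lines from stdin, write tab-separated lines to stdout.
    # Hypothetical input:  10.0.0.1  -  GET /index.html   200 - 512
    for raw in sys.stdin:
        if raw.strip():  # skip blank lines
            sys.stdout.write(to_tsv_line(raw) + "\n")


if __name__ == "__main__":
    main()
```

The output can then be loaded into a table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'. If a field can itself contain spaces, the separator pattern would need to be adjusted to the real log format.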

Chuck Connell
Nuance R&D Data Team
Burlington, MA

From: Fernando Andrés Doglio Turissini
Sent: Thursday, November 29, 2012 8:39 AM
Subject: Starting with Hive - writing custom SerDe

Hello everyone, I'm starting to play around with Hive, and I have to load a traffic data log
file into a table. My problem is that the lines of the file don't really have a nice separator
for each field (on the same line, several blanks, hyphens, or single blank spaces are
used as separators).
So after looking around for a while, I found that I have to write a custom SerDe in order
to tell Hive how to parse those lines.

I've also found that I can only write them in Java (unlike UDFs for Pig, for instance, which
can be written in other languages). Is this correct?
Furthermore, I wanted to know if anyone can point me in the direction of some documentation
that describes the process of writing a SerDe. I've found examples around the internet, but
none of them explain exactly what each method is supposed to do (I'm talking about the methods
supplied by the SerDe interface).

Thanks in advance!

