hive-dev mailing list archives

From Mike Sukmanowsky <>
Subject Custom InputFormat for Multiline Input File Hive/Hadoop
Date Sat, 08 Oct 2011 00:47:29 GMT
Hi all,

Sending this to and

I'm trying to process Omniture's data log files with Hadoop/Hive. The file
format is tab-delimited and, while pretty simple for the most part, it allows
fields to contain multiple newlines and tabs, each escaped by a backslash
placed immediately before the literal newline or tab character. As a result,
I've opted to write my own InputFormat that joins those escaped newlines into
a single record and converts the escaped tabs to spaces, so that Hive's split
on tabs doesn't break those fields apart.
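A minimal sketch of the record-handling logic, kept free of Hadoop so it can be run on its own. It assumes an escaped newline is a backslash immediately before a '\n' line ending, treats a physical line ending in an odd number of backslashes as continuing onto the next line, and replaces escaped tabs and newlines with a space while passing any other escape (such as an escaped backslash) through unchanged. The class and method names are hypothetical; inside an old-API RecordReader, the same joining could be applied to the lines produced by a wrapped line reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EscapedRecordJoiner {

    /** True if the line ends with an odd number of backslashes, i.e. its line break is escaped. */
    static boolean endsWithEscape(String line) {
        int count = 0;
        for (int i = line.length() - 1; i >= 0 && line.charAt(i) == '\\'; i--) {
            count++;
        }
        return count % 2 == 1;
    }

    /** Replaces backslash-escaped tabs and newlines with a space; other escapes pass through unchanged. */
    static String normalize(String record) {
        StringBuilder out = new StringBuilder(record.length());
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (c == '\\' && i + 1 < record.length()) {
                char next = record.charAt(i + 1);
                if (next == '\t' || next == '\n') {
                    out.append(' ');
                } else {
                    out.append(c).append(next); // e.g. an escaped backslash stays as two characters
                }
                i++; // the escaped character has been consumed
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Joins physical lines (without their line terminators) into logical, normalized records. */
    static List<String> joinRecords(Iterable<String> physicalLines) {
        List<String> records = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String line : physicalLines) {
            pending.append(line);
            if (endsWithEscape(line)) {
                pending.append('\n'); // restore the escaped newline; normalize() turns it into a space
            } else {
                records.add(normalize(pending.toString()));
                pending.setLength(0);
            }
        }
        if (pending.length() > 0) {
            records.add(normalize(pending.toString())); // input ended in the middle of a record
        }
        return records;
    }

    public static void main(String[] args) {
        // Three physical lines: the first ends with an escaped newline, the third has an escaped tab.
        List<String> lines = Arrays.asList("1\tfirst\\", "line\tok", "2\ta\\\tb\tc");
        for (String record : joinRecords(lines)) {
            // Print the real field delimiters as '|' to make them visible.
            System.out.println(record.replace('\t', '|'));
        }
    }
}
```

Running it prints `1|first line|ok` and then `2|a b|c`: two logical records, each with three real tab-separated fields. Note that this sketch does not handle a record that straddles an input split boundary, which a real RecordReader would also have to deal with.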

I've found a fairly good reference for doing this with the newer
InputFormat API (org.apache.hadoop.mapreduce) at but unfortunately my version
of Hive (0.7.0) still uses the old InputFormat API (org.apache.hadoop.mapred).

I haven't been able to find many tutorials on writing a custom InputFormat
with the older API, so I'm hoping for some guidance on what may be wrong
with the following two classes:

The SELECT statements in Hive currently return no rows, and other
variations I tried returned nothing but NULL values.
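One old-API pitfall that can produce empty rows, offered as a possible cause rather than a diagnosis of the actual classes: in org.apache.hadoop.mapred, RecordReader.next(key, value) receives key and value objects that the caller created once (via createKey()/createValue()) and reuses, so the reader has to mutate them in place, for example with Text.set(...). Assigning a new object to the parameter inside next() leaves the caller's object untouched. The sketch below shows that contract without a Hadoop dependency; MutableText and OldStyleReader are hypothetical, simplified stand-ins for Hadoop's Text and the mapred RecordReader (the real next() also takes a key and throws IOException).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OldApiContractDemo {

    /** Hypothetical stand-in for Hadoop's mutable Text class. */
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        @Override
        public String toString() { return value; }
    }

    /** Hypothetical, simplified mirror of the old mapred RecordReader's createValue()/next() contract. */
    interface OldStyleReader {
        MutableText createValue();
        boolean next(MutableText value);
    }

    /** Correct pattern: fills the caller-supplied object in place. */
    static final class MutatingReader implements OldStyleReader {
        private final Iterator<String> lines;
        MutatingReader(List<String> lines) { this.lines = lines.iterator(); }
        public MutableText createValue() { return new MutableText(); }
        public boolean next(MutableText value) {
            if (!lines.hasNext()) {
                return false;
            }
            value.set(lines.next());
            return true;
        }
    }

    /** Buggy pattern: rebinds the parameter, so the caller's object never changes. */
    static final class ReassigningReader implements OldStyleReader {
        private final Iterator<String> lines;
        ReassigningReader(List<String> lines) { this.lines = lines.iterator(); }
        public MutableText createValue() { return new MutableText(); }
        public boolean next(MutableText value) {
            if (!lines.hasNext()) {
                return false;
            }
            value = new MutableText(); // only the local parameter now points at the new object
            value.set(lines.next());
            return true;
        }
    }

    /** Drains a reader the way an old-API caller does: one value object, reused for every record. */
    static List<String> drain(OldStyleReader reader) {
        List<String> seen = new ArrayList<>();
        MutableText value = reader.createValue();
        while (reader.next(value)) {
            seen.add(value.toString());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("row1", "row2");
        System.out.println(drain(new MutatingReader(input)));    // [row1, row2]
        System.out.println(drain(new ReassigningReader(input))); // [, ]
    }
}
```

Both readers return true the same number of times, so the caller sees the right row count either way; only the mutating reader actually delivers the data, while the reassigning one leaves every row empty.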

This question is also posted on Stack Overflow at

If there's a resource someone can point me to, that'd also be great.

Many thanks in advance,
