hive-user mailing list archives

From Ruben de Vries <ruben.devr...@hyves.nl>
Subject RE: using the key from a SequenceFile
Date Thu, 19 Apr 2012 12:21:13 GMT
Hive can handle a sequence file just like a text file; it simply omits the key completely and
uses only the value part. Other than that, you won't notice a difference between a sequence
file and a plain text file.

From: David Kulp [mailto:dkulp@fiksu.com]
Sent: Thursday, April 19, 2012 2:13 PM
To: user@hive.apache.org
Subject: Re: using the key from a SequenceFile

I'm trying to achieve something very similar.  I want to write an MR program that writes results
in a record-based sequencefile that would be directly readable from hive as though it were
created using "STORED AS SEQUENCEFILE" with, say, BinarySortableSerDe.

From this discussion it seems that Hive does not / cannot take advantage of the key/value pairs
in a sequencefile, but rather requires a value that is serialized using a SerDe. Is that
right?

If so, does that mean that the right approach is to use the BinarySortableSerDe to pass
the collector a row's worth of data as the Writable value? And would Hive "just work" on
such data?

If SequencefileOutputFormat is used, will it automatically place sync markers in the file
to allow for file splitting?
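On the sync-marker question: the SequenceFile writer (which SequenceFileOutputFormat uses underneath) emits sync markers periodically on its own, so the resulting files are splittable without any extra work. A hedged sketch of writing Hive-readable rows as values with the old API in use here (the output path and the tab-delimited row layout are illustrative assumptions, not from the thread):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqFileWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical output path; a Hive table's LOCATION would point here.
        Path out = new Path("/tmp/hive-rows/part-00000");

        // Hive ignores the key, so an empty BytesWritable is a common choice.
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, BytesWritable.class, Text.class);
        try {
            // One row's worth of data serialized into the value. The writer
            // inserts sync markers itself, keeping the file splittable.
            writer.append(new BytesWritable(), new Text("12345\thttp://example.com"));
        } finally {
            writer.close();
        }
    }
}
```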

Thanks!


(ps. As an aside, Avro would be better.  Wouldn't it be a huge win for MapReduce to have an
AvroOutputFileFormat and for Hive to have a serde that read such files?  It seems like there's
a natural correspondence between the richer data representations of an SQL schema and an Avro
schema, and there's already code for working with Avro in MR as input.)



On Apr 19, 2012, at 6:15 AM, madhu phatak wrote:


A SerDe will allow you to create custom data from your sequence file: https://cwiki.apache.org/confluence/display/Hive/SerDe
On Thu, Apr 19, 2012 at 3:37 PM, Ruben de Vries <ruben.devries@hyves.nl>
wrote:
I'm trying to migrate a part of our current Hadoop jobs from plain MapReduce jobs to Hive.
Previously the data was stored in sequencefiles with the keys containing valuable data!
However, if I load the data into a table I lose that key data (or at least I can't access
it with Hive); I want to somehow use the key from the sequence file in Hive.

I know this has come up before, since I can find some hints of people needing it, but I can't
seem to find a working solution, and since I'm not very good with Java I really can't get it
done myself :(.
Does anyone have a snippet of something like this working?

I get errors like:
../hive/mapred/CustomSeqRecordReader.java:14: cannot find symbol
    [javac] symbol  : constructor SequenceFileRecordReader()
    [javac] location: class org.apache.hadoop.mapred.SequenceFileRecordReader<K,V>
    [javac] public class CustomSeqRecordReader<K, V> extends SequenceFileRecordReader<K,
V> implements RecordReader<K, V> {
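The javac error above is because org.apache.hadoop.mapred.SequenceFileRecordReader has no no-argument constructor, and a subclass with no declared constructor implicitly calls super(). The fix is to declare a constructor that forwards the configuration and split to the superclass. A hedged sketch assuming the old mapred API shown in the error (the class name mirrors the error output; anything beyond the constructor is an assumption):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.SequenceFileRecordReader;

// Sketch: SequenceFileRecordReader has no default constructor, so the
// subclass must declare one that calls super(conf, split) explicitly.
public class CustomSeqRecordReader<K, V> extends SequenceFileRecordReader<K, V> {

    public CustomSeqRecordReader(Configuration conf, FileSplit split)
            throws IOException {
        super(conf, split);  // this call is what the javac error is asking for
    }

    // Overriding next(K key, V value) here would be the place to merge the
    // key into the value so Hive can see it -- details depend on your data.
}
```

A custom InputFormat would then construct this reader in its getRecordReader method and be named in the table's INPUTFORMAT clause.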


I hope someone has a snippet or can help me out; I would really love to be able to switch part of
our jobs to Hive.


Ruben de Vries



--
https://github.com/zinnia-phatak-dev/Nectar

