hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Sequence file- custom serdes - question
Date Mon, 17 Jan 2011 14:12:55 GMT
2011/1/17 Guy Doulberg <Guy.Doulberg@conduit.com>:
> Hey again,
>
> I thought it would be easy to combine the key and the value, but I ran into difficulties.
> I wonder if someone has already written a generic FileInputFormat that prepends the key to the value?
>
> Anyhow, here is the code I am trying to write:
>
> I have a class that extends SequenceFileInputFormat:
>
> import java.io.IOException;
>
> import org.apache.hadoop.io.Writable;
> import org.apache.hadoop.mapred.InputSplit;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.RecordReader;
> import org.apache.hadoop.mapred.Reporter;
> import org.apache.hadoop.mapred.SequenceFileInputFormat;
>
> public class CombinedSequenceFileInputFormat<K extends Writable, V extends Writable>
>        extends SequenceFileInputFormat<K, V> {
>
>    @Override
>    public RecordReader<K, V> getRecordReader(InputSplit split, JobConf job,
>            Reporter reporter) throws IOException {
>        // Wrap the standard SequenceFile reader so its records can be post-processed.
>        return new CombinedSequenceRecordReader<K, V>(
>                super.getRecordReader(split, job, reporter));
>    }
> }
>
> I then return the wrapped RecordReader. Here is the code of that wrapper:
>
> import java.io.IOException;
>
> import org.apache.hadoop.io.Writable;
> import org.apache.hadoop.mapred.RecordReader;
>
> // Delegates every call to the wrapped SequenceFile RecordReader.
> public class CombinedSequenceRecordReader<K extends Writable, V>
>        implements RecordReader<K, V> {
>
>    private RecordReader<K, V> proxy;
>    private K currentKey;
>
>    public CombinedSequenceRecordReader(RecordReader<K, V> proxy) {
>        this.proxy = proxy;
>    }
>
>    public void setProxy(RecordReader<K, V> proxy) {
>        this.proxy = proxy;
>    }
>
>    public RecordReader<K, V> getProxy() {
>        return proxy;
>    }
>
>    @Override
>    public boolean next(K key, V value) throws IOException {
>        return proxy.next(key, value);
>    }
>
>    @Override
>    public K createKey() {
>        currentKey = proxy.createKey();
>        return currentKey;
>    }
>
>    @Override
>    public V createValue() {
>        return proxy.createValue();
>    }
>
>    @Override
>    public long getPos() throws IOException {
>        return proxy.getPos();
>    }
>
>    @Override
>    public void close() throws IOException {
>        proxy.close();
>    }
>
>    @Override
>    public float getProgress() throws IOException {
>        return proxy.getProgress();
>    }
> }
>
>
> Now I am trying to extend createValue() in such a way that the value also contains the key.
> Any suggestions?
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
> Sent: Sunday, January 16, 2011 10:33 PM
> To: user@hive.apache.org
> Subject: Re: Sequence file- custom serdes - question
>
> 2011/1/16 Guy Doulberg <Guy.Doulberg@conduit.com>:
>> Hey all,
>>
>> I am new to this Hive thing, but I have a very complex task to perform, and I am a little
>> stuck. I hope someone here can help.
>>
>> My team has been storing data in a custom sequence file that has a custom key and
>> a custom value. We want to expose a Hive interface to query this data.
>> I have been trying to write a custom SerDe that deserializes the sequence file into
>> a Hive table.
>>
>> As long as I needed values from the value part of the record, everything was all right,
>> but when I needed to extract a value from the key part, I got stuck: I realized that
>> in the method deserialize(Writable o), o is an instance of the value class, and I don't
>> know how to access the key object.
>>
>> It could be that I am missing something in the configuration, in the Java code, or in
>> the table declaration in Hive.
>>
>> Thanks,
>> Guy
>
> Hive ignores the key! (I know, crazy, right?) What I have done is use
> my InputFormat to combine the key and the value and make the
> combined field the value.
>

This approach should work. A simple way to do it is to convert your
custom Writables to Text at this point.

Source:                Writable A (name: car, type: ford), Writable B (windows: 4)
InputFormat (result):  Byte[0], "car\tford\t4"
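A minimal sketch of that combining step, written in plain Java so it runs without Hadoop on the classpath. CarKey, CarValue, combine and CombiningReader are hypothetical names standing in for the custom Writables and the wrapping RecordReader; an Iterator of (key, value) pairs plays the role of the underlying SequenceFile reader.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class KeyValueCombiner {

    // Hypothetical stand-ins for the custom key and value Writables
    // (Writable A and Writable B in the example above).
    record CarKey(String name, String type) {}

    record CarValue(int windows) {}

    // Flatten the key's fields, then the value's fields, into one
    // tab-delimited line, e.g. "car\tford\t4".
    static String combine(CarKey key, CarValue value) {
        return String.join("\t", key.name(), key.type(),
                Integer.toString(value.windows()));
    }

    // Wraps a source of (key, value) pairs the way an InputFormat's
    // RecordReader would wrap the SequenceFile reader: the combined line
    // becomes the value, and the key is an empty byte array because
    // Hive does not look at it.
    static final class CombiningReader
            implements Iterator<Map.Entry<byte[], String>> {
        private final Iterator<Map.Entry<CarKey, CarValue>> inner;

        CombiningReader(Iterator<Map.Entry<CarKey, CarValue>> inner) {
            this.inner = inner;
        }

        @Override
        public boolean hasNext() {
            return inner.hasNext();
        }

        @Override
        public Map.Entry<byte[], String> next() {
            Map.Entry<CarKey, CarValue> pair = inner.next();
            return Map.entry(new byte[0], combine(pair.getKey(), pair.getValue()));
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<CarKey, CarValue>> source =
                List.of(Map.entry(new CarKey("car", "ford"), new CarValue(4)));
        CombiningReader reader = new CombiningReader(source.iterator());
        while (reader.hasNext()) {
            // Prints: car<TAB>ford<TAB>4
            System.out.println(reader.next().getValue());
        }
    }
}
```

Applied to the old mapred API, this suggests the work belongs in next() rather than in createValue(): a wrapper that exposes Text as its value type could keep its own key/value holders from proxy.createKey() and proxy.createValue(), call proxy.next() on them, and then set() the combined string into the caller's Text. That adaptation is an assumption, not code from this thread.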

From this point on, you can just use Hive's delimited SerDe as normal.
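One detail worth checking: Hive's default field delimiter for ROW FORMAT DELIMITED tables is Ctrl-A (\001), not tab, so a tab-joined value like "car\tford\t4" only splits into columns if the table is declared with FIELDS TERMINATED BY '\t' (or the InputFormat joins with \001 instead). The sketch below roughly approximates that split in plain Java; DelimitedRowSketch and splitRow are hypothetical names, not Hive classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DelimitedRowSketch {

    // Hive's default field delimiter for ROW FORMAT DELIMITED is Ctrl-A (\001).
    static final char HIVE_DEFAULT_FIELD_DELIM = '\u0001';

    // Rough approximation of how a delimited SerDe breaks one value into
    // column strings: split on the field delimiter, keeping empty fields.
    static List<String> splitRow(String line, char delim) {
        return Arrays.asList(line.split(Pattern.quote(String.valueOf(delim)), -1));
    }

    public static void main(String[] args) {
        String line = "car\tford\t4";
        // Table declared with FIELDS TERMINATED BY '\t': prints [car, ford, 4]
        System.out.println(splitRow(line, '\t'));
        // Default delimiter: the whole line stays in a single field.
        System.out.println(splitRow(line, HIVE_DEFAULT_FIELD_DELIM));
    }
}
```

With the delimiter mismatched, the entire line would end up in the first column, which is an easy symptom to spot when testing the table.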

If your source input is set up in such a way that you cannot decode it
at the InputFormat stage, you probably need to write your own SerDe,
since the SerDe has access to both the Hive table information and the
source data.
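To illustrate the "table information" half of that: Hive passes the table properties to a SerDe when it is initialized, and the "columns" property lists the table's column names, comma-separated. The sketch below uses only java.util.Properties to build a row in declared column order. ColumnMappingSketch, its deserialize method and the pre-decoded map are hypothetical simplifications; Hive's real SerDe interface deserializes a Writable into an object described by an ObjectInspector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class ColumnMappingSketch {

    private final List<String> columns;

    // Hive hands the table properties to a SerDe at initialization;
    // "columns" holds the table's column names, comma-separated.
    ColumnMappingSketch(Properties tableProperties) {
        this.columns = Arrays.asList(tableProperties.getProperty("columns").split(","));
    }

    // Hypothetical: 'decoded' holds fields already parsed out of the source
    // record by custom decoding code (not shown). The row follows the table's
    // declared column order; fields the record lacks become null.
    List<Object> deserialize(Map<String, Object> decoded) {
        List<Object> row = new ArrayList<>(columns.size());
        for (String column : columns) {
            row.add(decoded.get(column));
        }
        return row;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("columns", "name,type,windows");
        ColumnMappingSketch serde = new ColumnMappingSketch(props);
        // Prints: [car, ford, 4]
        System.out.println(serde.deserialize(Map.of("name", "car", "type", "ford", "windows", 4)));
    }
}
```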
