hive-user mailing list archives

From Guy Doulberg <Guy.Doulb...@conduit.com>
Subject RE: Sequence file- custom serdes - question
Date Mon, 17 Jan 2011 15:31:28 GMT
Thanks,

I eventually did it in the following way:

If next() (the RecordReader method) returns true, then the key and value objects passed to it now hold the
current key and the current value.

I made my value implement the interface:

public interface ValueHoldsKey<K> {
    K getKey();
    void setKey(K k);
}


Then I changed the wrapper to the following:

public class CombinedSequenceRecordReader<K extends Writable, V extends ValueHoldsKey<K>> implements RecordReader<K, V>

And changed next() to:

	@Override
	public boolean next(K key, V value) throws IOException {
		boolean retVal = proxy.next(key, value);
		if (retVal){
			value.setKey(key);
		}
		return retVal;
	}


Now, in the custom SerDe, I can call getKey() on the value.
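To make the pattern easier to try out, here is a minimal, self-contained sketch of the same wrapper idea. It is an illustration under stated assumptions, not the code from this thread: SimpleRecordReader is a hypothetical, simplified stand-in for org.apache.hadoop.mapred.RecordReader (no IOException, no getPos/getProgress/close), MyKey, MyValue, InMemoryReader and KeyAwareDeserializer are hypothetical names, and in real Hadoop/Hive code the key and value would also have to implement Writable. KeyAwareDeserializer.deserialize() only mimics the shape of a SerDe that is handed the value alone.

```java
// Hypothetical, simplified stand-in for org.apache.hadoop.mapred.RecordReader
// so the sketch compiles without Hadoop (the real next() declares IOException).
interface SimpleRecordReader<K, V> {
    boolean next(K key, V value);
    K createKey();
    V createValue();
}

// The interface from the message: the value carries a reference to its key.
interface ValueHoldsKey<K> {
    K getKey();
    void setKey(K k);
}

// Hypothetical mutable key type (a real one would implement Writable).
class MyKey {
    String id = "";
}

// Hypothetical mutable value type that can hold its key.
class MyValue implements ValueHoldsKey<MyKey> {
    int count;
    private MyKey key;

    public MyKey getKey() { return key; }
    public void setKey(MyKey k) { this.key = k; }
}

// Wrapper: delegates to the real reader and, on success, attaches the key to the value.
class CombinedRecordReader<K, V extends ValueHoldsKey<K>> implements SimpleRecordReader<K, V> {
    private final SimpleRecordReader<K, V> proxy;

    CombinedRecordReader(SimpleRecordReader<K, V> proxy) { this.proxy = proxy; }

    @Override
    public boolean next(K key, V value) {
        boolean retVal = proxy.next(key, value);
        if (retVal) {
            value.setKey(key);
        }
        return retVal;
    }

    @Override public K createKey() { return proxy.createKey(); }
    @Override public V createValue() { return proxy.createValue(); }
}

// Hypothetical in-memory reader that refills the same key/value objects on each call,
// the way a SequenceFile reader reuses its Writables.
class InMemoryReader implements SimpleRecordReader<MyKey, MyValue> {
    private final String[] ids;
    private final int[] counts;
    private int pos = 0;

    InMemoryReader(String[] ids, int[] counts) { this.ids = ids; this.counts = counts; }

    @Override
    public boolean next(MyKey key, MyValue value) {
        if (pos >= ids.length) {
            return false;
        }
        key.id = ids[pos];
        value.count = counts[pos];
        pos++;
        return true;
    }

    @Override public MyKey createKey() { return new MyKey(); }
    @Override public MyValue createValue() { return new MyValue(); }
}

// Hypothetical SerDe-like side: it only receives the value, but can still reach the key.
class KeyAwareDeserializer {
    static String deserialize(Object o) {
        MyValue v = (MyValue) o;
        return v.getKey().id + "\t" + v.count;
    }
}
```

Because the reader reuses the same key and value objects on every call, the reference stored by setKey() always points at the key of the record that was just read.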

Hope that helps someone.


-----Original Message-----
From: Edward Capriolo [mailto:edlinuxguru@gmail.com] 
Sent: Monday, January 17, 2011 4:36 PM
To: user@hive.apache.org
Subject: Re: Sequence file- custom serdes - question

On Mon, Jan 17, 2011 at 9:20 AM, Guy Doulberg <Guy.Doulberg@conduit.com> wrote:
> Thanks Edward,
>
> But I don't understand your suggestion,
>
> How do I convert the custom object that I have to text?
>
> And where?
> In the createValue method?
>
> Thanks again
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
> Sent: Monday, January 17, 2011 4:13 PM
> To: user@hive.apache.org
> Subject: Re: Sequence file- custom serdes - question
>
> 2011/1/17 Guy Doulberg <Guy.Doulberg@conduit.com>:
>> Hey again,
>>
>> I thought it would be easy to combine the key and the value, but I ran into difficulties.
>> I wonder if someone has made a generic FileInputFormat that prepends the key to the value?
>>
>> Anyhow here is the code I am trying to write:
>>
>> I have a class that extends SequenceFileInputFormat:
>>
>> public class CombinedSequenceFileInputFormat<K extends Writable, V extends Writable> extends SequenceFileInputFormat<K, V> {
>>
>>
>>    @Override
>>    public org.apache.hadoop.mapred.RecordReader<K, V> getRecordReader(
>>            org.apache.hadoop.mapred.InputSplit split, JobConf job,
>>            Reporter reporter) throws IOException {
>>        CombinedSequenceRecordReader<K, V> wrap =
>>                new CombinedSequenceRecordReader<K, V>(super.getRecordReader(split, job, reporter));
>>
>>        return wrap;
>>    }
>>
>> }
>>
>> And then I return the wrapped RecordReader; the code of that wrapper is:
>>
>> public class CombinedSequenceRecordReader<K extends Writable, V> implements RecordReader<K, V> {
>>
>>    private RecordReader<K, V> proxy;
>>    private K currentKey;
>>
>>    public CombinedSequenceRecordReader(RecordReader<K, V> proxy){
>>        this.proxy = proxy;
>>    }
>>
>>    public void setProxy(RecordReader<K, V> proxy) {
>>        this.proxy = proxy;
>>    }
>>
>>    public RecordReader<K, V> getProxy() {
>>        return proxy;
>>    }
>>
>>    @Override
>>    public boolean next(K key, V value) throws IOException {
>>
>>        return proxy.next(key, value);
>>    }
>>
>>    @Override
>>    public K createKey() {
>>        currentKey = proxy.createKey() ;
>>        return currentKey;
>>    }
>>
>>    @Override
>>    public V createValue() {
>>        V val = proxy.createValue();
>>        return val;
>>    }
>>
>>    @Override
>>    public long getPos() throws IOException {
>>        return proxy.getPos();
>>    }
>>
>>    @Override
>>    public void close() throws IOException {
>>        proxy.close();
>>
>>    }
>>
>>    @Override
>>    public float getProgress() throws IOException {
>>        return proxy.getProgress();
>>    }
>>
>>
>>
>> }
>>
>>
>> Now I am trying to extend createValue() in such a way that the value will also hold the key. Any suggestions?
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
>> Sent: Sunday, January 16, 2011 10:33 PM
>> To: user@hive.apache.org
>> Subject: Re: Sequence file- custom serdes - question
>>
>> 2011/1/16 Guy Doulberg <Guy.Doulberg@conduit.com>:
>>> Hey all,
>>>
>>> I am new to this Hive thing, but I have a very complex task to perform and I am a little stuck. I hope someone here can help.
>>>
>>> My team has been storing data in a custom sequence file that has a custom key and a custom value. We want to expose a Hive interface to query this data.
>>> I have been trying to write a custom SerDe that deserializes the sequence file into a Hive table.
>>>
>>> As long as I needed values from the value part of the record, everything was all right. But when I needed to extract a value from the key part, I got stuck: I suddenly realized that in the method deserialize(Writable o), o is an instance of the value class, and I don't know how to access the key object.
>>>
>>> It could be that I am missing something in the configuration, either in the Java code or in the Hive declaration.
>>>
>>>
>>>
>>> Thanks,
>>> Guy
>>>
>>>
>>>
>>>
>>>
>>
>> Hive ignores the key! (I know, crazy, right?) What I have done is use
>> my InputFormat to combine the key and the value and make the
>> combined field the value.
>>
>
> This approach should work. A simple approach is to convert your
> custom Writable to Text at this point.
>
> source:                 Writable A (name:car type:ford), Writable B (windows:4)
> InputFormat (result):   Byte[0], "car\tford\t4"
>
> From this point you can just use the Hive delimited SerDe as normal.
>
> If your source input is set up in such a way that you cannot decode it
> at the InputFormat stage, you probably need to write your own SerDe, as
> the SerDe has access to both the Hive table information and the
> source data.
>

If you know the types of your key and value, you can cast them to those
known types and then write some kind of toString() on them.

I do this when I know K and V are ALWAYS Text, Text.

However, this is short-cutting the process a bit. Your InputFormat
should return key/value objects and the SerDe is supposed to
extract the data from them, but in some cases you do not need both an
InputFormat and a SerDe, just one or the other.
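A minimal sketch of the flattening suggested above, using the car/ford/4 example. CarKey, CarValue and Flattener are hypothetical names, not Hadoop or Hive classes. In a real RecordReader wrapper the combined string would be written into the Text value (for example with Text.set(String)). The sketch also assumes the Hive table is declared with a tab field delimiter (ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'), since Hive's default field delimiter is \001 rather than tab.

```java
// Hypothetical key type from the example: name:car type:ford.
class CarKey {
    final String name;
    final String type;

    CarKey(String name, String type) {
        this.name = name;
        this.type = type;
    }

    // Known-type toString(): fields joined with the assumed tab delimiter.
    @Override
    public String toString() {
        return name + "\t" + type;
    }
}

// Hypothetical value type from the example: windows:4.
class CarValue {
    final int windows;

    CarValue(int windows) {
        this.windows = windows;
    }

    @Override
    public String toString() {
        return Integer.toString(windows);
    }
}

class Flattener {
    // What the InputFormat stage would put into the Text value: key fields, then value fields.
    static String combine(Object key, Object value) {
        return key.toString() + "\t" + value.toString();
    }

    // Roughly what a tab-delimited SerDe does with that line: split it into columns.
    // The -1 limit keeps trailing empty columns instead of dropping them.
    static String[] columns(String line) {
        return line.split("\t", -1);
    }
}
```

With this approach only the InputFormat is custom; the table itself can use the stock delimited format.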