hive-user mailing list archives

From phil young <phil.wills.yo...@gmail.com>
Subject Re: Custom SerDe Question
Date Fri, 28 Jan 2011 23:00:17 GMT
Ahh, not as custom as I expected...that makes sense now.

Glad things are working for you.

-Phil


On Fri, Jan 28, 2011 at 5:34 PM, Christopher, Pat <
patrick.christopher@hp.com> wrote:

> Not sure what I did wrong the first time but I tried to create a table with
> stored type of textfile and using my custom serde so it had a format line
> of:
>
>
>
>   ROW FORMAT SERDE 'org.myorg.hadoop.hive.udf.MySerDe' STORED AS textfile
>
>
>
> Then I loaded a gzipped file using LOAD DATA LOCAL INPATH 'path.gz' INTO
> TABLE mytable and it worked as expected, i.e. the file was read and I’m able
> to query it using Hive.
>
>
>
> Sorry to bother and thanks a bunch for the help!  Forcing me to go read
> more about InputFormats is a long term help anyway.
>
>
>
> Pat
>
>
>
> *From:* phil young [mailto:phil.wills.young@gmail.com]
> *Sent:* Friday, January 28, 2011 1:54 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Custom SerDe Question
>
>
>
> To be clear, you would then create the table with the clause:
>
>
>
> STORED AS
>
>   INPUTFORMAT 'your.custom.input.format'
>
>
>
>
>
> If you make an external table, you'll then be able to point to a directory
> (or file) that contains gzipped files, or uncompressed files.
>
> On Fri, Jan 28, 2011 at 4:52 PM, phil young <phil.wills.young@gmail.com>
> wrote:
>
> This can be accomplished with a custom input format.
>
>
>
> Here's a snippet of the relevant code in the custom RecordReader:
>
>
>
>
>
>             compressionCodecs = new CompressionCodecFactory(jobConf);
>             Path file = split.getPath();
>             final CompressionCodec codec = compressionCodecs.getCodec(file);
>
>             // open the file and seek to the start of the split
>             start = split.getStart();
>             end = start + split.getLength();
>             pos = 0;
>
>             FileSystem fs = file.getFileSystem(jobConf);
>             fsdat = fs.open(split.getPath());
>             fsdat.seek(start);
>
>             if (codec != null)
>             {
>                 fsin = codec.createInputStream(fsdat);
>             }
>             else
>             {
>                 fsin = fsdat;
>             }
>
> On Fri, Jan 28, 2011 at 1:57 PM, Christopher, Pat <
> patrick.christopher@hp.com> wrote:
>
> Hi,
>
> I’ve written a SerDe and I’d like it to be able to handle compressed data
> (gzip).  Hadoop detects and decompresses on the fly, so if you have a
> compressed data set and you don’t need to perform any custom interpretation
> of it as you go, Hadoop and Hive will handle it.  Is there a way to get Hive
> to notice the data is compressed, decompress it, then push it through the
> custom SerDe?  Or will I have to either
>
>   a. add some decompression logic to my SerDe (possibly impossible)
>
>   b. decompress the data before pushing it into a table with my SerDe
>
>
>
> Thanks!
>
>
>
> Pat
>
>
>
>
>
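
The codec-selection step in the quoted RecordReader snippet (look up a codec for the file, wrap the raw stream if one is found, otherwise read the stream as-is) can be sketched without a Hadoop classpath. This is only an illustration of the pattern, not the code from the thread: it swaps Hadoop's CompressionCodecFactory for java.util.zip from the JDK, and the class name CodecSketch, the helpers openMaybeCompressed, firstLine and gzip, and the ".gz" suffix check are all hypothetical names and assumptions introduced here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CodecSketch {

    // Assumption: the "codec" is chosen purely from the file name suffix,
    // loosely mirroring a lookup by path. Only gzip is handled here.
    public static InputStream openMaybeCompressed(String fileName, InputStream raw)
            throws IOException {
        if (fileName.endsWith(".gz")) {
            // A codec was "found": wrap the raw stream in a decompressing stream.
            return new GZIPInputStream(raw);
        }
        // No codec: read the raw bytes unchanged.
        return raw;
    }

    // Reads the first text line of an in-memory "file", decompressing if needed.
    public static String firstLine(String fileName, byte[] contents) throws IOException {
        InputStream in = openMaybeCompressed(fileName, new ByteArrayInputStream(contents));
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    // Helper that gzips a string so the example needs no files on disk.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String row = "a\tb\tc\n";
        // Both calls should yield the same record text.
        System.out.println(firstLine("rows.txt", row.getBytes(StandardCharsets.UTF_8)));
        System.out.println(firstLine("rows.txt.gz", gzip(row)));
    }
}
```

Whatever consumes the resulting stream sees the same records in both cases, which fits what Pat reports earlier in the thread: a gzipped file loaded into a textfile table was read through the custom SerDe without any decompression logic in the SerDe itself.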
