hadoop-hdfs-user mailing list archives

From java8964 java8964 <java8...@hotmail.com>
Subject Seekable interface and CompressInputStream question
Date Sat, 22 Dec 2012 02:13:19 GMT

I have a question about the Seekable interface. I am currently using the CDH3 release, with
Hadoop 0.20.2. As I understand it, CompressionInputStream in that release throws
UnsupportedOperationException from the methods it inherits from the Seekable interface,
because they are not implemented.
My question is: does Seekable mean that the underlying InputStream supports splitting? If
an InputStream is seekable, then it should be splittable, right?
If so, I assume that a future Hadoop release will have CompressionInputStream implement
Seekable. But my understanding is that some compression formats can be split and some cannot.
Suppose the data file is a gzip file and I get a CompressionInputStream, backed by the gzip
codec, that does support Seekable. I would then assume it is splittable, when in fact it
isn't. In that case, how do I write a generic InputFormat that supports both splittable and
unsplittable compressed input streams?
Or is my understanding incorrect, and are Seekable and splitting totally different things?
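
To make the distinction in the question concrete, here is a minimal sketch in plain Java. It
assumes that splittability is treated as a property of the codec, not of the stream, so a
generic InputFormat would ask "is this file's codec splittable?" instead of "is this stream
Seekable?". All type and method names below (Codec, SplittableCodec, GzipLikeCodec,
Bzip2LikeCodec, codecFor, isSplitable) are hypothetical stand-ins invented for illustration
and are not Hadoop classes. Later Hadoop releases have a comparable marker interface,
SplittableCompressionCodec, which the bzip2 codec implements and the gzip codec does not;
whether it is available in a given CDH3 build is not something this sketch assumes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitCheckSketch {
    // Hypothetical stand-in for a compression codec (not Hadoop's real interface).
    interface Codec {
        String extension();
    }

    // Hypothetical marker: the format has block boundaries a reader can resync on.
    interface SplittableCodec extends Codec {
    }

    // gzip-like: a single compressed stream, so it can only be read from the start.
    static final class GzipLikeCodec implements Codec {
        public String extension() { return ".gz"; }
    }

    // bzip2-like: independently decodable blocks, so a reader can start mid-file.
    static final class Bzip2LikeCodec implements SplittableCodec {
        public String extension() { return ".bz2"; }
    }

    private static final Map<String, Codec> CODECS = new LinkedHashMap<>();

    static {
        register(new GzipLikeCodec());
        register(new Bzip2LikeCodec());
    }

    private static void register(Codec codec) {
        CODECS.put(codec.extension(), codec);
    }

    // Picks a codec from the file extension; null means "not compressed".
    static Codec codecFor(String fileName) {
        for (Map.Entry<String, Codec> entry : CODECS.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }

    // The decision a generic input format would make per file.
    static boolean isSplitable(String fileName) {
        Codec codec = codecFor(fileName);
        if (codec == null) {
            return true; // uncompressed data can be cut at any byte offset
        }
        return codec instanceof SplittableCodec;
    }

    public static void main(String[] args) {
        System.out.println(isSplitable("part-00000.txt")); // true
        System.out.println(isSplitable("logs.gz"));        // false
        System.out.println(isSplitable("logs.bz2"));       // true
    }
}
```

Under this assumption the two concepts stay separate: seeking is about repositioning within
one stream, while splitting is about whether a reader can start decoding at an arbitrary
point in the compressed file.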