apex-users mailing list archives

From Priyanka Gugale <pri...@apache.org>
Subject Re: Reading compressed file using FileSplitter
Date Mon, 03 Oct 2016 10:36:31 GMT
Hi Chiranjeevi,

There is no direct support in the current operators for decompressing data
read from a file, but you can do it in either of the following ways:
1. Extend AbstractBlockReader with the right STREAM type and implement the
`setupStream` function to initialize the matching stream reader class, e.g.
GZIPInputStream if your input is in gzip format or, in your case,
SnappyInputStream.
2. Override `readBlock` from AbstractBlockReader, decompress the input data
using the Snappy Java API, and then emit the data.

I would suggest option one, but what is achievable depends on which
Snappy Java library you use. Can you tell us which library you are using?
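
To make option one more concrete, here is a minimal, standalone sketch of the
stream-wrapping idea. It does not use the Apex API at all: it uses
GZIPInputStream from the JDK as a stand-in for SnappyInputStream (which comes
from a third-party library), and the class name `DecompressSketch` and the
methods `gzip` and `readRecords` are hypothetical names made up for this
illustration. It also assumes the whole compressed file is read as a single
stream from its start, and that records are separated by Ctrl+A (\u0001).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of Apex or Malhar.
public class DecompressSketch {
    // Ctrl+A, the record separator mentioned in the original question.
    static final char SEPARATOR = '\u0001';

    // Helper that builds gzip-compressed sample input in memory.
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        }
        return buffer.toByteArray();
    }

    // Wraps the compressed stream in a decompressing stream (the same idea
    // as choosing the stream class in setupStream) and splits the
    // decompressed text into records on the separator.
    public static List<String> readRecords(InputStream compressed) throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                if (c == SEPARATOR) {
                    records.add(current.toString());
                    current.setLength(0);
                } else {
                    current.append((char) c);
                }
            }
        }
        // Keep a trailing record that is not followed by a separator.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = gzip("a\u0001b\u0001c".getBytes(StandardCharsets.UTF_8));
        System.out.println(readRecords(new ByteArrayInputStream(sample)));
    }
}
```

If your Snappy library provides an InputStream wrapper, swapping it in for
GZIPInputStream above should be the only change needed in this sketch; how
that maps onto the STREAM type of AbstractBlockReader still depends on the
library.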

-Priyanka

On Mon, Oct 3, 2016 at 2:42 PM, chiranjeevi vasupilli <chiru.vcj@gmail.com>
wrote:

> Hi Priyanka,
>
> We are getting compressed files from the source, which we need to read and
> decompress so that we can process the actual data.
>
> Can you please point us to any reader/operator that is readily available to
> decompress the data while reading it in DataTorrent?
>
>
>
> On Mon, Oct 3, 2016 at 1:07 PM, Priyanka Gugale <priyag@apache.org> wrote:
>
>> Hi,
>>
>> Do you want to read the files in compressed form only, or do you want your
>> program to decompress and read them?
>> If you want to read it in compressed format you can use FSInputModule
>> (which uses FileSplitter and block reader) directly to read your files.
>> If you want to uncompress while reading, there are other options you can
>> choose. I will explain in detail once you confirm this is what you are
>> trying to achieve.
>>
>> -Priyanka
>>
>> On Mon, Oct 3, 2016 at 12:38 PM, chiranjeevi vasupilli <
>> chiru.vcj@gmail.com> wrote:
>>
>>> Hi Team,
>>>
>>> Can you please provide any reader/operator which is capable of reading
>>> compressed data in DataTorrent?
>>>
>>> I have a requirement to read .snappy files with a Ctrl+A separator using
>>> FileSplitter. Can you please let me know how to do it?
>>>
>>>
>>> --
>>> thanks
>>> chiru
>>>
>>
>>
>
>
> --
> ur's
> chiru
>
