nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: UnpackContent processor cannot unpack gz file
Date Wed, 12 Aug 2015 02:37:51 GMT
Hello

UnpackContent is for dealing with archive formats (tar, zip, etc.).

If your file is in a compression format (as is the case with the
part-00002.gz file), then you first need to run it through 'CompressContent'
in 'decompress' mode.  You can even run it through 'IdentifyMimeType' first
and set up a flow to handle arbitrarily complicated layers of
compression/archive structures.
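
To illustrate the compression-versus-archive distinction outside of NiFi,
here is a minimal Python sketch that guesses the outer format of a byte
stream from its magic bytes. It is not how IdentifyMimeType or UnpackContent
is implemented; the function name sniff_format and the sample data are made
up for this example.

```python
import gzip
import io
import tarfile


def sniff_format(data: bytes) -> str:
    """Guess the outer container format from magic bytes (illustration only)."""
    if data[:2] == b"\x1f\x8b":        # gzip member header
        return "gzip"
    if data[:4] == b"PK\x03\x04":      # zip local file header
        return "zip"
    if data[257:262] == b"ustar":      # POSIX/GNU tar magic at offset 257
        return "tar"
    return "unknown"


# A gzip-compressed text file, similar in kind to part-00002.gz.
gz_sample = gzip.compress(b"line one\nline two\n")

# A small tar archive built in memory for comparison.
tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
    payload = b"hello\n"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
tar_sample = tar_buffer.getvalue()

print(sniff_format(gz_sample))
print(sniff_format(tar_sample))
```

A .gz file has no tar header, which is consistent with a tar reader failing
while parsing the header. A tar archive that has been gzipped reports as gzip
until the outer layer is removed, which is why layered content has to be
decompressed first and unpacked second.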

So for this case:

- GetHDFS (or ListHDFS and FetchHDFS)
- CompressContent (in decompress mode)

Now you have your text-oriented file ready to be dealt with.  If you
want to deal with each line individually, you can use
- SplitText (line split count of 1)
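
As a rough stand-alone analogue of the decompress-then-split steps above,
the following Python sketch decompresses gzip data and yields one record per
line. It only mirrors the idea of CompressContent (decompress) followed by
SplitText with a line split count of 1; the function name
decompress_and_split and the UTF-8 assumption are mine, not part of NiFi.

```python
import gzip
import io
from typing import List


def decompress_and_split(compressed: bytes) -> List[str]:
    """Decompress gzip bytes and return the text as one string per line."""
    # Assumes the decompressed content is UTF-8 text.
    with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8") as text:
        return [line.rstrip("\n") for line in text]


# Sample input standing in for a compressed text file.
sample = gzip.compress("first record\nsecond record\n".encode("utf-8"))
for record in decompress_and_split(sample):
    print(record)
```

In a real flow each line would become its own FlowFile rather than a list
entry, so very large files do not need to be held in memory at once.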

Thanks
Joe

On Tue, Aug 11, 2015 at 9:27 PM, 彭光裕 <rolandpeng@cht.com.tw> wrote:

> hi,
>
>      I have a compressed file fetched by the GetHDFS processor that I want
> to unpack using the UnpackContent processor. I have already set the
> UnpackContent processor's packaging format property to 'tar', but an error
> like the one below always occurs.
>
>
>
> The error log is attached below (Unable to unpack StandardFlowFileRecord)
>
>
>
> 2015-08-11 07:10:52,291 ERROR [Timer-Driven Process Thread-4]
> o.a.n.processors.standard.UnpackContent
> UnpackContent[id=b90c65e1-b97f-3b4b-9e37-6223afa1ef03] Unable to unpack
> StandardFlowFileRecord[uuid=85b7d53b-3183-4c48-9160-b2e714b5eaa8,claim=1439248247840-1,offset=0,name=part-00002.gz,size=59212170]
> due to org.apache.nifi.processor.exception.ProcessException: IOException
> thrown from UnpackContent[id=b90c65e1-b97f-3b4b-9e37-6223afa1ef03]:
> java.io.IOException: Error detected parsing the header; routing to failure:
> org.apache.nifi.processor.exception.ProcessException: IOException thrown
> from UnpackContent[id=b90c65e1-b97f-3b4b-9e37-6223afa1ef03]:
> java.io.IOException: Error detected parsing the header
>
>
>
>     My compressed file is named part-00002.gz, and you can access the file
> here: https://dl.dropboxusercontent.com/u/24808937/part-00002.gz
>
>      Any advice would be welcome. Please help me solve this problem.
> Thank you!
>
>
>
> Roland
>
>
>
>
