From: Joe Witt
To: users@nifi.apache.org
Date: Tue, 11 Aug 2015 21:37:51 -0500
Subject: Re: UnpackContent processor cannot unpack gz file

Hello

The UnpackContent processor is for dealing with archive formats (tar, zip, etc.).
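
To see why a tar unpacker rejects a gzip file, here is a minimal Python sketch. It is not NiFi code: Python's tarfile module is only a stand-in for the unpacker, and the sample bytes are made up for illustration.

```python
import gzip
import io
import tarfile

# Hypothetical stand-in for a .gz file: a few text lines, gzip-compressed.
gz_bytes = gzip.compress(b"line one\nline two\n")


def try_untar(data: bytes) -> str:
    """Try to read data as a plain (uncompressed) tar archive."""
    try:
        # Mode "r:" means "tar with no transparent decompression".
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as archive:
            return "ok: " + ", ".join(archive.getnames())
    except tarfile.ReadError as exc:
        # gzip bytes do not form a valid tar header, so parsing fails.
        return f"failed: {exc}"


result = try_untar(gz_bytes)
print(result)
```

The tar reader fails while parsing the header, which is the same kind of failure the NiFi log reports.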

If your file is in a compression format (as is the case with the part-00002.gz file), you first need to run it through 'CompressContent' in 'decompress' mode. You can even run it through 'IdentifyMimeType' first and set up a flow that handles arbitrarily complicated layers of compression and archive structures.
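
The idea of identifying the format first and then peeling off layers can be sketched in Python. This is an illustration only, not how IdentifyMimeType is implemented: the helper names `sniff` and `peel` are hypothetical, and only gzip and tar are recognized.

```python
import gzip
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def sniff(data: bytes) -> str:
    """Classify bytes as 'gzip', 'tar', or 'other' by well-known signatures."""
    if data[:2] == GZIP_MAGIC:
        return "gzip"
    # tar headers written in ustar/GNU/pax style carry "ustar" at offset 257.
    if data[257:262] == b"ustar":
        return "tar"
    return "other"


def peel(data: bytes, name: str = "content") -> list:
    """Strip compression and archive layers; return (name, bytes) pairs."""
    kind = sniff(data)
    if kind == "gzip":
        return peel(gzip.decompress(data), name)
    if kind == "tar":
        results = []
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as archive:
            for member in archive.getmembers():
                if member.isfile():
                    payload = archive.extractfile(member).read()
                    results.extend(peel(payload, member.name))
        return results
    return [(name, data)]


# Build a made-up tar archive in memory, then gzip it (a ".tar.gz" layering).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as archive:
    payload = b"hello\n"
    info = tarfile.TarInfo(name="a.txt")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))
layered = gzip.compress(buf.getvalue())

peeled = peel(layered)
print(peeled)
```

A file that is only gzip-compressed comes back as a single entry, while a gzip-compressed tar comes back as one entry per archived file.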

So for this case:

- GetHDFS (or ListHDFS and FetchHDFS)
- CompressContent (in decompress mode)

Now you have your text-oriented file ready to be dealt with. If you want to deal with each line individually, you can use
- SplitText (line split count of 1)
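
Outside of NiFi, the decompress-then-split steps look roughly like the Python sketch below. It is only an analogy to CompressContent followed by SplitText: the function name `iter_lines` and the sample records are made up. It streams the data rather than loading it all at once, since the file in question is about 59 MB compressed.

```python
import gzip
import io


def iter_lines(stream, encoding: str = "utf-8"):
    """Decompress a gzip stream and yield one line at a time."""
    # gzip.open accepts a file object; "rt" decodes the bytes to text.
    with gzip.open(stream, mode="rt", encoding=encoding) as reader:
        for line in reader:
            yield line.rstrip("\n")


# Hypothetical sample data standing in for the real .gz file.
sample = gzip.compress(b"first record\nsecond record\nthird record\n")

lines = list(iter_lines(io.BytesIO(sample)))
for number, line in enumerate(lines, start=1):
    print(number, line)
```

Each yielded line corresponds to one FlowFile that SplitText would emit with a line split count of 1.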

Thanks
Joe

On Tue, Aug 11, 2015 at 9:27 PM, 彭光裕 <rolandpeng@cht.com.tw> wrote:

hi,

I have a compressed file obtained from the GetHDFS processor that I want to unpack with the UnpackContent processor. I have already set the UnpackContent processor's packaging format property to 'tar', but an error like the one below always occurs.

The error log is below (Unable to unpack StandardFlowFileRecord):

2015-08-11 07:10:52,291 ERROR [Timer-Driven Process Thread-4] o.a.n.processors.standard.UnpackContent UnpackContent[id=b90c65e1-b97f-3b4b-9e37-6223afa1ef03] Unable to unpack StandardFlowFileRecord[uuid=85b7d53b-3183-4c48-9160-b2e714b5eaa8,claim=1439248247840-1,offset=0,name=part-00002.gz,size=59212170] due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from UnpackContent[id=b90c65e1-b97f-3b4b-9e37-6223afa1ef03]: java.io.IOException: Error detected parsing the header; routing to failure: org.apache.nifi.processor.exception.ProcessException: IOException thrown from UnpackContent[id=b90c65e1-b97f-3b4b-9e37-6223afa1ef03]: java.io.IOException: Error detected parsing the header

My compressed file is named part-00002.gz, and you can access the file here: https://dl.dropboxusercontent.com/u/24808937/part-00002.gz

Any advice would be welcome. Please help me solve this problem. Thank you!

Roland



Please be advised that this email message (including any attachments) contains confidential information and may be legally privileged. If you are not the intended recipient, please destroy this message and all attachments from your system and do not further collect, process, or use them. Chunghwa Telecom and all its subsidiaries and associated companies shall not be liable for the improper or incomplete transmission of the information contained in this email nor for any delay in its receipt or damage to your system. If you are the intended recipient, please protect the confidential and/or personal information contained in this email with due care. Any unauthorized use, disclosure or distribution of this message in whole or in part is strictly prohibited. Also, please self-inspect attachments and hyperlinks contained in this email to ensure the information security and to protect personal information.
