From: Сергей Красовский
Date: Tue, 26 Jan 2021 08:45:14 +0100
Subject: [Python] HDFS write fails when size of file is higher than 6gb
To: user@arrow.apache.org

Hello Arrow team,

I have an issue writing files larger than 6143 MB to HDFS. The exception is:

> Traceback (most recent call last):
>   File "exp.py", line 22, in <module>
>     output_stream.write(open(source, "rb").read())
>   File "pyarrow/io.pxi", line 283, in pyarrow.lib.NativeFile.write
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> OSError: HDFS Write failed, errno: 22 (Invalid argument)

The code below works for files of size <= 6143 MB.

Hadoop version: 3.1.1.3.1.4.0-315
Python version: 3.6.10
Pyarrow version: 2.0.0
System: Ubuntu 16.04.7 LTS

I am trying to understand what happens under the hood of pyarrow.lib.NativeFile.write. Is there a limitation on the pyarrow side, an incompatibility with the Hadoop version, or a settings issue on my side?

If you have any input, I would highly appreciate it.

The Python script to upload a file:

> import os
> import pyarrow as pa
>
> os.environ["JAVA_HOME"]="<java_home>"
> os.environ['ARROW_LIBHDFS_DIR'] = "<path>/libhdfs.so"
>
> connected = pa.hdfs.connect(host="<host>",port=8020)
>
> destination = "hdfs://<host>:8020/user/tmp/6144m.txt"
> source = "/tmp/6144m.txt"
>
> with connected.open(destination, "wb") as output_stream:
>     output_stream.write(open(source, "rb").read())
>
> connected.close()

How to create a 6 GB file:

> truncate -s 6144M 6144m.txt

Thanks a lot,
Sergey
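
The upload script reads the whole source file into memory and hands it to a single write() call. One experiment that might help narrow the problem down is to copy the file in fixed-size chunks instead. The sketch below assumes (this is not confirmed anywhere in the message) that the failure depends on how many bytes are passed to one write call. The helper name copy_in_chunks and the 64 MiB chunk size are hypothetical choices made for illustration, and the demonstration uses in-memory streams so it runs without an HDFS cluster.

```python
import io

# Hypothetical chunk size: 64 MiB per write call, chosen only for illustration.
CHUNK_SIZE = 64 * 1024 * 1024


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst using many bounded writes instead of one large write.

    src needs a read(n) method and dst needs a write(data) method.
    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Self-contained demonstration with in-memory streams (no HDFS involved).
demo_src = io.BytesIO(b"x" * 10_000)
demo_dst = io.BytesIO()
copied = copy_in_chunks(demo_src, demo_dst, chunk_size=4096)
```

In the script above, the single write could then be replaced by opening the local file with open(source, "rb") and passing it, together with output_stream, to copy_in_chunks. This has not been tested against HDFS; it only keeps each write small and avoids holding the whole file in memory.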