hadoop-hdfs-user mailing list archives

From Nathan Grice <ngr...@gmail.com>
Subject io.file.buffer.size different when not running in proper bash shell?
Date Sat, 24 Aug 2013 00:56:33 GMT
Thanks in advance for any help. I have been banging my head against the
wall on this one all day.

When I run the command

hadoop fs -put /path/to/input /path/in/hdfs

from the command line, the hadoop shell dutifully copies my entire file
correctly, no matter the size.

I wrote a webservice client for an external service in Python, and I am
simply trying to replicate the same command after retrieving some
CSV-delimited results from the webservice:

import logging
import subprocess

LOG = logging.getLogger(__name__)
cmd = ['hadoop', 'fs', '-put', '/path/to/input/', '/path/in/hdfs/']
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, errors = p.communicate()
if p.returncode:
    raise OSError(errors)
LOG.info(output)

Without fail, the hadoop shell only writes the first 4096 bytes of the input
file (which, according to the documentation, is the default value for
io.file.buffer.size).
I have tried almost everything, including adding
-Dio.file.buffer.size=XXXXXX, where XXXXXX is a really big number, and
NOTHING seems to work.
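
For reference, the variant with the override looks roughly like this (just a
sketch; 131072 below is only a placeholder for the large number I actually
used, and the -D generic option is placed between fs and -put):

# Sketch: -D generic options go between 'fs' and the -put subcommand.
# 131072 is a placeholder value, not the actual number I used.
cmd = ['hadoop', 'fs', '-Dio.file.buffer.size=131072',
       '-put', '/path/to/input/', '/path/in/hdfs/']
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, errors = p.communicate()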

Please help!
