hadoop-mapreduce-user mailing list archives

From Edmund Day <edmund...@yahoo.com>
Subject fpcalc on Hadoop streaming can't find file
Date Mon, 25 Aug 2014 11:26:16 GMT
I have an HDFS directory that contains audio files. I want to run fpcalc on each file using
Hadoop streaming. This works locally without problems, but under Hadoop fpcalc cannot see the files.
My code is:

    import shlex
    from subprocess import Popen, PIPE

    # sample_length and file_a are defined elsewhere (here: 30 and the path of the audio file)
    cli = './fpcalc -raw -length ' + str(sample_length) + ' ' + file_a
    cli_parts = shlex.split(cli)
    fpcalc_cli = Popen(cli_parts, stdin=PIPE, stderr=PIPE, stdout=PIPE)
    fpcalc_out, fpcalc_err = fpcalc_cli.communicate()
cli_parts is: ['./fpcalc', '-raw', '-length', '30', '/user/hduser/audio/input/flacOriginal1.flac']
and runs fine locally.

fpcalc_err is:

    ERROR: couldn't open the file
    ERROR: unable to calculate fingerprint for file /user/hduser/audio/input/flacOriginal1.flac, skipping

The file DOES exist in HDFS:

    hadoop fs -ls /user/hduser/audio/input/flacOriginal1.flac
    Found 1 items
    -rw-r--r--   1 hduser supergroup    2710019 2014-08-08 11:49 /user/hduser/audio/input/flacOriginal1.flac


Can I point to a file like this in Hadoop streaming?
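
For illustration, here is a minimal sketch of one possible workaround. It assumes that fpcalc can only open paths on the local filesystem (not verified here) and that the `hadoop` command is on the PATH of the task nodes. The helper names `build_fetch_cmd`, `build_fpcalc_cmd`, and `fingerprint` are hypothetical. The idea is to copy the file out of HDFS with `hadoop fs -get` into a temporary directory and hand fpcalc the local copy.

```python
import os
import shutil
import subprocess
import tempfile
from subprocess import PIPE


def build_fetch_cmd(hdfs_path, local_dir):
    """Return the 'hadoop fs -get' command and the local target path."""
    local_path = os.path.join(local_dir, os.path.basename(hdfs_path))
    return ['hadoop', 'fs', '-get', hdfs_path, local_path], local_path


def build_fpcalc_cmd(local_path, sample_length, fpcalc='./fpcalc'):
    """Return the fpcalc command line for a file on the local filesystem."""
    return [fpcalc, '-raw', '-length', str(sample_length), local_path]


def fingerprint(hdfs_path, sample_length=30):
    """Copy one file out of HDFS, run fpcalc on the copy, return (stdout, stderr)."""
    tmp_dir = tempfile.mkdtemp()
    try:
        fetch_cmd, local_path = build_fetch_cmd(hdfs_path, tmp_dir)
        # Raises CalledProcessError if the copy from HDFS fails.
        subprocess.check_call(fetch_cmd)
        proc = subprocess.Popen(build_fpcalc_cmd(local_path, sample_length),
                                stdout=PIPE, stderr=PIPE)
        return proc.communicate()
    finally:
        # Remove the temporary local copy whether or not fpcalc succeeded.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

In a streaming mapper, `fingerprint` could be called once per HDFS path read from stdin. Building the commands as lists avoids the `shlex.split` step and is safe for paths that contain spaces.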

TIA!!!! 




Read how Aylesbury and the Earth were created, here:
http://edday.co.uk