hadoop-common-user mailing list archives

From Michael Stack <st...@archive.org>
Subject Re: s3
Date Sat, 06 Jan 2007 02:52:35 GMT
Tom White wrote:
> I've raised a Jira: http://issues.apache.org/jira/browse/HADOOP-857.
> I'll take a look at it.
>
> Tom
Thanks (your supposition in 857 looks right).

Other things I notice: the '-rmr'/'-lsr' options don't act as expected.  
It's a little confusing.  Should 'hadoop fs' report that the action is 
not supported, rather than 'success', if, say, I try to remove a 
'directory' from S3?  (See below for illustrative output, and a sketch 
of the behaviour I'd expect right after this paragraph.)
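Something like the following, say.  This is a hypothetical sketch in 
plain Java, not the real Hadoop code; the class, the capability flag, 
and the messages are all made up for illustration:

import java.io.IOException;

// Hypothetical sketch only -- not the real Hadoop S3 filesystem.  The
// point: fail loudly from delete when the backing store can't honour the
// request, so 'hadoop fs -rmr' reports an error instead of 'Deleted ...'.
class S3RmrSketch {
  // Assumed capability flag; a real filesystem would know this itself.
  private final boolean recursiveDeleteSupported = false;

  boolean delete(String path, boolean isDirectory) throws IOException {
    if (isDirectory && !recursiveDeleteSupported) {
      throw new IOException("rmr is not supported against S3: " + path);
    }
    return true; // the per-key deletes would happen here
  }

  public static void main(String[] args) {
    try {
      new S3RmrSketch().delete("/fromfile", true);
      System.out.println("Deleted /fromfile"); // what the shell prints today
    } catch (IOException e) {
      System.err.println(e.getMessage());      // what I'd rather see
    }
  }
}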

And what do people think of the following?  We already have a bunch of 
stuff up in S3 that we'd like to use as input to a hadoop mapreduce job, 
only it wasn't put there by hadoop, so it doesn't have the hadoop format 
where a file is actually a list of blocks.  What if files put there by 
the S3 filesystem started w/ some 'magic', and, if the magic is absent, 
the file is assumed to be actual file content rather than a list of 
'blocks' (see the sketch after this paragraph)?  It means you'd lose 
some facility (e.g. rename might not be possible, or might be 
expensive, etc.).  Also, for copying into S3, it would be sweet if you 
could set a flag that said don't make files be a list of INODEs but 
actual files when they land on S3 (I'm thinking of a big CopyFiles 
mapreduce job that would pull loads of local content with http and 
deposit it all into S3 untampered so it could be served from the likes 
of apache w/o having to go via hadoop).
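The check could be as simple as the following.  Again a hypothetical 
sketch only; the magic value and the method names are made up:

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch -- the magic int and the names are illustrative
// only.  Files written by the hadoop S3 filesystem would start with a
// magic int; anything without it gets treated as raw content, not a
// list of blocks.
class S3MagicSketch {
  static final int INODE_MAGIC = 0xCAFEFACE; // made-up value

  static boolean isInodeFile(InputStream in) throws IOException {
    try {
      return new DataInputStream(in).readInt() == INODE_MAGIC;
    } catch (EOFException e) {
      return false; // shorter than four bytes: certainly raw content
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] raw = "plain web content".getBytes("UTF-8");
    System.out.println(isInodeFile(new ByteArrayInputStream(raw))); // false
  }
}

If the magic is absent, the filesystem would reopen (or rewind) the 
stream and serve the bytes as-is; as said above, renames and the like 
on such raw files would be limited or expensive.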

Good stuff,
St.Ack



Here I'm listing a BUCKET directory that was copied up using 'hadoop 
fs', then rmr'ing it and then listing again:

stack@bregeon:~/checkouts/hadoop$ ./bin/hadoop fs -fs s3://ID:SECRET@BUCKET -ls /fromfile
Found 2 items
/fromfile/diff.txt      <r 1>   591
/fromfile/x.js  <r 1>   2477
stack@bregeon:~/checkouts/hadoop$ ./bin/hadoop fs -fs s3://ID:SECRET@BUCKET -rmr /fromfile
Deleted /fromfile
stack@bregeon:~/checkouts/hadoop$ ./bin/hadoop fs -fs s3://ID:SECRET@BUCKET -ls /fromfile
Found 0 items

The '0 items' is odd because, listing my BUCKET now with a tool other 
than 'hadoop fs' (the hanzo webs python scripts), I get:

stack@bregeon:~/checkouts/hadoop.trunk$ s3ls BUCKET
%2F
%2Ffromfile%2F.diff.txt.crc
%2Ffromfile%2F.x.js.crc
%2Ffromfile%2Fdiff.txt
%2Ffromfile%2Fx.js
block_-5013142890590722396
block_5832002498000415319
block_6889488315428893905
block_9120115089645350905
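
The %2F in those key names is just '/' URL-encoded; a quick plain-Java 
check (nothing Hadoop-specific, with the keys copied from the listing 
above):

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Decode the URL-encoded S3 key names from the listing above.
class KeyDecode {
  public static void main(String[] args) throws UnsupportedEncodingException {
    String[] keys = { "%2Ffromfile%2Fdiff.txt", "%2Ffromfile%2Fx.js" };
    for (String key : keys) {
      // prints /fromfile/diff.txt and /fromfile/x.js
      System.out.println(URLDecoder.decode(key, "UTF-8"));
    }
  }
}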

It's all still there.  I can subsequently do the likes of the following:

stack@bregeon:~/checkouts/hadoop$ ./bin/hadoop fs -fs s3://ID:SECRET@BUCKET -rmr /fromfile/diff.txt

... and the delete will succeed; looking at the bucket with alternate 
tools shows it has actually been removed, and so on up the hierarchy.
