hadoop-common-user mailing list archives

From Michael Stack <st...@archive.org>
Subject Re: DFS: Cannot obtain additional block for file....
Date Fri, 24 Mar 2006 04:44:22 GMT
Andrzej Bialecki wrote:
> ...
> From reading the code, this could happen if previous blocks from this 
> file haven't been written out yet (i.e. their replication is less than 
> 1). Probably we should wait here a bit longer.. or perhaps datanodes 
> should report block completion sooner.
>
Thanks Andrzej:

But looking at the code, we do not seem to be getting as far as the 
replication checks.  We are not passing the 'if' test -- 'if 
(dir.getFile(src) == null && pendingCreates.get(src) != null) {' -- on 
line 294 in FSNamesystem (Revision: 388306); otherwise the return from 
this method would be non-null (the IOE thrown on line 160 of NN is 
because 'results' is null).  Either the file already exists in the file 
system -- probably unlikely at this stage -- or it has not yet been 
created, so there is no entry for it in pendingCreates -- which itself 
is odd.
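
To make the two failure cases concrete, here is a minimal sketch of that guard. It is not the FSNamesystem code: the class, the `committedFiles` set standing in for `dir.getFile(src)`, the placeholder block id, and the `removePending` helper are all hypothetical, and how an entry might actually leave pendingCreates is an assumption for illustration only.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the namenode state discussed above; not Hadoop code.
class AddBlockGuardSketch {
    // Stands in for dir.getFile(src): files already present in the file system.
    private final Set<String> committedFiles = new HashSet<>();
    // Stands in for pendingCreates: files still being written, keyed by path.
    private final Map<String, String> pendingCreates = new HashMap<>();

    void startCreate(String src, String holder) {
        pendingCreates.put(src, holder);
    }

    void completeFile(String src) {
        pendingCreates.remove(src);
        committedFiles.add(src);
    }

    // Assumption: some path (unknown here) removes the entry before the write finishes.
    void removePending(String src) {
        pendingCreates.remove(src);
    }

    // Mirrors the shape of the quoted 'if' test: a block is handed out only when
    // the file is not yet committed AND a pending create exists for it.
    String getAdditionalBlock(String src) {
        if (!committedFiles.contains(src) && pendingCreates.get(src) != null) {
            return "blk_placeholder"; // placeholder, not a real block id
        }
        return null; // either the file already exists or there is no pending create
    }

    // Mirrors the caller: a null result is turned into the IOException seen in the log.
    String addBlock(String src) throws IOException {
        String result = getAdditionalBlock(src);
        if (result == null) {
            throw new IOException("Cannot obtain additional block for file " + src);
        }
        return result;
    }
}
```

In this sketch a null result never reaches the replication checks at all, which matches the reading above: the exception alone cannot tell us which half of the condition failed.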

In the namenode log, just before the 'Cannot obtain additional block 
for file' exception, I first see a message like the one below:

060323 164808 Removing lease [Lease.  Holder: DFSClient_1789644890, 
heldlocks: 0, pendingcreates: 0], leases remaining: 103

I ain't sure how these leases are working but there seems to be a 
correlation.
Thanks,
St.Ack


Michael Stack wrote:
> Why would a lightly loaded nameserver w/ no other emissions on a 
> seemingly healthy machine have trouble allocating blocks in a job that 
> is almost done?
>
> From the nameserver log:
>
> 060323 173126 Server handler 4 on 8009 call error: 
> java.io.IOException: Cannot obtain additional block for file 
> /user/stack/e04/outputs/segments/20060322213322/crawl_parse/part-00019
> java.io.IOException: Cannot obtain additional block for file 
> /user/stack/e04/outputs/segments/20060322213322/crawl_parse/part-00019
>    at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:160)
>    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:585)
>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
>
> It looks like we try 5 times at 100ms interval and still come back 
> empty handed. Its recurrence is threatening my jobs' completion.

From reading the code, this could happen if previous blocks from this 
file haven't been written out yet (i.e. their replication is less than 
1). Probably we should wait here a bit longer.. or perhaps datanodes 
should report block completion sooner.
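
For reference, the quoted report describes five attempts at 100 ms intervals. A minimal sketch of such a retry loop, with the attempt count and wait pulled out as parameters so that a longer wait can be tried, might look like the following. The class and method names are hypothetical and this is not the Hadoop code itself.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical retry helper; illustrates "wait a bit longer", not Hadoop's implementation.
class BlockRetrySketch {
    // Calls 'attempt' up to maxAttempts times, sleeping sleepMillis between tries,
    // and returns the first non-null result. Throws if every attempt returns null.
    static <T> T retryUntilNonNull(Supplier<T> attempt, int maxAttempts,
                                   long sleepMillis, String src)
            throws IOException, InterruptedException {
        for (int i = 0; i < maxAttempts; i++) {
            T result = attempt.get();
            if (result != null) {
                return result;
            }
            // No point sleeping after the final failed attempt.
            if (i < maxAttempts - 1) {
                Thread.sleep(sleepMillis);
            }
        }
        throw new IOException("Cannot obtain additional block for file " + src);
    }
}
```

With `maxAttempts = 5` and `sleepMillis = 100` this gives up after roughly 400 ms of waiting; raising either value only helps if the result would eventually become non-null.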

-- 
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

