hadoop-common-user mailing list archives

From "Dhruba Borthakur" <dhr...@yahoo-inc.com>
Subject RE: ipc.client.timeout
Date Thu, 13 Sep 2007 21:51:06 GMT
Hi Joydeep,

Thanks for your comments. Really appreciate it.

For the Namenode configuration, please see if you can use most of the memory
available on the machine. Maybe a JVM param of -Xmx7000m or so should do it.
Also, you might want to bump up the number of Namenode handler threads,
dfs.namenode.handler.count. By default this is set to 10. It might make
sense to set this to 40 or so.

Thanks,
dhruba

-----Original Message-----
From: Joydeep Sen Sarma [mailto:jssarma@facebook.com] 
Sent: Thursday, September 13, 2007 2:45 PM
To: hadoop-user@lucene.apache.org
Subject: RE: ipc.client.timeout

- fixed namenode to not be data/task node
- 31K files right now
- haven't played around with memory options - namenode still running
with -Xmx1000m - I can bump this up (8G memory available)

Btw - from what I see in code - the server is likely discarding the
client call (and not performing the operation at all). Another (dumber)
approach for handling the idempotency issue would be for the client to
retry anyway - in most cases, the server would not have performed the
operation. In the minority of the cases where the server already
performed the operation - the client can report a timeout error (instead
of the actual error). (ie. It's almost as if the last retry was not
performed). (there could be some flaw in this logic - just can't think
of one right now)
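
A minimal sketch of this retry-anyway idea, in Python rather than Hadoop's
own code. Everything here is hypothetical: RpcTimeout, call_with_retry and
max_attempts are illustrative names, not part of the Hadoop IPC client. The
client retries after a timeout, and if a later attempt fails with some other
error, it reports the earlier timeout instead, since that error may only be a
side effect of an earlier attempt having actually been performed.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RpcTimeout(Exception):
    """Hypothetical error raised when a call gets no reply in time."""


def call_with_retry(op: Callable[[], T], max_attempts: int = 3) -> T:
    """Retry op after timeouts; mask post-timeout errors as a timeout."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    earlier_timeout: Optional[RpcTimeout] = None
    for _ in range(max_attempts):
        try:
            return op()
        except RpcTimeout as exc:
            # The server may or may not have performed the operation.
            earlier_timeout = exc
        except Exception:
            if earlier_timeout is not None:
                # This error may come from an earlier attempt that did run
                # on the server, so report the timeout instead.
                raise earlier_timeout
            raise
    # Every attempt timed out.
    assert earlier_timeout is not None
    raise earlier_timeout
```

With this shape, a create call that times out and then fails on retry with
"already exists" surfaces as a timeout, as if the last retry had not been
performed.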

-----Original Message-----
From: Dhruba Borthakur [mailto:dhruba@yahoo-inc.com] 
Sent: Thursday, September 13, 2007 2:21 PM
To: hadoop-user@lucene.apache.org
Subject: RE: ipc.client.timeout

We have discussed the approach of remembering completed RPCs (and their
status codes, return parameters, etc) so that a retry of a previously
executed RPC can get back identical results. But we have not implemented
this yet.
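
A rough sketch of that idea, in Python and not taken from Hadoop (the
approach is not implemented there yet). The RetryCache class, the
(client_id, call_id) key and the "ok"/"error" status values are all
assumptions made for illustration. The server records the outcome of each
completed call, so a retry with the same key gets back the identical result
without the operation running a second time.

```python
from typing import Any, Callable, Dict, Tuple


class RetryCache:
    """Hypothetical server-side record of completed RPCs."""

    def __init__(self) -> None:
        # (client_id, call_id) -> (status, value or exception)
        self._completed: Dict[Tuple[str, int], Tuple[str, Any]] = {}

    def execute(self, client_id: str, call_id: int,
                op: Callable[[], Any]) -> Any:
        key = (client_id, call_id)
        if key not in self._completed:
            # First time this call is seen: run it and remember the outcome.
            try:
                self._completed[key] = ("ok", op())
            except Exception as exc:
                self._completed[key] = ("error", exc)
        status, value = self._completed[key]
        if status == "error":
            raise value
        return value


# Usage: a retried call does not run the operation again.
created = []

def create_file() -> int:
    created.append("/user/example/file")  # illustrative path
    return len(created)

cache = RetryCache()
first = cache.execute("client-1", 7, create_file)
retry = cache.execute("client-1", 7, create_file)
```

A real server would also need to bound the size of this record and expire old
entries; the sketch leaves that out.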

In the short term, it would be nice if you can make the Namenode run on a
dedicated machine (no Datanodes, tasktrackers, etc on this machine). Also,
how many files does your cluster have and how much is the main memory on
the Namenode machine? How much memory is the Namenode jvm configured to use?

Thanks,
dhruba


-----Original Message-----
From: Joydeep Sen Sarma [mailto:jssarma@facebook.com] 
Sent: Thursday, September 13, 2007 2:16 PM
To: hadoop-user@lucene.apache.org
Subject: RE: ipc.client.timeout

Learning the hard way :-)

Second Ted's last mail (all the way back to Sun RPC - server can keep
track of completed RPC calls and reply success to client retries if op
already performed). 

-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org] 
Sent: Thursday, September 13, 2007 1:54 PM
To: hadoop-user@lucene.apache.org
Subject: Re: ipc.client.timeout

Joydeep Sen Sarma wrote:
> Quite likely it's because the namenode is also a data/task node. 

That doesn't sound like a "best practice"...

Doug


