hbase-user mailing list archives

From Jamie Cockrill <jamie.cockr...@gmail.com>
Subject Re: Regionserver tanked, can't seem to get master back up fully
Date Tue, 03 Aug 2010 13:22:40 GMT
Hi JD,

The cluster is on a separate network, so I'll see if any of the traces
remain. As for the ulimit and xcievers bit, those are set up correctly
as per the API doc you mention.

Thanks

Jamie

On 2 August 2010 19:18, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> Is that coming from the master? If so, it means that it was trying to
> write recovered data from a failed region server and wasn't able to do
> so. It sounds bad.
>
> - Can we get full stack traces of that error?
> - Did you check the datanode logs for any exception? Very often
> (strong emphasis on "very"), it's an issue with either ulimit or
> xcievers. Is your cluster configured per the last bullet on that page?
> http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements
>
> Thx
>
> J-D
>
> On Mon, Aug 2, 2010 at 6:16 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>> Hi All,
>>
>> I set off a long-running loading job over the weekend and it seems to
>> have rather destroyed my hbase cluster. Most of the nodes were down
>> this morning and upon restarting them, I'm now persistently getting
>> the following message every few ms in the master logs:
>>
>> DfsClient: Could not complete file
>> /hbase/.logs/compute17.cluster1.lan,60020,1280518716613/a filename
>>
>> That file is a zero-byte file on the HDFS. The data-nodes all look
>> fine and don't seem to have had any trouble. I'm not especially fussed
>> about having to rebuild that table and reload it, but the trouble is
>> now that I can't start the cluster properly so I can drop the table.
>>
>> Does anyone know how I can remove the table/fix these errors manually?
>> As I said, I'm not fussed about data-loss.
>>
>> thanks
>>
>> Jamie
>>
>
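
For anyone wanting to double-check the two settings J-D asks about, here is a minimal sketch in Python. It assumes the datanode setting lives in hdfs-site.xml under the property name dfs.datanode.max.xcievers, and that the script runs as the same user as the Hadoop/HBase daemons on a Unix host. The helper names and the sample value 2047 are illustrative only and are not taken from this thread.

```python
import resource
import xml.etree.ElementTree as ET

# Assumed property name for the datanode transfer-thread limit.
XCIEVERS_PROPERTY = "dfs.datanode.max.xcievers"


def read_xcievers(hdfs_site_xml):
    """Return the configured xcievers value from hdfs-site.xml text, or None if unset."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == XCIEVERS_PROPERTY:
            value = (prop.findtext("value") or "").strip()
            return int(value) if value else None
    return None


def open_file_limit():
    """Return the soft open-file limit (what `ulimit -n` reports) for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


# Hypothetical config fragment, used here only to exercise the parser.
SAMPLE = """<configuration>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2047</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print("xcievers:", read_xcievers(SAMPLE))
    print("open-file limit:", open_file_limit())
```

In practice you would pass the contents of the real hdfs-site.xml from each datanode instead of SAMPLE. A result of None means the property is not set in that file, in which case the datanode falls back to its built-in default.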
