hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Unable to contact Regionserver in spite of META entry...
Date Fri, 30 Jul 2010 00:28:32 GMT
Hey Vidhya,

What version are you on, again? If you're on 0.89, the "hbase hbck" utility
might be of use here.

Any logs in that server that pertain to the given region name? Any
exceptions there? What if you run the shell with
HBASE_ROOT_LOGGER=DEBUG,console set so that you see the debug output as it
retries?
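
A minimal sketch of the debug run suggested above, wrapped in Python so it can be scripted. It assumes the `hbase` launcher script is on the PATH; the helper names `hbase_debug_invocation` and `run_hbase_debug` are hypothetical, and only the `HBASE_ROOT_LOGGER=DEBUG,console` setting comes from this message.

```python
import os
import subprocess


def hbase_debug_invocation(args, base_env=None):
    """Return (argv, env) for an hbase subcommand with DEBUG logging on the console."""
    # Copy the environment so the caller's environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # Setting taken from the message above: root logger at DEBUG, sent to the console.
    env["HBASE_ROOT_LOGGER"] = "DEBUG,console"
    return ["hbase", *args], env


def run_hbase_debug(args):
    """Run the subcommand (assumes `hbase` is on PATH) and return its exit code."""
    argv, env = hbase_debug_invocation(args)
    return subprocess.run(argv, env=env, check=False).returncode
```

Calling `run_hbase_debug(["shell"])` would open the shell with the client's retry attempts logged at DEBUG level; on a version that ships it, `run_hbase_debug(["hbck"])` would run the consistency check the same way.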

-Todd

On Thu, Jul 29, 2010 at 12:31 PM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> I have an MR job that sends streams of updates (puts and deletes) to an
> existing db, and all the tasks are crashing with exceptions similar to the
> following:
>
>
>
>   Exception in thread "main"
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server, retryOnlyOne=true, index=0, islastrow=false,
> tries=9, numtries=10, i=78, listsize=390,
> region=DocData,0000001013071992,1279835733117 for region
> DocData,0000001013071992,1279835733117, row '0000001013115520', but failed
> after 10 attempts.
>
>
>
> I ran this job on 180 nodes with a max of 6 tasks per node. I thought this
> was possibly due to overload, so I ran it with just 2 tasks per node, but
> again got similar exceptions.
>
> Then I tried issuing a put from the hbase shell, and it complained of the
> same issue.
>
> I checked the meta table entry and it seems fine. I checked the
> corresponding region server (web UI) and it is indeed hosting the region.
>
>
>
> DocData,0000001013071992,1279835733117  column=info:regioninfo, timestamp=1280305164242, value=REGION => {NAME => 'DocData,0000001013071992,1279835733117', STARTKEY => '0000001013071992', ENDKEY => '0000001013205991', ENCODED => 1962005300, TABLE => {{NAME => 'DocData', MAX_FILESIZE => '4402341480', FAMILIES => [{NAME => 'bigColumn', VERSIONS => '1', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '1048576', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
> DocData,0000001013071992,1279835733117  column=info:server, timestamp=1280317959911, value=63.250.207.87:60020
> DocData,0000001013071992,1279835733117  column=info:serverstartcode, timestamp=1280317959911, value=1279926520261
>
>
> Can you see what is wrong here?
>
> Thank you
> Vidhya
>
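
One way to double-check that the quoted META row really covers the failing row is to compare the row key against the region's STARTKEY and ENDKEY. The sketch below is illustrative only: it assumes the `table,startkey,regionid` region-name layout seen in the quoted output and the usual start-inclusive, end-exclusive key range, and the names `RegionRange` and `parse_region_name` are hypothetical, not HBase APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionRange:
    table: str
    start_key: str
    end_key: str  # an empty end key means "last region of the table"

    def contains(self, row: str) -> bool:
        # Row keys are compared as raw bytes, start inclusive, end exclusive.
        r, s, e = row.encode(), self.start_key.encode(), self.end_key.encode()
        return r >= s and (e == b"" or r < e)


def parse_region_name(name: str):
    """Split 'table,startkey,regionid' into its parts (layout assumed from the quote)."""
    table, rest = name.split(",", 1)
    # The start key may itself contain commas, so split the region id off the end.
    start_key, region_id = rest.rsplit(",", 1)
    return table, start_key, int(region_id)


# Values copied from the quoted exception and META row.
table, start_key, region_id = parse_region_name("DocData,0000001013071992,1279835733117")
region = RegionRange(table, start_key, "0000001013205991")
print(region.contains("0000001013115520"))
```

With the values from the quoted message the row falls inside the region's key range, which matches the observation that the META entry itself looks fine.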



-- 
Todd Lipcon
Software Engineer, Cloudera
