hbase-user mailing list archives

From "Kevin O'dell" <kevin.od...@cloudera.com>
Subject Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR
Date Thu, 01 Nov 2012 19:55:17 GMT
Ameya,

 If your new table goes well (did you presplit this time?), then here is what
we can do for the old one:

rm /hbase/tablename
hbck -fixMeta -fixAssignments
restart HBase if the table is still present
All should be well.
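
For reference, spelled out as shell commands (a sketch only -- it assumes
the default hbase.rootdir of /hbase, and the first step permanently deletes
the table's data):

  hadoop fs -rmr /hbase/userTable1      # remove the table directory (destructive!)
  hbase hbck -fixMeta -fixAssignments   # repair .META. and region assignments
  # then restart HBase if the table still shows up in the master UI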

Please let us know how it goes.

On Thu, Nov 1, 2012 at 2:44 PM, Ameya Kantikar <ameya@groupon.com> wrote:

> Thanks Kevin & Ram. Please find my answers below:
>
> Did you presplit your table? - NO
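>
> (A minimal presplit sketch for the 0.92 shell -- the table name and split
> points here are hypothetical; with email-address row keys you would pick
> points that spread the keys evenly:)
>
>   create 'userTable2', 'cf', {SPLITS => ['d', 'h', 'l', 'p', 't']}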
>
> You are on .92, might as well take advantage of HFilev2 and use 10GB region
> sizes -
>
>  - I have now set my region size to 10GB and am running another load into
> a separate table, but my existing table is still in bad shape.
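>
> (Cluster-wide, the region size is the hbase.hregion.max.filesize property
> in hbase-site.xml -- a sketch, value in bytes:)
>
>   <property>
>     <name>hbase.hregion.max.filesize</name>
>     <value>10737418240</value>  <!-- 10 GB -->
>   </property>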
>
> Loading over MR, I am assuming puts?
> -Yes
>
> Did you tune your memstore and HLog size?
> - Not yet. I am running with whatever the defaults are.
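>
> (For later reference, these are the usual hbase-site.xml knobs for that;
> the values below are illustrative, not recommendations:)
>
>   <property>
>     <name>hbase.hregion.memstore.flush.size</name>
>     <value>268435456</value>  <!-- e.g. 256 MB memstore flush size -->
>   </property>
>   <property>
>     <name>hbase.regionserver.maxlogs</name>
>     <value>32</value>  <!-- e.g. allow more HLogs before forced flushes -->
>   </property>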
>
> You aren't using a different client version or something strange like that
> are you? - Nope. It's the same jar everywhere.
>
> The "can't close hlog" messages seem to indicate an inability to talk to
> HDFS.  Did you have connection issues there?
> - I did find a log on one datanode with some HDFS issue, but that was only
> one datanode; all the other datanodes looked good.
> Note that I also ran another big distcp job on the same cluster and did
> not find any issues.
>
> I also restarted the cluster (all nodes, including Hadoop). hbase hbck is
> not showing inconsistencies, but my table is still neither enabled nor
> disabled.
> I ran the MR job to load data, but it continued to throw the same errors
> as before.
>
> Now I am running a separate job loading data into a brand-new table, with
> the max region size at 10 GB. I'll get back to you with results on that
> one. But the existing table is still not reachable.
>
> Thanks for your help.
>
> Ameya
>
> On Thu, Nov 1, 2012 at 6:35 AM, Kevin O'dell <kevin.odell@cloudera.com> wrote:
>
> > A couple of thoughts (it is still early here, so bear with me):
> >
> > Did you presplit your table?
> >
> > You are on .92, might as well take advantage of HFilev2 and use 10GB
> > region sizes.
> >
> > Loading over MR, I am assuming puts?  Did you tune your memstore and
> > HLog size?
> >
> > You aren't using a different client version or something strange like
> > that are you?
> >
> > The "can't close hlog" messages seem to indicate an inability to talk
> > to HDFS.  Did you have connection issues there?
> >
> > On Thu, Nov 1, 2012 at 5:20 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > > Can you try restarting the cluster, I mean the master and the RSs?
> > > Also, if this persists, try clearing the ZK data and restarting.
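> > >
> > > (A sketch of what clearing the ZK data usually looks like -- run it
> > > only with the master and all region servers stopped; assumes the
> > > default zookeeper.znode.parent of /hbase:)
> > >
> > >   hbase zkcli
> > >   rmr /hbase
> > >   quit
> > >   # HBase recreates its znodes on the next startup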
> > >
> > > Regards
> > > Ram
> > >
> > > On Thu, Nov 1, 2012 at 2:46 PM, Cheng Su <scarcer.cn@gmail.com> wrote:
> > >
> > > > Sorry, my mistake. Please ignore what I said about the "max store
> > > > size of a single CF".
> > > >
> > > > m(_ _)m
> > > >
> > > > On Thu, Nov 1, 2012 at 4:43 PM, Ameya Kantikar <ameya@groupon.com> wrote:
> > > > > Thanks Cheng. I'll try increasing my max region size limit.
> > > > >
> > > > > However, I am not clear on this math:
> > > > >
> > > > > "Since you set the max file size to 2G, you can only store 2xN GB
> > > > > of data into a single CF."
> > > > >
> > > > > Why is that? My assumption is that even though a single region can
> > > > > only be 2 GB, I can still have hundreds of regions, and hence can
> > > > > store 200GB+ of data in a single CF on my 10-machine cluster.
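> > > > >
> > > > > (Rough arithmetic, assuming regions fill and spread evenly: 1.5 TB
> > > > > at 2 GB per region is ~768 regions, about 77 per server on 10
> > > > > nodes; at 10 GB per region it is ~154 regions, about 15 per
> > > > > server.)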
> > > > >
> > > > > Ameya
> > > > >
> > > > >
> > > > > On Thu, Nov 1, 2012 at 1:19 AM, Cheng Su <scarcer.cn@gmail.com> wrote:
> > > > >
> > > > >> I met the same problem these days.
> > > > >> I'm not very sure the error log is exactly the same, but I do have
> > > > >> the same exception:
> > > > >>
> > > > >> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> > > > >> Failed 1 action: NotServingRegionException: 1 time, servers with
> > > > >> issues: smartdeals-hbase8-snc1.snc1:60020,
> > > > >>
> > > > >> and the table is also neither enabled nor disabled, thus I can't
> > > > >> drop it.
> > > > >>
> > > > >> I guess the problem is the total store size.
> > > > >> How many region servers do you have?
> > > > >> Since you set the max file size to 2G, you can only store 2xN GB
> > > > >> of data into a single CF.
> > > > >> (N is the number of your region servers.)
> > > > >>
> > > > >> You might want to increase the max file size or the number of
> > > > >> region servers.
> > > > >>
> > > > >> On Thu, Nov 1, 2012 at 3:29 PM, Ameya Kantikar <ameya@groupon.com> wrote:
> > > > >> > One more thing: the HBase table in question is neither enabled
> > > > >> > nor disabled:
> > > > >> >
> > > > >> > hbase(main):006:0> is_disabled 'userTable1'
> > > > >> > false
> > > > >> >
> > > > >> > 0 row(s) in 0.0040 seconds
> > > > >> >
> > > > >> > hbase(main):007:0> is_enabled 'userTable1'
> > > > >> > false
> > > > >> >
> > > > >> > 0 row(s) in 0.0040 seconds
> > > > >> >
> > > > >> > Ameya
> > > > >> >
> > > > >> > On Thu, Nov 1, 2012 at 12:02 AM, Ameya Kantikar <ameya@groupon.com> wrote:
> > > > >> >
> > > > >> >> Hi,
> > > > >> >>
> > > > >> >> I am trying to load a lot of data (around 1.5 TB) into a single
> > > > >> >> HBase table. I have set the region size at 2 GB. I also
> > > > >> >> set hbase.regionserver.handler.count to 30.
> > > > >> >>
> > > > >> >> When I start loading data via MR, after a while, tasks start
> > > > >> >> failing with the following error:
> > > > >> >>
> > > > >> >>
> > > > >> >> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> > > > >> >> Failed 1 action: NotServingRegionException: 1 time, servers with
> > > > >> >> issues: smartdeals-hbase8-snc1.snc1:60020,
> > > > >> >>       at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1641)
> > > > >> >>       at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
> > > > >> >>       at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:943)
> > > > >> >>       at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:820)
> > > > >> >>       at org.apache.hadoop.hbase.client.HTable.put(HTable.java:795)
> > > > >> >>       at com..mr.hbase.LoadUserCacheInHbase$TokenizerMapper.map(LoadUserCacheInHbase.java:83)
> > > > >> >>       at com..mr.hbase.LoadUserCacheInHbase$TokenizerMapper.map(LoadUserCacheInHbase.java:33)
> > > > >> >>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
> > > > >> >>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
> > > > >> >>       at org.apache.hadoop.mapred.MapTask.run(MapTask.j
> > > > >> >>
> > > > >> >> On the hbase8 machine I see the following in the logs:
> > > > >> >>
> > > > >> >> ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Error while
> > > > >> >> syncing, requesting close of hlog
> > > > >> >> java.io.IOException: Reflection
> > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:230)
> > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1109)
> > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1213)
> > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1071)
> > > > >> >>         at java.lang.Thread.run(Thread.java:662)
> > > > >> >> Caused by: java.lang.reflect.InvocationTargetException
> > > > >> >>         at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
> > > > >> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > > >> >>         at java.lang.reflect.Method.invoke(Method.java:597)
> > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:228)
> > > > >> >>         ... 4 more
> > > > >> >>
> > > > >> >>
> > > > >> >> I only have 15 map tasks on each node of a 10-machine cluster
> > > > >> >> (150 map tasks in total entering data into the HBase table).
> > > > >> >>
> > > > >> >> Further, I see 2-3 regions perpetually under "Regions in
> > > > >> >> Transition" in the HBase master web console, as follows:
> > > > >> >>
> > > > >> >> 8dcb3edee4e43faa3dbeac2db4f12274 userTable1,pookydearest@hotmail.com,1351728961461.8dcb3edee4e43faa3dbeac2db4f12274.
> > > > >> >>   state=PENDING_OPEN, ts=Thu Nov 01 06:39:57 UTC 2012 (409s ago), server=smartdeals-hbase1-snc1.snc1,60020,1351751785514
> > > > >> >> bb91fd0c855e60dd4159e0ad3fd52cda userTable1,m_skaare@yahoo.com,1351728968936.bb91fd0c855e60dd4159e0ad3fd52cda.
> > > > >> >>   state=PENDING_OPEN, ts=Thu Nov 01 06:42:17 UTC 2012 (269s ago), server=smartdeals-hbase3-snc1.snc1,60020,1351747466016
> > > > >> >> bd44334a11464baf85013c97d673e600 userTable1,tammikilgore@gmail.com,1351728952308.bd44334a11464baf85013c97d673e600.
> > > > >> >>   state=PENDING_OPEN, ts=Thu Nov 01 06:42:17 UTC 2012 (269s ago), server=smartdeals-hbase1-snc1.snc1,60020,1351751785514
> > > > >> >> ed1f7e7908fc232f10d78dd1e796a5d7 userTable1,jwoodel@triad.rr.com,1351728971232.ed1f7e7908fc232f10d78dd1e796a5d7.
> > > > >> >>   state=PENDING_OPEN, ts=Thu Nov 01 06:37:37 UTC 2012 (549s ago), server=smartdeals-hbase3-snc1.snc1,60020,1351747466016
> > > > >> >>
> > > > >> >>
> > > > >> >> Note that these do not go away even after 30 minutes.
> > > > >> >>
> > > > >> >> Further, after running hbase hbck -summary, I get the following:
> > > > >> >>
> > > > >> >> Summary:
> > > > >> >>   -ROOT- is okay.
> > > > >> >>     Number of regions: 1
> > > > >> >>     Deployed on: smartdeals-hbase7-snc1.snc1,60020,1351747458782
> > > > >> >>   .META. is okay.
> > > > >> >>     Number of regions: 1
> > > > >> >>     Deployed on: smartdeals-hbase7-snc1.snc1,60020,1351747458782
> > > > >> >>   test1 is okay.
> > > > >> >>     Number of regions: 1
> > > > >> >>     Deployed on: smartdeals-hbase2-snc1.snc1,60020,1351747457308
> > > > >> >>   userTable1 is okay.
> > > > >> >>     Number of regions: 32
> > > > >> >>     Deployed on: smartdeals-hbase10-snc1.snc1,60020,1351747456776
> > > > >> >>       smartdeals-hbase2-snc1.snc1,60020,1351747457308
> > > > >> >>       smartdeals-hbase4-snc1.snc1,60020,1351747455571
> > > > >> >>       smartdeals-hbase5-snc1.snc1,60020,1351747458579
> > > > >> >>       smartdeals-hbase6-snc1.snc1,60020,1351747458186
> > > > >> >>       smartdeals-hbase7-snc1.snc1,60020,1351747458782
> > > > >> >>       smartdeals-hbase8-snc1.snc1,60020,1351747459112
> > > > >> >>       smartdeals-hbase9-snc1.snc1,60020,1351747455106
> > > > >> >> 24 inconsistencies detected.
> > > > >> >> Status: INCONSISTENT
> > > > >> >>
> > > > >> >> In the master logs I am seeing the following error:
> > > > >> >>
> > > > >> >> ERROR org.apache.hadoop.hbase.master.AssignmentManager: Failed
> > > > >> >> assignment in: smartdeals-hbase3-snc1.snc1,60020,1351747466016 due to
> > > > >> >> org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException:
> > > > >> >> Received:OPEN for the region:userTable1,m_skaare@yahoo.com,1351728968936.bb91fd0c855e60dd4159e0ad3fd52cda. ,which we are already trying to OPEN.
> > > > >> >>         at org.apache.hadoop.hbase.regionserver.HRegionServer.checkIfRegionInTransition(HRegionServer.java:2499)
> > > > >> >>         at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:2457)
> > > > >> >>         at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
> > > > >> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > > >> >>         at java.lang.reflect.Method.invoke(Method.java:597)
> > > > >> >>         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
> > > > >> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
> > > > >> >>
> > > > >> >>
> > > > >> >> Am I missing something? How do I recover from this? How do I
> > > > >> >> load a lot of data via MR into HBase tables?
> > > > >> >>
> > > > >> >>
> > > > >> >> I am running with the following setup:
> > > > >> >>
> > > > >> >> hadoop:2.0.0-cdh4.0.1
> > > > >> >>
> > > > >> >> hbase: 0.92.1-cdh4.0.1, r
> > > > >> >>
> > > > >> >>
> > > > >> >> Would greatly appreciate any help.
> > > > >> >>
> > > > >> >>
> > > > >> >> Ameya
> > > > >> >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >>
> > > > >> Regards,
> > > > >> Cheng Su
> > > > >>
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Regards,
> > > > Cheng Su
> > > >
> > >
> >
> >
> >
> > --
> > Kevin O'Dell
> > Customer Operations Engineer, Cloudera
> >
>



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera
