hbase-user mailing list archives

From: Ameya Kantikar <am...@groupon.com>
Subject: Re: Table in Inconsistent State; Perpetually pending region server transitions while loading a lot of data into HBase via MR
Date: Thu, 01 Nov 2012 23:56:39 GMT
Hi Kevin,

I was trying to pre-split the table from the shell, but I could not get
compression and splitting to work together.

I tried the following:

create 'test1', { NAME => 'cf1', SPLITS => ['a', 'b', 'c', 'd', 'e', 'f',
'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
'v', 'w', 'z'] }
disable 'test1'
alter 'test1', { NAME => 'cf1', COMPRESSION => 'SNAPPY' }

This worked so far; however,
enable 'test1' is perpetually stuck.

So I tried building the table from the Java API.

// test code
HTableDescriptor desc = new HTableDescriptor();
desc.setName(hbaseTable.getBytes());
HColumnDescriptor colDesc = new HColumnDescriptor("cf1");
colDesc.setCompressionType(Compression.Algorithm.SNAPPY);
desc.addFamily(colDesc);
byte[][] splits = new byte[23][];
splits[0] = "a".getBytes();
..
splits[22] = "z".getBytes();
admin.createTable(desc, splits);
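
As an aside, the split array can be generated instead of writing all 23
entries by hand; a minimal sketch, assuming single-letter start keys and
the 0.92 client's Bytes utility:

import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: one split point per letter 'a'..'z' (26 entries).
private static byte[][] letterSplits() {
    byte[][] splits = new byte[26][];
    for (char c = 'a'; c <= 'z'; c++) {
        splits[c - 'a'] = Bytes.toBytes(String.valueOf(c));
    }
    return splits;
}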

This created the pre-split table with compression on. However, when I
started running MR against this table,
I started getting the following errors:

org.apache.hadoop.hbase.client.NoServerForRegionException: No server
address listed in .META. for region
t1,e,1351811844769.6a7fb0904c323917d322aef160f129cb.
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:988)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:818)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1524)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:943)
	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:820)
	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:795)
	at com.groupon.smartdeals.mr.hbase.LoadUserCacheInHbase$TokenizerMapper.map(LoadUserCacheInHbase.java:114)
	at com.groupon.smartdeals.mr.hbase.LoadUserCacheInHba
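
For context, the mapper issues plain HTable.put calls, as the stack trace
shows. In case it's relevant: the only client-side batching I know of in
the 0.92 API is autoflush plus the write buffer. A sketch of what I mean
(not what we currently run):

HTable table = new HTable(conf, "t1");
table.setAutoFlush(false);                  // buffer puts client-side
table.setWriteBufferSize(8 * 1024 * 1024);  // flush roughly every 8 MB
// ... table.put(put) calls in map() ...
table.flushCommits();                       // e.g. in the mapper's cleanup()
table.close();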

On your earlier suggestion, I tried:
rm /hbase/tablename
hbck -fixMeta -fixAssignments
restart HBase if it is still present
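
To spell that out, I took "rm" to mean the HDFS shell, with the default
/hbase root dir, i.e.:

hadoop fs -rmr /hbase/tablename
hbase hbck -fixMeta -fixAssignments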

This took away the inconsistencies, but the data is gone too.

Also, I have noticed that some regions of my pre-split table are in the
"not deployed" state. The HBase Master web console shows the following for
one of the regions of table t1:


t1,e,1351811844769.6a7fb0904c323917d322aef160f129cb. not deployed

There are no relevant exceptions on the master or on the region servers. I
am occasionally seeing:
org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException

We are trying out HBase at Groupon, and this cluster is not in production
yet, so thankfully we can afford to wipe the data, restart the servers,
etc.

Thanks,

Ameya




On Thu, Nov 1, 2012 at 12:55 PM, Kevin O'dell <kevin.odell@cloudera.com> wrote:

> Ameya,
>
>  If your new table goes well (did you presplit this time?), then here is
> what we can do for the old one:
>
> rm /hbase/tablename
> hbck -fixMeta -fixAssignments
> restart HBase if it is still present
> All should be well.
>
> Please let us know how it goes.
>
> On Thu, Nov 1, 2012 at 2:44 PM, Ameya Kantikar <ameya@groupon.com> wrote:
>
> > Thanks Kevin & Ram. Please find my answers below:
> >
> > Did you presplit your table? - NO
> >
> > You are on .92, might as well take advantage of HFilev2 and use 10GB
> > region sizes -
> >
> > - I have now set my region size to 10GB and am running another load into
> > a separate table, but my existing table is still in bad shape.
> >
> > Loading over MR, I am assuming puts?
> > -Yes
> >
> > Did you tune your memstore and Hlog size?
> > -Not yet. I am running with the defaults.
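> >
> > (For reference, the properties I believe are involved; max.filesize is
> > the 10GB I mentioned, the others are what I understand the 0.92
> > defaults to be, corrections welcome:)
> >
> > <!-- hbase-site.xml sketch -->
> > <property>
> >   <name>hbase.hregion.max.filesize</name>
> >   <value>10737418240</value> <!-- the 10 GB region size -->
> > </property>
> > <property>
> >   <name>hbase.hregion.memstore.flush.size</name>
> >   <value>134217728</value> <!-- 128 MB per-region memstore flush -->
> > </property>
> > <property>
> >   <name>hbase.regionserver.maxlogs</name>
> >   <value>32</value> <!-- HLog files kept before a forced flush -->
> > </property>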
> >
> > You aren't using a different client version or something strange like
> > that, are you? - Nope. It's the same jar everywhere.
> >
> > Your "can't close hlog" messages seem to indicate an inability to talk to
> > HDFS.  Did you have connection issues there?
> > - I did find a log on one data node with some HDFS issue, but that was
> > only one data node. All the other data nodes looked good.
> > Note, I also ran another big distcp job on the same cluster and did not
> > find any issues.
> >
> > I also restarted the cluster (all nodes, including Hadoop); hbase hbck is
> > not showing inconsistencies, but my table is still neither enabled nor
> > disabled.
> > I ran the MR job to load data, but it continued to throw the same errors
> > as before.
> >
> > Now I am running a separate job loading data into a brand-new table, with
> > max region size at 10 GB. I'll get back to you with results on that one.
> > But the existing table is still not reachable.
> >
> > Thanks for your help.
> >
> > Ameya
> >
> >
> >
> >
> >
> > On Thu, Nov 1, 2012 at 6:35 AM, Kevin O'dell <kevin.odell@cloudera.com> wrote:
> >
> > > A couple of thoughts (it is still early here, so bear with me):
> > >
> > > Did you presplit your table?
> > >
> > > You are on .92, might as well take advantage of HFilev2 and use 10GB
> > > region sizes
> > >
> > > Loading over MR, I am assuming puts?  Did you tune your memstore and
> > > Hlog size?
> > >
> > > You aren't using a different client version or something strange like
> > > that, are you?
> > >
> > > Your "can't close hlog" messages seem to indicate an inability to talk
> > > to HDFS.  Did you have connection issues there?
> > >
> > >
> > >
> > > On Thu, Nov 1, 2012 at 5:20 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > > > Can you try restarting the cluster, I mean the master and RS?
> > > > Also, if this persists, try clearing the zk data and restarting.
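> > > >
> > > > (Spelling out "clear the zk data" as I understand it: with HBase
> > > > down, drop HBase's znode via the ZooKeeper CLI, e.g. zkCli.sh
> > > > -server <zk-host>:2181 followed by "rmr /hbase", assuming the
> > > > default zookeeper.znode.parent of /hbase.)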
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > On Thu, Nov 1, 2012 at 2:46 PM, Cheng Su <scarcer.cn@gmail.com> wrote:
> > > >
> > > > > Sorry, my mistake. Please ignore the part about the "max store size
> > > > > of a single CF".
> > > > >
> > > > > m(_ _)m
> > > > >
> > > > > On Thu, Nov 1, 2012 at 4:43 PM, Ameya Kantikar <ameya@groupon.com> wrote:
> > > > > > Thanks Cheng. I'll try increasing my max region size limit.
> > > > > >
> > > > > > However, I am not clear on this math:
> > > > > >
> > > > > > "Since you set the max file size to 2G, you can only store 2XN
G
> > data
> > > > > > into a single CF."
> > > > > >
> > > > > > Why is that? My assumption is that even though a single region
> > > > > > can only be 2 GB, I can still have hundreds of regions, and hence
> > > > > > can store 200GB+ of data in a single CF on my 10-machine cluster
> > > > > > (capacity is roughly 2 GB times the total region count, not times
> > > > > > the number of servers).
> > > > > >
> > > > > > Ameya
> > > > > >
> > > > > >
> > > > > > On Thu, Nov 1, 2012 at 1:19 AM, Cheng Su <scarcer.cn@gmail.com> wrote:
> > > > > >
> > > > > >> I met the same problem these days.
> > > > > >> I'm not very sure the error log is exactly the same, but I do
> > > > > >> have the same exception:
> > > > > >>
> > > > > >> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> > > > > >> Failed 1 action: NotServingRegionException: 1 time, servers with
> > > > > >> issues: smartdeals-hbase8-snc1.snc1:60020,
> > > > > >>
> > > > > >> and the table is also neither enabled nor disabled, thus I can't
> > > > > >> drop it.
> > > > > >>
> > > > > >> I guess the problem is the total store size.
> > > > > >> How many region servers do you have?
> > > > > >> Since you set the max file size to 2G, you can only store 2XN G
> > > > > >> of data into a single CF.
> > > > > >> (N is the number of your region servers.)
> > > > > >>
> > > > > >> You might want to increase the max file size or add region
> > > > > >> servers.
> > > > > >>
> > > > > >> On Thu, Nov 1, 2012 at 3:29 PM, Ameya Kantikar <ameya@groupon.com> wrote:
> > > > > >> > One more thing: the HBase table in question is neither
> > > > > >> > enabled nor disabled:
> > > > > >> >
> > > > > >> > hbase(main):006:0> is_disabled 'userTable1'
> > > > > >> > false
> > > > > >> >
> > > > > >> > 0 row(s) in 0.0040 seconds
> > > > > >> >
> > > > > >> > hbase(main):007:0> is_enabled 'userTable1'
> > > > > >> > false
> > > > > >> >
> > > > > >> > 0 row(s) in 0.0040 seconds
> > > > > >> >
> > > > > >> > Ameya
> > > > > >> >
> > > > > >> > On Thu, Nov 1, 2012 at 12:02 AM, Ameya Kantikar <ameya@groupon.com> wrote:
> > > > > >> >
> > > > > >> >> Hi,
> > > > > >> >>
> > > > > >> >> I am trying to load a lot of data (around 1.5 TB) into a
> > > > > >> >> single HBase table.
> > > > > >> >> I have set the region size at 2 GB. I also
> > > > > >> >> set hbase.regionserver.handler.count to 30.
> > > > > >> >>
> > > > > >> >> When I start loading data via MR, after a while, tasks start
> > > > > >> >> failing with the following error:
> > > > > >> >>
> > > > > >> >> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> > > > > >> >> Failed 1 action: NotServingRegionException: 1 time, servers with
> > > > > >> >> issues: smartdeals-hbase8-snc1.snc1:60020,
> > > > > >> >>       at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1641)
> > > > > >> >>       at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
> > > > > >> >>       at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:943)
> > > > > >> >>       at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:820)
> > > > > >> >>       at org.apache.hadoop.hbase.client.HTable.put(HTable.java:795)
> > > > > >> >>       at com..mr.hbase.LoadUserCacheInHbase$TokenizerMapper.map(LoadUserCacheInHbase.java:83)
> > > > > >> >>       at com..mr.hbase.LoadUserCacheInHbase$TokenizerMapper.map(LoadUserCacheInHbase.java:33)
> > > > > >> >>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
> > > > > >> >>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
> > > > > >> >>       at org.apache.hadoop.mapred.MapTask.run(MapTask.j
> > > > > >> >>
> > > > > >> >> On the hbase8 machine I see the following in the logs:
> > > > > >> >>
> > > > > >> >> ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Error while
> > > > > >> >> syncing, requesting close of hlog
> > > > > >> >> java.io.IOException: Reflection
> > > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:230)
> > > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1109)
> > > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1213)
> > > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1071)
> > > > > >> >>         at java.lang.Thread.run(Thread.java:662)
> > > > > >> >> Caused by: java.lang.reflect.InvocationTargetException
> > > > > >> >>         at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
> > > > > >> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > > > >> >>         at java.lang.reflect.Method.invoke(Method.java:597)
> > > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:228)
> > > > > >> >>         ... 4 more
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> I only have 15 map tasks on each machine of a 10-machine
> > > > > >> >> cluster (150 map tasks in total entering data into the HBase
> > > > > >> >> table).
> > > > > >> >>
> > > > > >> >> Further, I see 2-3 regions perpetually under "Regions in
> > > > > >> >> Transition" in the HBase master web console, as follows:
> > > > > >> >>
> > > > > >> >> 8dcb3edee4e43faa3dbeac2db4f12274 userTable1,pookydearest@hotmail.com,1351728961461.8dcb3edee4e43faa3dbeac2db4f12274.
> > > > > >> >> state=PENDING_OPEN, ts=Thu Nov 01 06:39:57 UTC 2012 (409s ago),
> > > > > >> >> server=smartdeals-hbase1-snc1.snc1,60020,1351751785514
> > > > > >> >>
> > > > > >> >> bb91fd0c855e60dd4159e0ad3fd52cda userTable1,m_skaare@yahoo.com,1351728968936.bb91fd0c855e60dd4159e0ad3fd52cda.
> > > > > >> >> state=PENDING_OPEN, ts=Thu Nov 01 06:42:17 UTC 2012 (269s ago),
> > > > > >> >> server=smartdeals-hbase3-snc1.snc1,60020,1351747466016
> > > > > >> >>
> > > > > >> >> bd44334a11464baf85013c97d673e600 userTable1,tammikilgore@gmail.com,1351728952308.bd44334a11464baf85013c97d673e600.
> > > > > >> >> state=PENDING_OPEN, ts=Thu Nov 01 06:42:17 UTC 2012 (269s ago),
> > > > > >> >> server=smartdeals-hbase1-snc1.snc1,60020,1351751785514
> > > > > >> >>
> > > > > >> >> ed1f7e7908fc232f10d78dd1e796a5d7 userTable1,jwoodel@triad.rr.com,1351728971232.ed1f7e7908fc232f10d78dd1e796a5d7.
> > > > > >> >> state=PENDING_OPEN, ts=Thu Nov 01 06:37:37 UTC 2012 (549s ago),
> > > > > >> >> server=smartdeals-hbase3-snc1.snc1,60020,1351747466016
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> Note these are not going away even after 30 minutes.
> > > > > >> >>
> > > > > >> >> Further, after running hbase hbck -summary, I get the
> > > > > >> >> following:
> > > > > >> >>
> > > > > >> >> Summary:
> > > > > >> >>   -ROOT- is okay.
> > > > > >> >>     Number of regions: 1
> > > > > >> >>     Deployed on: smartdeals-hbase7-snc1.snc1,60020,1351747458782
> > > > > >> >>   .META. is okay.
> > > > > >> >>     Number of regions: 1
> > > > > >> >>     Deployed on: smartdeals-hbase7-snc1.snc1,60020,1351747458782
> > > > > >> >>   test1 is okay.
> > > > > >> >>     Number of regions: 1
> > > > > >> >>     Deployed on: smartdeals-hbase2-snc1.snc1,60020,1351747457308
> > > > > >> >>   userTable1 is okay.
> > > > > >> >>     Number of regions: 32
> > > > > >> >>     Deployed on: smartdeals-hbase10-snc1.snc1,60020,1351747456776
> > > > > >> >> smartdeals-hbase2-snc1.snc1,60020,1351747457308
> > > > > >> >> smartdeals-hbase4-snc1.snc1,60020,1351747455571
> > > > > >> >> smartdeals-hbase5-snc1.snc1,60020,1351747458579
> > > > > >> >> smartdeals-hbase6-snc1.snc1,60020,1351747458186
> > > > > >> >> smartdeals-hbase7-snc1.snc1,60020,1351747458782
> > > > > >> >> smartdeals-hbase8-snc1.snc1,60020,1351747459112
> > > > > >> >> smartdeals-hbase9-snc1.snc1,60020,1351747455106
> > > > > >> >> 24 inconsistencies detected.
> > > > > >> >> Status: INCONSISTENT
> > > > > >> >>
> > > > > >> >> In the master logs I am seeing the following error:
> > > > > >> >>
> > > > > >> >> ERROR org.apache.hadoop.hbase.master.AssignmentManager: Failed
> > > > > >> >> assignment in: smartdeals-hbase3-snc1.snc1,60020,1351747466016 due to
> > > > > >> >> org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException:
> > > > > >> >> Received:OPEN for the region:userTable1,m_skaare@yahoo.com,1351728968936.bb91fd0c855e60dd4159e0ad3fd52cda.,
> > > > > >> >> which we are already trying to OPEN.
> > > > > >> >>  at org.apache.hadoop.hbase.regionserver.HRegionServer.checkIfRegionInTransition(HRegionServer.java:2499)
> > > > > >> >>  at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:2457)
> > > > > >> >>  at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
> > > > > >> >>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > > > >> >>  at java.lang.reflect.Method.invoke(Method.java:597)
> > > > > >> >>  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
> > > > > >> >>  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> Am I missing something? How do I recover from this? How do I
> > > > > >> >> load a lot of data via MR into HBase tables?
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> I am running under the following setup:
> > > > > >> >>
> > > > > >> >> hadoop: 2.0.0-cdh4.0.1
> > > > > >> >>
> > > > > >> >> hbase: 0.92.1-cdh4.0.1, r
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> Would greatly appreciate any help.
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> Ameya
> > > > > >> >>
> > > > > >> >>
> > > > > >> >>
> > > > > >> >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >>
> > > > > >> Regards,
> > > > > >> Cheng Su
> > > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Regards,
> > > > > Cheng Su
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Kevin O'Dell
> > > Customer Operations Engineer, Cloudera
> > >
> >
>
>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>
