hbase-user mailing list archives

From: Ameya Kantikar <am...@groupon.com>
Subject: Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR
Date: Sat, 03 Nov 2012 00:10:10 GMT
Update:

I pre-split the regions and my big data-load MR job worked! Thanks for your
help.

One note however: it takes time to create a table with pre-split regions, and
I have not figured out why. I continue to see regions as "not deployed" for at
least 20-30 minutes after the table is created with pre-splits.

Only after those regions are deployed can I start the MR job.
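
A minimal way to block until every region is deployed before kicking off the
MR job (a sketch against the 0.92 client API; the table name and the 10s
polling interval are just placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class WaitForDeploy {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            // isTableAvailable() returns true only once every region of
            // the table has a server address listed in .META.
            while (!admin.isTableAvailable("test1")) {
                Thread.sleep(10 * 1000L); // poll; interval is arbitrary
            }
        }
    }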

On Thu, Nov 1, 2012 at 4:56 PM, Ameya Kantikar <ameya@groupon.com> wrote:

> Hi Kevin,
>
> I was trying to pre-split the table from the shell, but could not get both
> compression and splitting to work.
>
> I tried the following:
>
> create 'test1', { NAME => 'cf1', SPLITS => ['a', 'b', 'c', 'd', 'e', 'f',
> 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
> 'v', 'w', 'z'] }
> disable 'test1'
> alter 'test1', { NAME => 'cf1', COMPRESSION => 'SNAPPY' }
>
> This worked so far; however,
> enable 'test1' is perpetually stuck.
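>
> For what it's worth, the shell can create a pre-split, compressed table in
> one step, which avoids the disable/alter/enable dance entirely (a sketch;
> this assumes the shell accepts SPLITS as its own table-level block, as later
> releases document):
>
> create 'test1', { NAME => 'cf1', COMPRESSION => 'SNAPPY' },
>   { SPLITS => ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l',
>     'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'z'] }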
>
> So I tried building the table from the Java API.
>
>     // test code
>     HTableDescriptor desc = new HTableDescriptor();
>     desc.setName(hbaseTable.getBytes());
>     HColumnDescriptor colDesc = new HColumnDescriptor("cf1");
>     colDesc.setCompressionType(Compression.Algorithm.SNAPPY);
>     desc.addFamily(colDesc);
>     byte[][] splits = new byte[23][];
>     splits[0] = "a".getBytes();
>     ..
>     splits[22] = "z".getBytes();
>     admin.createTable(desc, splits);
>
> This created the pre-split table with compression on. However, when I
> started running MR over this table, I started getting the following errors:
>
> org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in .META. for region t1,e,1351811844769.6a7fb0904c323917d322aef160f129cb.
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:988)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:818)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1524)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
> 	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:943)
> 	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:820)
> 	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:795)
> 	at com.groupon.smartdeals.mr.hbase.LoadUserCacheInHbase$TokenizerMapper.map(LoadUserCacheInHbase.java:114)
> 	at com.groupon.smartdeals.mr.hbase.LoadUserCacheInHba
>
> On your earlier suggestion, I tried:
> rm /hbase/tablename
> hbck -fixMeta -fixAssignments
> restart HBase if it is still present
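>
> Concretely, that came down to something like the following (a sketch; the
> directory under /hbase matches the table name):
>
> hadoop fs -rmr /hbase/tablename
> hbase hbck -fixMeta -fixAssignments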
>
> This took away the inconsistencies, but the data is gone too.
>
> Also, I have noticed that some regions of my pre-split table are in the "not
> deployed" state. The HBase Master web console shows the following for one of
> the regions of table t1:
>
>
> t1,e,1351811844769.6a7fb0904c323917d322aef160f129cb. not deployed
>
> There aren't any relevant exceptions on the master or on the region servers.
> I am occasionally seeing:
> org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException
>
> We are trying out HBase at Groupon, and this cluster is not in production
> yet, so thankfully we can afford to wipe the data, restart the servers, etc.
>
> Thanks,
>
> Ameya
>
>
>
>
> On Thu, Nov 1, 2012 at 12:55 PM, Kevin O'dell <kevin.odell@cloudera.com> wrote:
>
>> Ameya,
>>
>> If your new table goes well (did you presplit this time?), then here is
>> what we can do for the old one:
>>
>> rm /hbase/tablename
>> hbck -fixMeta -fixAssignments
>> restart HBase if it is still present
>> All should be well.
>>
>> Please let us know how it goes.
>>
>> On Thu, Nov 1, 2012 at 2:44 PM, Ameya Kantikar <ameya@groupon.com> wrote:
>>
>> > Thanks Kevin & Ram. Please find my answers below:
>> >
>> > Did you presplit your table? - NO
>> >
>> > You are on .92, might as well take advantage of HFilev2 and use 10GB
>> > region sizes -
>> >
>> >  - I have put my region size now at 10GB and am running another load into
>> > a separate table, but my existing table is still in bad shape.
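>> >
>> > For reference, the per-table version of that change from the shell looks
>> > something like this (a sketch; 10737418240 is 10 GB in bytes, and the
>> > cluster-wide equivalent is hbase.hregion.max.filesize in hbase-site.xml):
>> >
>> > alter 't1', METHOD => 'table_att', MAX_FILESIZE => '10737418240'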
>> >
>> > Loading over MR, I am assuming puts?
>> > -Yes
>> >
>> > Did you tune your memstore and HLog size?
>> > -Not yet. I am running with whatever the defaults are.
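>> >
>> > If it helps, I understand the usual write-path knobs to be the following
>> > hbase-site.xml entries (a sketch; the values shown are the 0.92 defaults,
>> > not recommendations):
>> >
>> > <property>
>> >   <name>hbase.hregion.memstore.flush.size</name>
>> >   <value>134217728</value> <!-- flush a memstore once it reaches 128 MB -->
>> > </property>
>> > <property>
>> >   <name>hbase.regionserver.maxlogs</name>
>> >   <value>32</value> <!-- force flushes once this many HLogs accumulate -->
>> > </property>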
>> >
>> > You aren't using a different client version or something strange like
>> > that, are you? - Nope. It's the same jar everywhere.
>> >
>> > The "can't close hlog" messages seem to indicate an inability to talk to
>> > HDFS.  Did you have connection issues there?
>> > - I did find a log on one datanode with some HDFS issue, but that was
>> > only one datanode. All the other datanodes looked good.
>> > Note, I also ran another big distcp job on the same cluster and did not
>> > find any issues.
>> >
>> > I also restarted the cluster (all nodes, including Hadoop); hbase hbck is
>> > not showing inconsistencies, but my table is still neither enabled nor
>> > disabled.
>> > I ran the MR job to load data, but it continued to throw the same errors
>> > as earlier.
>> >
>> > Now I am running a separate job loading data into a brand-new table, with
>> > the max region size at 10 GB. I'll get back to you with results on that
>> > one. But the existing table is still not reachable.
>> >
>> > Thanks for your help.
>> >
>> > Ameya
>> >
>> >
>> >
>> >
>> >
>> > On Thu, Nov 1, 2012 at 6:35 AM, Kevin O'dell <kevin.odell@cloudera.com> wrote:
>> >
>> > > A couple of thoughts (it is still early here, so bear with me):
>> > >
>> > > Did you presplit your table?
>> > >
>> > > You are on .92, might as well take advantage of HFilev2 and use 10GB
>> > > region sizes
>> > >
>> > > Loading over MR, I am assuming puts?  Did you tune your memstore and
>> > > HLog size?
>> > >
>> > > You aren't using a different client version or something strange like
>> > > that, are you?
>> > >
>> > > The "can't close hlog" messages seem to indicate an inability to talk
>> > > to HDFS.  Did you have connection issues there?
>> > >
>> > >
>> > >
>> > > On Thu, Nov 1, 2012 at 5:20 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
>> > >
>> > > > Can you try restarting the cluster, I mean the master and the RSs?
>> > > > Also, if this persists, try clearing the ZK data and restarting.
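>> > > >
>> > > > Clearing the ZK data typically means stopping HBase and removing
>> > > > HBase's znode via ZooKeeper's CLI, e.g. (a sketch; "zkhost" is a
>> > > > placeholder, and this assumes the default zookeeper.znode.parent):
>> > > >
>> > > > zkCli.sh -server zkhost:2181
>> > > > rmr /hbase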
>> > > >
>> > > > Regards
>> > > > Ram
>> > > >
>> > > > On Thu, Nov 1, 2012 at 2:46 PM, Cheng Su <scarcer.cn@gmail.com> wrote:
>> > > >
>> > > > > Sorry, my mistake. Please ignore the part about the "max store size
>> > > > > of a single CF".
>> > > > >
>> > > > > m(_ _)m
>> > > > >
>> > > > > On Thu, Nov 1, 2012 at 4:43 PM, Ameya Kantikar <ameya@groupon.com> wrote:
>> > > > > > Thanks Cheng. I'll try increasing my max region size limit.
>> > > > > >
>> > > > > > However, I am not clear on this math:
>> > > > > >
>> > > > > > "Since you set the max file size to 2G, you can only store 2XN G
>> > > > > > data into a single CF."
>> > > > > >
>> > > > > > Why is that? My assumption is, even though a single region can
>> > > > > > only be 2 GB, I can still have hundreds of regions, and hence can
>> > > > > > store 200GB+ data in a single CF on my 10-machine cluster.
>> > > > > >
>> > > > > > Ameya
>> > > > > >
>> > > > > >
>> > > > > > On Thu, Nov 1, 2012 at 1:19 AM, Cheng Su <scarcer.cn@gmail.com> wrote:
>> > > > > >
>> > > > > >> I met the same problem recently.
>> > > > > >> I'm not very sure the error log is exactly the same, but I do
>> > > > > >> have the same exception
>> > > > > >>
>> > > > > >>
>> > > > > >> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>> > > > > >> Failed 1 action: NotServingRegionException: 1 time, servers with
>> > > > > >> issues: smartdeals-hbase8-snc1.snc1:60020,
>> > > > > >>
>> > > > > >> and the table is also neither enabled nor disabled, thus I can't
>> > > > > >> drop it.
>> > > > > >>
>> > > > > >> I guess the problem is the total store size.
>> > > > > >> How many region servers do you have?
>> > > > > >> Since you set the max file size to 2G, you can only store 2XN G
>> > > > > >> data into a single CF.
>> > > > > >> (N is the number of your region servers)
>> > > > > >>
>> > > > > >> You might want to increase the max file size or add region
>> > > > > >> servers.
>> > > > > >>
>> > > > > >> On Thu, Nov 1, 2012 at 3:29 PM, Ameya Kantikar <ameya@groupon.com> wrote:
>> > > > > >> > One more thing: the HBase table in question is neither enabled
>> > > > > >> > nor disabled:
>> > > > > >> >
>> > > > > >> > hbase(main):006:0> is_disabled 'userTable1'
>> > > > > >> > false
>> > > > > >> >
>> > > > > >> > 0 row(s) in 0.0040 seconds
>> > > > > >> >
>> > > > > >> > hbase(main):007:0> is_enabled 'userTable1'
>> > > > > >> > false
>> > > > > >> >
>> > > > > >> > 0 row(s) in 0.0040 seconds
>> > > > > >> >
>> > > > > >> > Ameya
>> > > > > >> >
>> > > > > >> > On Thu, Nov 1, 2012 at 12:02 AM, Ameya Kantikar <ameya@groupon.com> wrote:
>> > > > > >> >
>> > > > > >> >> Hi,
>> > > > > >> >>
>> > > > > >> >> I am trying to load a lot of data (around 1.5 TB) into a
>> > > > > >> >> single HBase table.
>> > > > > >> >> I have set the region size at 2 GB. I have also
>> > > > > >> >> set hbase.regionserver.handler.count to 30.
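>> > > > > >> >>
>> > > > > >> >> For reference, those correspond to hbase-site.xml entries along
>> > > > > >> >> these lines (a sketch; 2147483648 is 2 GB in bytes):
>> > > > > >> >>
>> > > > > >> >> <property>
>> > > > > >> >>   <name>hbase.hregion.max.filesize</name>
>> > > > > >> >>   <value>2147483648</value> <!-- split regions beyond 2 GB -->
>> > > > > >> >> </property>
>> > > > > >> >> <property>
>> > > > > >> >>   <name>hbase.regionserver.handler.count</name>
>> > > > > >> >>   <value>30</value> <!-- RPC handler threads per region server -->
>> > > > > >> >> </property>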
>> > > > > >> >>
>> > > > > >> >> When I start loading data via MR, after a while, tasks start
>> > > > > >> >> failing with the following error:
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>> > > > > >> >> Failed 1 action: NotServingRegionException: 1 time, servers with
>> > > > > >> >> issues: smartdeals-hbase8-snc1.snc1:60020,
>> > > > > >> >>       at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1641)
>> > > > > >> >>       at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
>> > > > > >> >>       at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:943)
>> > > > > >> >>       at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:820)
>> > > > > >> >>       at org.apache.hadoop.hbase.client.HTable.put(HTable.java:795)
>> > > > > >> >>       at com..mr.hbase.LoadUserCacheInHbase$TokenizerMapper.map(LoadUserCacheInHbase.java:83)
>> > > > > >> >>       at com..mr.hbase.LoadUserCacheInHbase$TokenizerMapper.map(LoadUserCacheInHbase.java:33)
>> > > > > >> >>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>> > > > > >> >>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
>> > > > > >> >>       at org.apache.hadoop.mapred.MapTask.run(MapTask.j
>> > > > > >> >>
>> > > > > >> >> On the hbase8 machine I see the following in the logs:
>> > > > > >> >>
>> > > > > >> >> ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Error while syncing, requesting close of hlog
>> > > > > >> >> java.io.IOException: Reflection
>> > > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:230)
>> > > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1109)
>> > > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1213)
>> > > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1071)
>> > > > > >> >>         at java.lang.Thread.run(Thread.java:662)
>> > > > > >> >> Caused by: java.lang.reflect.InvocationTargetException
>> > > > > >> >>         at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>> > > > > >> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > >> >>         at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > >> >>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:228)
>> > > > > >> >>         ... 4 more
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >> I only have 15 map tasks each on a 10-machine cluster (150 map
>> > > > > >> >> tasks in total entering data into the HBase table).
>> > > > > >> >>
>> > > > > >> >> Further, I see 2-3 regions perpetually under "Regions in
>> > > > > >> >> Transition" in the HBase Master web console, as follows:
>> > > > > >> >>
>> > > > > >> >> 8dcb3edee4e43faa3dbeac2db4f12274 userTable1,pookydearest@hotmail.com,1351728961461.8dcb3edee4e43faa3dbeac2db4f12274.
>> > > > > >> >>   state=PENDING_OPEN, ts=Thu Nov 01 06:39:57 UTC 2012 (409s ago), server=smartdeals-hbase1-snc1.snc1,60020,1351751785514
>> > > > > >> >>
>> > > > > >> >> bb91fd0c855e60dd4159e0ad3fd52cda userTable1,m_skaare@yahoo.com,1351728968936.bb91fd0c855e60dd4159e0ad3fd52cda.
>> > > > > >> >>   state=PENDING_OPEN, ts=Thu Nov 01 06:42:17 UTC 2012 (269s ago), server=smartdeals-hbase3-snc1.snc1,60020,1351747466016
>> > > > > >> >>
>> > > > > >> >> bd44334a11464baf85013c97d673e600 userTable1,tammikilgore@gmail.com,1351728952308.bd44334a11464baf85013c97d673e600.
>> > > > > >> >>   state=PENDING_OPEN, ts=Thu Nov 01 06:42:17 UTC 2012 (269s ago), server=smartdeals-hbase1-snc1.snc1,60020,1351751785514
>> > > > > >> >>
>> > > > > >> >> ed1f7e7908fc232f10d78dd1e796a5d7 userTable1,jwoodel@triad.rr.com,1351728971232.ed1f7e7908fc232f10d78dd1e796a5d7.
>> > > > > >> >>   state=PENDING_OPEN, ts=Thu Nov 01 06:37:37 UTC 2012 (549s ago), server=smartdeals-hbase3-snc1.snc1,60020,1351747466016
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >> Note these are not going away even after 30 minutes.
>> > > > > >> >>
>> > > > > >> >> Further, after running hbase hbck -summary I get the following:
>> > > > > >> >>
>> > > > > >> >> Summary:
>> > > > > >> >>   -ROOT- is okay.
>> > > > > >> >>     Number of regions: 1
>> > > > > >> >>     Deployed on: smartdeals-hbase7-snc1.snc1,60020,1351747458782
>> > > > > >> >>   .META. is okay.
>> > > > > >> >>     Number of regions: 1
>> > > > > >> >>     Deployed on: smartdeals-hbase7-snc1.snc1,60020,1351747458782
>> > > > > >> >>   test1 is okay.
>> > > > > >> >>     Number of regions: 1
>> > > > > >> >>     Deployed on: smartdeals-hbase2-snc1.snc1,60020,1351747457308
>> > > > > >> >>   userTable1 is okay.
>> > > > > >> >>     Number of regions: 32
>> > > > > >> >>     Deployed on: smartdeals-hbase10-snc1.snc1,60020,1351747456776
>> > > > > >> >>       smartdeals-hbase2-snc1.snc1,60020,1351747457308
>> > > > > >> >>       smartdeals-hbase4-snc1.snc1,60020,1351747455571
>> > > > > >> >>       smartdeals-hbase5-snc1.snc1,60020,1351747458579
>> > > > > >> >>       smartdeals-hbase6-snc1.snc1,60020,1351747458186
>> > > > > >> >>       smartdeals-hbase7-snc1.snc1,60020,1351747458782
>> > > > > >> >>       smartdeals-hbase8-snc1.snc1,60020,1351747459112
>> > > > > >> >>       smartdeals-hbase9-snc1.snc1,60020,1351747455106
>> > > > > >> >> 24 inconsistencies detected.
>> > > > > >> >> Status: INCONSISTENT
>> > > > > >> >>
>> > > > > >> >> In the master logs I am seeing the following error:
>> > > > > >> >>
>> > > > > >> >> ERROR org.apache.hadoop.hbase.master.AssignmentManager: Failed
>> > > > > >> >> assignment in: smartdeals-hbase3-snc1.snc1,60020,1351747466016 due to
>> > > > > >> >> org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException:
>> > > > > >> >> Received:OPEN for the region:userTable1,m_skaare@yahoo.com,1351728968936.bb91fd0c855e60dd4159e0ad3fd52cda.,which we are already trying to OPEN.
>> > > > > >> >>       at org.apache.hadoop.hbase.regionserver.HRegionServer.checkIfRegionInTransition(HRegionServer.java:2499)
>> > > > > >> >>       at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:2457)
>> > > > > >> >>       at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>> > > > > >> >>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > > > >> >>       at java.lang.reflect.Method.invoke(Method.java:597)
>> > > > > >> >>       at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>> > > > > >> >>       at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >> Am I missing something? How do I recover from this? How do I
>> > > > > >> >> load a lot of data via MR into HBase tables?
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >> I am running under the following setup:
>> > > > > >> >>
>> > > > > >> >> hadoop: 2.0.0-cdh4.0.1
>> > > > > >> >>
>> > > > > >> >> hbase: 0.92.1-cdh4.0.1, r
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >> I would greatly appreciate any help.
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >> Ameya
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >>
>> > > > > >>
>> > > > > >>
>> > > > > >> --
>> > > > > >>
>> > > > > >> Regards,
>> > > > > >> Cheng Su
>> > > > > >>
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > >
>> > > > > Regards,
>> > > > > Cheng Su
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Kevin O'Dell
>> > > Customer Operations Engineer, Cloudera
>> > >
>> >
>>
>>
>>
>> --
>> Kevin O'Dell
>> Customer Operations Engineer, Cloudera
>>
>
>
