hbase-user mailing list archives

From Esteban Gutierrez <este...@cloudera.com>
Subject Re: Error during HBaseAdmin.split: Exception: org.apache.hadoop.hbase.NotServingRegionException, What does that mean?
Date Wed, 17 Sep 2014 19:19:49 GMT
Thanks Jianshi for that helpful information,

I think for use case 1) the point at which the regions need to split depends
on the data ingestion rate. A synchronous split operation makes some sense
there if you want the regions to contain specific time ranges and/or a
specific number of records.

Use case 2) I think is a good match for the KeyPrefixRegionSplitPolicy
or DelimitedKeyPrefixRegionSplitPolicy, since the regions will be split
based on the <type>: the former if the type length is fixed, the latter if
the type is of varying length but delimited with |
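
To illustrate what "split based on the <type>" means, here is a minimal
sketch in plain Java, with no HBase dependency, of how the grouping prefix of
a row key could be derived in the two cases. The class and method names are
made up for illustration and this is not the split policies' actual code; it
only shows which bytes of a <type>|<hash>|<id> key would be treated as the
prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of HBase.
final class RowKeyPrefix {
    private RowKeyPrefix() {}

    // Delimited case: the prefix is everything before the first delimiter.
    // If the key contains no delimiter, the whole key is used as the prefix.
    static byte[] delimitedPrefix(byte[] rowKey, byte delimiter) {
        for (int i = 0; i < rowKey.length; i++) {
            if (rowKey[i] == delimiter) {
                return Arrays.copyOfRange(rowKey, 0, i);
            }
        }
        return rowKey.clone();
    }

    // Fixed-length case: the prefix is the first prefixLength bytes,
    // or the whole key if it is shorter than that.
    static byte[] fixedPrefix(byte[] rowKey, int prefixLength) {
        if (prefixLength < 0) {
            throw new IllegalArgumentException("prefixLength must be >= 0");
        }
        return Arrays.copyOf(rowKey, Math.min(prefixLength, rowKey.length));
    }

    // Convenience for printing or comparing prefixes as text.
    static String asString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Rows that share the same prefix are the ones a prefix-based policy would aim
to keep together in one region.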

On second thought, it might even be possible to solve 1) with those
prefix-based split policies if you use a prefix for your key that also
increases monotonically, or that the client switches once it has reached
some threshold, e.g. after writing X billion data points use prefix 001,
and for the next Y billion data points use prefix 002, or something like that.
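
A rough sketch of that rollover idea, again in plain Java with hypothetical
names (BucketPrefix, rowsPerBucket): the client tracks how many rows it has
written and derives a zero-padded prefix that advances each time the
threshold is crossed. It assumes a single fixed threshold rather than
separate X and Y, and at most 999 buckets so that the three-digit prefix
keeps sorting correctly.

```java
// Hypothetical helper for illustration only; not part of HBase.
final class BucketPrefix {
    private BucketPrefix() {}

    // Returns "001" for the first rowsPerBucket rows, "002" for the next
    // rowsPerBucket rows, and so on. rowsWritten is the number of rows
    // already written before the row being keyed.
    static String prefixFor(long rowsWritten, long rowsPerBucket) {
        if (rowsWritten < 0 || rowsPerBucket <= 0) {
            throw new IllegalArgumentException("counts must be non-negative and threshold positive");
        }
        long bucket = rowsWritten / rowsPerBucket + 1;
        // Zero padding keeps lexicographic order equal to numeric order
        // as long as the bucket number stays within three digits.
        return String.format("%03d", bucket);
    }

    // Builds a key of the form <prefix>|<timestamp>, with the timestamp
    // zero-padded so that keys inside a bucket still sort by time.
    static String rowKey(long rowsWritten, long rowsPerBucket, long timestampMillis) {
        return String.format("%s|%019d", prefixFor(rowsWritten, rowsPerBucket), timestampMillis);
    }
}
```

With a threshold of one billion rows, the first billion keys start with 001|
and the next billion with 002|, so each bucket forms its own contiguous key
range.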

cheers,
esteban.


--
Cloudera, Inc.


On Wed, Sep 17, 2014 at 11:53 AM, Jianshi Huang <jianshi.huang@gmail.com>
wrote:

> Hi Esteban,
>
> Two reasons to split dynamically,
>
> 1) I have a column family that stores timeseries data for mapreduce tasks,
> and the rowkey is monotonically increasing to make scanning easier.
>
> 2) (a better reason), I'm storing multiple types of data in the same table,
> and I have about 500TB of data in total. That's many billions of rows and
> many thousands of regions. I want to make sure ingesting one type of data
> won't touch every region, which would cause a lot of fragments and merge
> operations. The rowkey is designed as <type>|<hash>|<id>.
>
> So either way I would want a dynamic split in my design.
>
> Jianshi
>
>
> On Thu, Sep 18, 2014 at 2:39 AM, Esteban Gutierrez <esteban@cloudera.com>
> wrote:
>
> > Jianshi,
> >
> > The retry is not an expected behavior that the client should be doing. In
> > fact you don't want your clients to issue admin operations to the cluster
> > ;)
> >
> > Shahab's option is the best alternative: poll until the number of regions
> > has changed in the table whose splits you want to modify dynamically.
> > The JIRA that Ted suggested requires modification in the core table
> > operations to support sync operations and requires some major work to do
> > it right. Ted's alternative to create the splits at table creation time is
> > the best option if you can pre-split IMHO.
> >
> > If you could elaborate more on the practical reasons you mention to
> > create those new regions synchronously, that would be great for us. Maybe
> > it's related to multi-tenancy but I'm just guessing :)
> >
> > esteban.
> >
> >
> > --
> > Cloudera, Inc.
> >
> >
> > On Wed, Sep 17, 2014 at 11:09 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Jianshi:
> > > See HBASE-11608 Add synchronous split
> > >
> > > bq. createTable does something special?
> > >
> > > Yes. See this in HBaseAdmin:
> > >
> > >   public void createTable(final HTableDescriptor desc, byte [][] splitKeys)
> > >
> > > On Wed, Sep 17, 2014 at 10:58 AM, Jianshi Huang <jianshi.huang@gmail.com>
> > > wrote:
> > >
> > > > I see Shahab, async makes sense, but I prefer that the HBase client
> > > > does the retry for me, and lets me specify a timeout parameter.
> > > >
> > > > One question, does that mean adding multiple splits into one region
> > > > has to be done sequentially? How can I add region splits in parallel?
> > > > Does createTable do something special?
> > > >
> > > >
> > > > Jianshi
> > > >
> > > >
> > > > On Wed, Sep 17, 2014 at 8:06 PM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > wrote:
> > > >
> > > > > Split is an async operation. When you call it, and the call
> > > > > returns, it does not mean that the region has been created yet.
> > > > >
> > > > > So either you wait for a while (using Thread.sleep) or check the
> > > > > number of regions in a loop until it has increased to the value you
> > > > > want, and then access the region. The former is not a good idea,
> > > > > though you can try it out just to make sure that this is indeed the
> > > > > issue.
> > > > >
> > > > > What I am suggesting is something like (pseudo code):
> > > > >
> > > > > while(new#regions <= old#regions)
> > > > > {
> > > > >    new#regions = admin.getLatest#regions
> > > > > }
> > > > >
> > > > > Regards,
> > > > > Shahab
> > > > >
> > > > > On Wed, Sep 17, 2014 at 5:39 AM, Jianshi Huang <jianshi.huang@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I constantly get the following errors when I try to add splits
> > > > > > to a table.
> > > > > >
> > > > > > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException):
> > > > > > org.apache.hadoop.hbase.NotServingRegionException: Region
> > > > > > grapple_vertices,cust|rval#7ffffeb7cffca280|1636500018299676757,1410945568484.e7743495366df3c82a8571b36c2bdac3.
> > > > > > is not online on lvshdc5dn0193.lvs.paypal.com,60020,1405014719359
> > > > > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2676)
> > > > > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4095)
> > > > > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:3818)
> > > > > >         at
> > > > > >
> > > > > > But when I checked the region server (from HBase's web UI), the
> > > > > > region is actually listed there.
> > > > > >
> > > > > > What does the error mean actually? How can I solve it?
> > > > > >
> > > > > > Currently I'm adding splits single-threaded, and I want to make
> > > > > > it parallel. Is there anything I need to be careful about?
> > > > > >
> > > > > > Here's the code for adding splits:
> > > > > >
> > > > > >   import scala.collection.JavaConverters._
> > > > > >
> > > > > >   def addSplits(tableName: String, splitKeys: Seq[Array[Byte]]): Unit = {
> > > > > >     val admin = new HBaseAdmin(conn)
> > > > > >
> > > > > >     try {
> > > > > >       val regions = admin.getTableRegions(tableName.getBytes("UTF8"))
> > > > > >       val regionStartKeys = regions.asScala.map(_.getStartKey)
> > > > > >       // Arrays compare by reference in Scala, so compare contents
> > > > > >       // explicitly to skip keys that already start a region
> > > > > >       val splits = splitKeys.filterNot { key =>
> > > > > >         regionStartKeys.exists(start => java.util.Arrays.equals(start, key))
> > > > > >       }
> > > > > >
> > > > > >       splits.foreach { splitPoint =>
> > > > > >         admin.split(tableName.getBytes("UTF8"), splitPoint)
> > > > > >       }
> > > > > >       // NOTE: important!
> > > > > >       admin.balancer()
> > > > > >     }
> > > > > >     finally {
> > > > > >       admin.close()
> > > > > >     }
> > > > > >   }
> > > > > >
> > > > > >
> > > > > > Any help is appreciated.
> > > > > >
> > > > > > --
> > > > > > Jianshi Huang
> > > > > >
> > > > > > LinkedIn: jianshi
> > > > > > Twitter: @jshuang
> > > > > > Github & Blog: http://huangjs.github.com/
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Jianshi Huang
> > > >
> > > > LinkedIn: jianshi
> > > > Twitter: @jshuang
> > > > Github & Blog: http://huangjs.github.com/
> > > >
> > >
> >
>
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
