hbase-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Splitting an existing table with new keys.
Date Tue, 19 Aug 2014 20:29:53 GMT
So the situation here is that we are trying to bulk load data into a
table. But the keys in each load fall in such a range that the load goes to a
specific contiguous chunk of the region servers.

In other words, at each bulk load we face hot-spotting, not at the end of
the key range as in the conventional case, but anywhere within the
row-key range of our table.

Please note that the split point that I am trying to split on does not
exist in the table yet. I am trying to prepare the existing table, which
already holds data, by splitting it into regions into which I will then bulk
import my new data, to avoid hotspotting on one region server.
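
To illustrate why the pre-split should help, here is a minimal sketch in plain Java (no HBase dependency) of how a row key maps to a region once split points exist. It assumes the usual unsigned lexicographic ordering of row keys; the class name, method names, and sample keys are made up for illustration and are not part of the HBase API.

```java
import java.nio.charset.StandardCharsets;

public class RegionLookup {
    /** Helper for the example: UTF-8 bytes of a string. */
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Unsigned lexicographic comparison of two row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /**
     * Index of the region that would hold rowKey, given sorted split points.
     * A split point is the start key of the region above it, so the index is
     * the number of split points that are <= rowKey.
     */
    static int regionIndex(byte[][] sortedSplits, byte[] rowKey) {
        int index = 0;
        for (byte[] split : sortedSplits) {
            if (compare(split, rowKey) <= 0) {
                index++;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical keys: a shared prefix followed by the first guid character.
        byte[][] splits = { bytes("load1-4"), bytes("load1-8"), bytes("load1-c") };
        System.out.println(regionIndex(splits, bytes("load1-3abc")));
        System.out.println(regionIndex(splits, bytes("load1-9f00")));
    }
}
```

With no split points every key of the load maps to index 0, which is the hot spot; with split points on the first guid character the same keys spread over several regions.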

The proof-of-concept code is below. It tries to split the data into 16 regions
('0' to 'f' of the guid, since each row in this current load shares the same
value for the first 2 fields of the row key).

Key is:
data_source + time-in-long + 32-bytes-random-guid

/*****/

byte[][] splits = new byte[16][];
byte[] dataSourceId = Bytes.toBytes(dataSource.getDataSourceID());
byte[] loadTime = Bytes.toBytes(batchLoadTime);
byte[] guidPrefix = null;

  for(int i=0; i<splitPointsPrefixes.length; i++)  {

   guidPrefix = Bytes.toBytes(splitPointsPrefixes[i]);
   splits[i] = new byte[dataSourceId.length + loadTime.length + guidPrefix.length];
   ByteBuffer splitBuffer = ByteBuffer.wrap(splits[i]);
   splitBuffer.put(dataSourceId);
   splitBuffer.put(loadTime);
   splitBuffer.put(guidPrefix);
}

byte[] tableNameInBytes = Bytes.toBytes(tableName);
HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create(getConf()));

for(byte[] split : splits)  {
   // This is asynchronous. Should I wait here after each split before moving
   // on to the next one?
   admin.split(tableNameInBytes, split);
}
/*****/
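
For reference, the split-key construction above can also be written as a self-contained sketch using only the JDK. This assumes the data source id is a string and the load time is a long (the real types in my code may differ), and the class and method names are only illustrative. `ByteBuffer` writes longs big-endian by default, which is the same layout `Bytes.toBytes(long)` produces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SplitKeys {
    // One split prefix per possible first character of the guid.
    static final String HEX = "0123456789abcdef";

    /**
     * Builds one split key per hex digit:
     * dataSourceId bytes + 8-byte load time + 1-byte guid prefix.
     */
    static byte[][] buildSplitKeys(String dataSourceId, long loadTime) {
        byte[] source = dataSourceId.getBytes(StandardCharsets.UTF_8);
        byte[][] splits = new byte[HEX.length()][];
        for (int i = 0; i < HEX.length(); i++) {
            ByteBuffer buf = ByteBuffer.allocate(source.length + Long.BYTES + 1);
            buf.put(source);
            buf.putLong(loadTime);           // big-endian by default
            buf.put((byte) HEX.charAt(i));   // first character of the guid
            splits[i] = buf.array();
        }
        return splits;
    }

    public static void main(String[] args) {
        // Sample values are placeholders, not real data.
        byte[][] splits = buildSplitKeys("src1", 1408480193000L);
        System.out.println(splits.length + " split keys, each "
                + splits[0].length + " bytes");
    }
}
```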

Regards,
Shahab


On Tue, Aug 19, 2014 at 4:13 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Shahab,
>
> can you share your code? It seems that the RS you reached did not have the
> expected region. How is your table status in the web interface?
>
> JM
>
>
> 2014-08-19 16:11 GMT-04:00 Shahab Yunus <shahab.yunus@gmail.com>:
>
> > I have a table already created and with some data. I want to split it
> > through code using the HBaseAdmin API into multiple regions, while specifying
> > keys that do not exist in the table.
> >
> > I am getting the exception below which makes sense because the key doesn't
> > exist yet. But at the time of creation of the table we can indeed pre-split
> > it using keys that don't exist.
> >
> > Is it possible to do it for a table that already exists and has data?
> >
> > Caused by:
> > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException):
> > org.apache.hadoop.hbase.NotServingRegionException:
> >
> >
> > Using HBase: 0.98.1-cdh5.1.0
> >
> > Thanks a lot.
> >
> > Regards,
> > Shahab
> >
>
