hbase-user mailing list archives

From Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com>
Subject Re: A few issues we ran into the last couple of weeks.
Date Wed, 18 May 2011 18:12:04 GMT
Thanks Ted! Will do it right away.

1. we should provide the following new API where numOfRegions is the
expected number of regions to go online:

Instead of this function, I used table.getRegionsInfo() to make sure all regions were online.
But that function requires a priori knowledge of the number of regions.
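
The workaround described above can be sketched as a bounded polling loop. This is only an illustration under stated assumptions: `RegionWait`, `waitForRegions`, and the `onlineRegions` supplier are hypothetical names, and the supplier stands in for whatever call reports the current count of online regions (for example, the size of the result of table.getRegionsInfo(), wrapped so that it does not throw a checked exception). No HBase API is called in the sketch itself.

```java
import java.util.function.IntSupplier;

// Hypothetical helper, not part of HBase.
final class RegionWait {
    /**
     * Polls onlineRegions until it reports at least expectedRegions,
     * giving up after maxAttempts polls. Returns true on success.
     */
    static boolean waitForRegions(IntSupplier onlineRegions, int expectedRegions,
                                  int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (onlineRegions.getAsInt() >= expectedRegions) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

The caller still has to know expectedRegions up front (for instance, the number of split keys plus one), which is the a priori knowledge mentioned above.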

V
P.S:  Copy-pasting my full name could be a little tedious!


On 5/18/11 11:02 AM, "Ted Yu" <yuzhihong@gmail.com> wrote:

Vidhyashankar:
Please file the following JIRAs:
1. we should provide the following new API where numOfRegions is the
expected number of regions to go online:
    public boolean isTableAvailable(final byte[] tableName, int numOfRegions) throws IOException {

2. HBaseAdmin.createTableAsync() should check whether there are duplicate
keys. Since it is a public method, we shouldn't rely solely on
createTable() to perform the check.
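
A check of the kind item 2 asks for might look like the sketch below. `SplitKeyCheck` and `hasDuplicates` are hypothetical names used for illustration, not existing HBase methods, and where such a check would sit inside createTableAsync() is left open.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of HBase.
final class SplitKeyCheck {
    /** Returns true if any two entries of splitKeys have identical contents. */
    static boolean hasDuplicates(byte[][] splitKeys) {
        Set<ByteBuffer> seen = new HashSet<>();
        for (byte[] key : splitKeys) {
            // ByteBuffer compares by content; a raw byte[] would compare by identity.
            if (!seen.add(ByteBuffer.wrap(key))) {
                return true;
            }
        }
        return false;
    }
}
```

A caller could reject the request with an IllegalArgumentException when hasDuplicates returns true, so that bad split keys never reach the master.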

Thanks

On Wed, May 18, 2011 at 10:46 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> As in, the use of isTableAvailable there indicates that a bulk load should
> happen only if all the regions are available.
>
> But that may not be the case, since the function returns true even if only
> one region is online (the regionCount.get() > 0 check).
>
> V
>
>
> On 5/17/11 7:14 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>
> Did you mean that coming out of the following loop, the table might still
> be unavailable if there were many regions?
>    while (!conn.isTableAvailable(table.getTableName()) && (ctr < TABLE_CREATE_MAX_RETRIES)) {
>
> Cheers
>
> On Tue, May 17, 2011 at 7:10 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > >> Also some of the source for which we had used this function may be
> > >> broken (for example in LoadIncrementalHFiles.java)
> > Can you be more specific?
> >
> > Thanks
> >
> >
> > On Tue, May 17, 2011 at 5:54 PM, Vidhyashankar Venkataraman <
> > vidhyash@yahoo-inc.com> wrote:
> >
> >> >> For 1, the check in HCM.isTableAvailable() is:
> >> >>      return available.get() && (regionCount.get() > 0);
> >> >> This explains why some regions aren't available.
> >>
> >> The javadoc says the function returns true if all regions are available.
> >> Clearly this statement is wrong going by what is there in the code. Also
> >> some of the source for which we had used this function may be broken (for
> >> example in LoadIncrementalHFiles.java).
> >>
> >> >> For 3, can you provide a unit test so that we can investigate further?
> >>
> >> The problem is I am unable to get the master to crash consistently. I can
> >> send you the key split.
> >>
> >> Thank you
> >> Vidhya
> >>
> >> On 5/17/11 4:59 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
> >>
> >> For 1, the check in HCM.isTableAvailable() is:
> >>      return available.get() && (regionCount.get() > 0);
> >> This explains why some regions aren't available.
> >>
> >> For 3, can you provide a unit test so that we can investigate further?
> >>
> >> Thanks
> >>
> >> On Tue, May 17, 2011 at 4:25 PM, Vidhyashankar Venkataraman <
> >> vidhyash@yahoo-inc.com> wrote:
> >>
> >> > (Running HBase 0.90.0 on 700+ nodes.)
> >> >
> >> > You may have seen many (or almost all) of the following issues already:
> >> >   1. HConnection.isTableAvailable: This doesn't seem to be working all the
> >> > time. In particular, I had this code after creating a table asynchronously:
> >> >
> >> >   do {
> >> >      LOG.info("Table " + tableName + " not yet available... Sleeping for "
> >> >          + sleepTime + " milliseconds...");
> >> >      Thread.sleep(sleepTime);
> >> >    } while (!conn.isTableAvailable(table.getTableName()));
> >> >    LOG.info("Table is available!! : " + tableName + " Available? "
> >> >        + conn.isTableAvailable(table.getTableName()));
> >> >
> >> > It comes out of the loop but then I see this:
> >> > Table is available!! : <TABLE> Available? false
> >> >
> >> > And then I see that not all the regions are yet available.
> >> >
> >> >
> >> >   2. The master getting stuck, unable to delete a WAL (I have seen this
> >> > before on this forum, along with a related JIRA): We had worked around it
> >> > by manually deleting the WAL. But when the master crashed during table
> >> > creation (with split key boundaries), the node that took over next as
> >> > the master (failover) started getting stuck for around 25% of the cluster.
> >> > I had to wipe out all the logs so that the master could start up right.
> >> >
> >> > But even then, the regionservers which had suffered the log issue couldn't
> >> > recognize the failed-over master. (Is this something that has been observed
> >> > before?)
> >> >
> >> >
> >> >   3. createTableAsync with incorrect split keys: By mistake, I had some
> >> > duplicate keys in the split key byte array while calling the
> >> > createTableAsync function. The master crashed, throwing a KeeperException
> >> > (thanks to the duplicate keys, I guess?)
> >> >
> >> >
> >> > Also, can you let me know why createTableAsync blocks for some time and
> >> > throws a socket timeout exception when I try creating a table with a large
> >> > number of regions?
> >> >
> >> > Thank you
> >> > Vidhya
> >> >
> >>
> >>
> >
>
>

