hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: A few issues we ran into the last couple of weeks.
Date Wed, 18 May 2011 18:21:09 GMT
Vidhyashankar:
table.getRegionsInfo() is for advanced users (such as you) :-)
Anyway, we shouldn't force users to call it.
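
For illustration, here is a minimal sketch of the difference between the check quoted in this thread and a region-count-aware variant along the lines of the proposed isTableAvailable(tableName, numOfRegions). It is not the HBase implementation: the class name, method names, and the bounded polling helper are hypothetical, and the equality test against numOfRegions is an assumption about what "expected number of regions" should mean.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch; none of these names come from HBase.
final class TableAvailabilitySketch {

    // The check quoted in this thread: passes as soon as any region is counted.
    static boolean quotedCheck(boolean available, int regionCount) {
        return available && regionCount > 0;
    }

    // Assumed stricter variant: also require the expected number of regions.
    static boolean strictCheck(boolean available, int regionCount, int numOfRegions) {
        return available && regionCount == numOfRegions;
    }

    // Polls `check` at most maxAttempts times, sleeping between failed attempts.
    // Returns false when attempts run out or the thread is interrupted, so the
    // caller never assumes success just because the loop ended.
    static boolean waitUntil(BooleanSupplier check, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }
}
```

With one region counted out of an expected ten, quotedCheck(true, 1) passes while strictCheck(true, 1, 10) does not, which matches the symptom reported below.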

On Wed, May 18, 2011 at 11:12 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> Thanks Ted! Will do it right away.
>
> 1. we should provide the following new API where numOfRegions is the
> expected number of regions to go online:
>
> I used table.getRegionsInfo() to make sure all regions were online instead
> of this function. But that function requires a priori knowledge of the number
> of regions.
>
> V
> P.S:  Copy-pasting my full name could be a little tedious!
>
>
> On 5/18/11 11:02 AM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>
> Vidhyashankar:
> Please file the following JIRAs:
> 1. we should provide the following new API where numOfRegions is the
> expected number of regions to go online:
>    public boolean isTableAvailable(final byte[] tableName, int numOfRegions) throws IOException {
>
> 2. HBaseAdmin.createTableAsync() should check whether there are duplicate
> keys. Since it is a public method, we shouldn't rely solely on
> createTable() to perform the check.
>
> Thanks
>
> On Wed, May 18, 2011 at 10:46 AM, Vidhyashankar Venkataraman <
> vidhyash@yahoo-inc.com> wrote:
>
> > As in, the use of isTableAvailable there indicates that a bulk load should
> > happen only if all the regions are available.
> >
> > But that may not be the case, since the function returns true if even
> > one region (regionCount.get()>0 check) is online.
> >
> > V
> >
> >
> > On 5/17/11 7:14 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
> >
> > Did you mean that coming out of the following loop, the table might still be
> > unavailable if there were many regions?
> >    while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MAX_RETRIES)) {
> >
> > Cheers
> >
> > On Tue, May 17, 2011 at 7:10 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > >> Also some of the source for which we had used this function may be
> > > broken (for example in LoadIncrementalHFiles.java)
> > > Can you be more specific ?
> > >
> > > Thanks
> > >
> > >
> > > On Tue, May 17, 2011 at 5:54 PM, Vidhyashankar Venkataraman <
> > > vidhyash@yahoo-inc.com> wrote:
> > >
> > >> >> For 1, the check in HCM.isTableAvailable() is:
> > >> >>      return available.get() && (regionCount.get() >
0);
> > >> >> This explains why some regions aren't available.
> > >>
> > > >> The javadoc says the function returns true if all regions are available.
> > > >> Clearly this statement is wrong going by what is there in the code. Also
> > > >> some of the source for which we had used this function may be broken (for
> > > >> example in LoadIncrementalHFiles.java).
> > >>
> > > >> >> For 3, can you provide a unit test so that we can investigate further?
> > >>
> > > >> The problem is I am unable to get the master to crash consistently. I can
> > > >> send you the key split.
> > >>
> > >> Thank you
> > >> Vidhya
> > >>
> > >> On 5/17/11 4:59 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
> > >>
> > >> For 1, the check in HCM.isTableAvailable() is:
> > >>      return available.get() && (regionCount.get() > 0);
> > >> This explains why some regions aren't available.
> > >>
> > > >> For 3, can you provide a unit test so that we can investigate further?
> > >>
> > >> Thanks
> > >>
> > >> On Tue, May 17, 2011 at 4:25 PM, Vidhyashankar Venkataraman <
> > >> vidhyash@yahoo-inc.com> wrote:
> > >>
> > >> > (Running Hbase 0.90.0 on 700+ nodes.)
> > >> >
> > > >> > You may have seen many (or mostly all) of the following issues already:
> > > >> >   1. HConnection.isTableAvailable: This doesn't seem to be working all the
> > > >> > time. In particular, I had this code after creating a table asynchronously:
> > >> >
> > > >> >   do {
> > > >> >      LOG.info("Table " + tableName + " not yet available... Sleeping for " + sleepTime + " milliseconds...");
> > > >> >      Thread.sleep(sleepTime);
> > > >> >    } while (!conn.isTableAvailable(table.getTableName()));
> > > >> >    LOG.info("Table is available!! : "+tableName+" Available? "+conn.isTableAvailable(table.getTableName()));
> > >> >
> > >> > It comes out of the loop but then I see this:
> > >> > Table is available!! : <TABLE> Available? false
> > >> >
> > >> > And then I see that not all the regions are yet available.
> > >> >
> > >> >
> > > >> >   2. The master getting stuck unable to delete a WAL (I have seen this
> > > >> > before on this forum and a related JIRA on this one): We had worked around
> > > >> > by manually deleting a WAL. But during times when the master crashed during
> > > >> > table creation (with split key boundaries), the node that took over next as
> > > >> > the master (failover) started getting stuck for around 25% of the cluster. I
> > > >> > had to wipe out all the logs so that the master could start up right.
> > >> >
> > > >> > But even then, the regionservers which had suffered the log issue couldn't
> > > >> > recognize the failed over master. (Is this something that has been observed
> > > >> > before?)
> > >> >
> > >> >
> > > >> >   3. createTableAsync with incorrect split keys: By mistake, I had some
> > > >> > duplicate keys in the split key byte array while calling the
> > > >> > createTableAsync function. The master crashed throwing a KeeperException
> > > >> > (thanks to the duplicate keys I guess?)
> > >> >
> > >> >
> > > >> > Also, can you let me know why createTableAsync blocks for some time and
> > > >> > throws a socket timeout exception when I try creating a table with a large
> > > >> > number of regions?
> > >> >
> > >> > Thank you
> > >> > Vidhya
> > >> >
> > >>
> > >>
> > >
> >
> >
>
>
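
The duplicate split key problem raised in item 3 of the thread could be caught on the client before any request reaches the master. The following is a minimal sketch of such a guard, not the check HBase itself performs: the class and method names are hypothetical, and it relies only on the fact that java.nio.ByteBuffer compares wrapped arrays by content.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names come from HBase.
final class SplitKeySketch {

    // Returns true if any two split keys contain identical bytes.
    static boolean hasDuplicateKeys(byte[][] splitKeys) {
        Set<ByteBuffer> seen = new HashSet<>();
        for (byte[] key : splitKeys) {
            // ByteBuffer.wrap gives content-based equals/hashCode for byte[].
            if (!seen.add(ByteBuffer.wrap(key))) {
                return true;
            }
        }
        return false;
    }

    // Fails fast so a bad key array is rejected before table creation starts.
    static void validateSplitKeys(byte[][] splitKeys) {
        if (hasDuplicateKeys(splitKeys)) {
            throw new IllegalArgumentException("Split keys must be unique");
        }
    }
}
```

A caller would run validateSplitKeys(splitKeys) immediately before the asynchronous create call, so a mistake in the key array surfaces as a client-side exception.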
