hbase-user mailing list archives

From Zheng Lv <lvzheng19800...@gmail.com>
Subject Re: Cannot open filename Exceptions
Date Wed, 24 Mar 2010 09:43:50 GMT
Hello Stack,
  Thank you for your explanations; they're very helpful.
  If I find anything new, I'll contact you.
  Regards,
    LvZheng

2010/3/24 Stack <stack@duboce.net>

> On Tue, Mar 23, 2010 at 8:42 PM, Zheng Lv <lvzheng19800619@gmail.com> wrote:
> > Hello Stack,
> >  >So, for sure ugly stuff is going on.  I filed
> >  >https://issues.apache.org/jira/browse/HBASE-2365.  It looks like we're
> >  >doubly assigning a region.
> >  Can you tell me how this happened in detail? Thanks a lot.
> >
>
> Yes.
>
> Splits are run by the regionserver.  It figures a region needs to be
> split and goes ahead closing the parent and creating the daughter
> regions.  It then adds edits to the meta table offlining the parent
> and inserting the two new daughter regions.  Next it sends a message
> to the master telling it that a region has been split.   The message
> contains names of the daughter regions.  On receipt of the message,
> the master adds the new daughter regions to the unassigned regions
> list so they'll be passed out the next time a regionserver checks in.
>
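A minimal Java sketch of the split sequence just described. It is a toy in-memory model, not HBase code: `.META.` is a plain map, and the class `SplitFlowSketch`, its methods, the `-a`/`-b` daughter naming and the `offline` column are all hypothetical stand-ins. It only shows the order of operations: the daughters land in `.META.` before the master hears about the split.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the split flow: regionserver side, then master side.
// Hypothetical names throughout; .META. is a plain map, "offline" is a stand-in column.
class SplitFlowSketch {
    final Map<String, Map<String, String>> meta = new HashMap<>();
    final List<String> unassigned = new ArrayList<>();

    // Regionserver: closing the parent and creating the daughters is elided here;
    // this only models the .META. edits and the report to the master.
    List<String> split(String parent) {
        String a = parent + "-a";
        String b = parent + "-b";
        meta.computeIfAbsent(parent, k -> new HashMap<>()).put("offline", "true"); // offline parent
        meta.put(a, new HashMap<>(Map.of("regioninfo", a)));                       // insert daughters
        meta.put(b, new HashMap<>(Map.of("regioninfo", b)));
        List<String> daughters = List.of(a, b);
        onSplitMessage(daughters); // in a real cluster this message can arrive much later
        return daughters;
    }

    // Master: queue the daughters named in the split message so they are
    // handed out the next time a regionserver checks in.
    void onSplitMessage(List<String> daughters) {
        unassigned.addAll(daughters);
    }

    public static void main(String[] args) {
        SplitFlowSketch s = new SplitFlowSketch();
        s.split("t1");
        System.out.println(s.unassigned);     // [t1-a, t1-b]
        System.out.println(s.meta.get("t1")); // {offline=true}
    }
}
```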
> Concurrently, the master is running a scan of the meta table every
> minute making sure all is in order.  One thing it does: if it finds
> unassigned regions, it adds them to the unassigned regions list (this
> process is what gets all regions assigned after a startup).
>
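A sketch of one pass of that periodic scan, under the same toy assumptions (hypothetical `MetaScanSketch` class, map-based `.META.`, invented `offline` and `server` columns). In this model a region counts as unassigned when its row is online and has no `server` column; the timer that re-runs the pass roughly every minute is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One pass of a toy .META. scan. Hypothetical names; not the HBase master's code.
class MetaScanSketch {
    // Row name -> (column -> value), insertion-ordered for readable output.
    final Map<String, Map<String, String>> meta = new LinkedHashMap<>();
    final List<String> unassigned = new ArrayList<>();

    void scanOnce() {
        for (Map.Entry<String, Map<String, String>> e : meta.entrySet()) {
            Map<String, String> row = e.getValue();
            boolean offline = "true".equals(row.get("offline"));
            boolean hasServer = row.containsKey("server");
            // An online region with no server recorded is treated as unassigned.
            if (!offline && !hasServer) {
                unassigned.add(e.getKey());
            }
        }
    }

    // Helper to build a row from alternating column/value strings.
    static Map<String, String> row(String... kv) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) r.put(kv[i], kv[i + 1]);
        return r;
    }

    public static void main(String[] args) {
        MetaScanSketch m = new MetaScanSketch();
        m.meta.put("parent", row("regioninfo", "p", "offline", "true"));
        m.meta.put("daughterA", row("regioninfo", "a"));
        m.meta.put("other", row("regioninfo", "o", "server", "rs1", "startcode", "1"));
        m.scanOnce();
        System.out.println(m.unassigned); // [daughterA]
    }
}
```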
> In your case, what's happening is that there is a long period between
> the add of the new split regions to the meta table and the report of
> split to the master.  During this time, the master meta scan ran,
> found one of the daughters and went and assigned it.  Then the split
> message came in and the daughter was assigned again!
>
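The interleaving above can be replayed deterministically in a toy model. Everything here (`DoubleAssignTimeline`, its methods, and the `server`/`startcode` columns being written at check-in time) is a hypothetical simplification. The point is only that, without a guard, the split-message path queues the daughter unconditionally, so a daughter already picked up by the meta scan ends up open on two servers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Replays the race step by step in a toy model. Hypothetical names; no HBase classes.
class DoubleAssignTimeline {
    final Map<String, Map<String, String>> meta = new HashMap<>();
    final List<String> unassigned = new ArrayList<>();
    final Map<String, List<String>> openedOn = new HashMap<>(); // region -> servers that opened it

    void regionserverWritesDaughter(String daughter) {
        Map<String, String> row = new HashMap<>();
        row.put("regioninfo", daughter);
        meta.put(daughter, row);
    }

    // Master meta scan: any row without a server is queued for assignment.
    void masterMetaScan() {
        for (Map.Entry<String, Map<String, String>> e : meta.entrySet()) {
            if (!e.getValue().containsKey("server")) unassigned.add(e.getKey());
        }
    }

    // Split message handling with no guard: the daughter is queued unconditionally.
    void masterReceivesSplitMessage(String daughter) {
        unassigned.add(daughter);
    }

    // A regionserver checks in and is handed everything on the unassigned list.
    void regionserverChecksIn(String server) {
        for (String region : unassigned) {
            openedOn.computeIfAbsent(region, k -> new ArrayList<>()).add(server);
            meta.get(region).put("server", server);
            meta.get(region).put("startcode", "1");
        }
        unassigned.clear();
    }

    public static void main(String[] args) {
        DoubleAssignTimeline t = new DoubleAssignTimeline();
        t.regionserverWritesDaughter("daughterA"); // edit lands in .META.
        t.masterMetaScan();                         // long gap: the scan finds it first
        t.regionserverChecksIn("rs1");              // first assignment
        t.masterReceivesSplitMessage("daughterA");  // late split report
        t.regionserverChecksIn("rs2");              // second assignment
        System.out.println(t.openedOn);             // {daughterA=[rs1, rs2]}
    }
}
```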
> There was supposed to be protection against this happening IIRC.
> Looking at the responsible code, we try to defend against this
> in ServerManager:
>
>  /*
>   * Assign new daughter-of-a-split UNLESS it's already been assigned.
>   * It could have been assigned already in the rare case where there was a large
>   * gap between insertion of the daughter region into .META. by the
>   * splitting regionserver and receipt of the split message in master (See
>   * HBASE-1784).
>   * @param hri Region to assign.
>   */
>  private void assignSplitDaughter(final HRegionInfo hri) {
>    MetaRegion mr =
>      this.master.regionManager.getFirstMetaRegionForRegion(hri);
>    Get g = new Get(hri.getRegionName());
>    g.addFamily(HConstants.CATALOG_FAMILY);
>    try {
>      HRegionInterface server =
>        master.connection.getHRegionConnection(mr.getServer());
>      Result r = server.get(mr.getRegionName(), g);
>      // If size >= 3 -- presume regioninfo, startcode and server -- then
>      // presume that this daughter is already assigned and return.
>      if (r.size() >= 3) return;
>    } catch (IOException e) {
>      LOG.warn("Failed get on " + HConstants.CATALOG_FAMILY_STR +
>        "; possible double-assignment?", e);
>    }
>    this.master.regionManager.setUnassigned(hri, false);
>  }
>
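A toy version of the size check in `assignSplitDaughter`, assuming a map-based `.META.` and hypothetical names (`SplitDaughterGuard`, `row`). It reproduces the quoted rule (three or more cells means "already assigned") and shows that, in this model, the check only helps once `server` and `startcode` have been written. If the meta scan has queued the daughter but no regionserver has opened it yet, the row still holds just `regioninfo` and the daughter is queued a second time. Whether that window is what happened on this cluster is not established here; it is simply one gap the model exposes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy version of the assignSplitDaughter size check. Hypothetical names;
// .META. is a plain map, not an HBase table.
class SplitDaughterGuard {
    final Map<String, Map<String, String>> meta = new HashMap<>();
    final List<String> unassigned = new ArrayList<>();

    // Mirrors the quoted logic: a row with regioninfo, server and startcode
    // (three or more cells) is taken to mean the daughter is already assigned.
    void assignSplitDaughter(String daughter) {
        Map<String, String> row = meta.get(daughter);
        if (row != null && row.size() >= 3) return;
        unassigned.add(daughter);
    }

    // Helper to build a row from alternating column/value strings.
    static Map<String, String> row(String... kv) {
        Map<String, String> r = new HashMap<>();
        for (int i = 0; i < kv.length; i += 2) r.put(kv[i], kv[i + 1]);
        return r;
    }

    public static void main(String[] args) {
        // Case 1: the scan's assignment already completed and was recorded in .META.
        SplitDaughterGuard done = new SplitDaughterGuard();
        done.meta.put("daughterA", row("regioninfo", "a", "server", "rs1", "startcode", "1"));
        done.assignSplitDaughter("daughterA");
        System.out.println(done.unassigned); // []

        // Case 2: the scan queued the daughter but nobody has opened it yet,
        // so the row holds only regioninfo and the guard lets it through.
        SplitDaughterGuard pending = new SplitDaughterGuard();
        pending.meta.put("daughterA", row("regioninfo", "a"));
        pending.unassigned.add("daughterA");      // queued by the meta scan
        pending.assignSplitDaughter("daughterA"); // split message arrives
        System.out.println(pending.unassigned);   // [daughterA, daughterA]
    }
}
```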
> So, the above is not working in your case for some reason.  I'll take
> a look, but I'm not sure I can figure it out without DEBUG logging (thanks
> for letting me know about the out-of-sync clocks; now I can have more
> faith in what the logs are telling me).
>
> >
> >  >With DEBUG enabled have you been able to reproduce?
> >  The exception hasn't appeared again these past few days; if it does,
> > I'll show you the logs.
> >
>
> For sure, if you come across it again, I'm interested.
>
> Thanks Zheng,
> St.Ack
>
