hbase-issues mailing list archives

From "Matt Corgan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
Date Wed, 29 Aug 2012 19:40:08 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444349#comment-13444349 ]

Matt Corgan commented on HBASE-2600:

{quote}\x00 could be part of a region key but the sort on table name first should make it
so the \x00 delimiter would be found first{quote}
Yep. In general, this is how I build compound primary keys with variable-length strings.
You shouldn't need any padding or anything. The only complication is if your leading string
somehow contains \x00, but that can't happen in this case.
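
A minimal sketch of the delimiter approach described above, in Python for brevity (HBase itself is Java). The names `compound_key` and `split_key` are hypothetical and not part of any HBase API. It assumes keys are compared byte by byte as unsigned values, which is how Python orders `bytes`.

```python
from typing import Tuple

# Hypothetical helpers for illustration only; not HBase API.
DELIM = b"\x00"


def compound_key(table: bytes, row: bytes) -> bytes:
    """Join a variable-length table name and a row key with a \\x00 delimiter."""
    # The leading component must not contain the delimiter; the trailing one may.
    if DELIM in table:
        raise ValueError("table name must not contain the delimiter byte")
    return table + DELIM + row


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split on the FIRST \\x00, which is always the delimiter."""
    table, _, row = key.partition(DELIM)
    return table, row


pairs = [(b"t1", b"a"), (b"t", b"zzz"), (b"t", b"a\x00b"), (b"t", b"")]
keys = sorted(compound_key(t, r) for t, r in pairs)
```

Because \x00 is the smallest byte value, every key of table `t` sorts before every key of table `t1` without any padding, and a \x00 inside the row part is harmless because the split only looks for the first occurrence.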

As for moving the regionId to the qualifier, I don't really know enough about how it's used
to give detailed ideas, but some thoughts:
* there will not be many daughter regions at a given time, so we are not talking about wide rows
* perhaps putting the daughters into the same row adds some transactional benefits that we
didn't previously have?
* as for qualifier-prefix vs separate-qualifier, I actually don't know enough about usage
to say if neither/either/both would work.  Seems like either could work, given that each row
will be small enough to easily hold in memory and parse however needed.  I first proposed prefixing
to keep the KV sort order intact, but if that isn't required then separate-qualifier is cleaner.

> Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid
> ----------------------------------------------------------------------------------------------------
>                 Key: HBASE-2600
>                 URL: https://issues.apache.org/jira/browse/HBASE-2600
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Alex Newman
>         Attachments: 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch,
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch,
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch,
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8,
> 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch,
> 0001-HBASE-2600.v10.patch, 0001-HBASE-2600-v11.patch, 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch,
> HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, hbase-2600-root.dir.tgz, jenkins.pdf
>
> This is an idea that Ryan and I have been kicking around on and off for a while now.
> If regionnames were made of tablename+endrow instead of tablename+startrow, then in the
> metatables, doing a search for the region that contains the wanted row, we'd just have to
> open a scanner using passed row and the first row found by the scan would be that of the region
> we need (If offlined parent, we'd have to scan to the next row).
> If we redid the meta tables in this format, we'd be using an access that is natural to
> hbase, a scan as opposed to the perverse, expensive getClosestRowBefore we currently have
> that has to walk backward in meta finding a containing region.
> This issue is about changing the way we name regions.
> If we were using scans, prewarming client cache would be near costless (as opposed to
> what we'll currently have to do which is first a getClosestRowBefore and then a scan from
> the closestrowbefore forward).
> Converting to the new method, we'd have to run a migration on startup changing the content
> in meta.
> Up to this, the randomid component of a region name has been the timestamp of region
> creation. HBASE-2531 "32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash
> clashes" proposes changing the randomid so that it contains actual name of the directory in
> the filesystem that hosts the region.  If we had this in place, I think it would help with
> the migration to this new way of doing the meta because as is, the region name in fs is a
> hash of regionname... changing the format of the regionname would mean we generate a different
> hash... so we'd need hbase-2531 to be in place before we could do this change.
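
The difference between the two lookups in the quoted description can be sketched with a toy model. This is Python for brevity and does not use real HBase or meta-table code. The region list, the function names, and the use of `None` for the open end of the last region are all assumptions made for illustration. End rows are treated as exclusive, so the forward lookup takes the first end row strictly greater than the wanted row.

```python
from bisect import bisect_right

# Toy model of one table's regions as (start_row, end_row), sorted and
# contiguous. End rows are exclusive. b"" as a start means "from the
# beginning"; None as an end means "to the end" (an assumption of this
# sketch, not how meta encodes it).
REGIONS = [(b"", b"g"), (b"g", b"p"), (b"p", None)]


def find_by_start_row(regions, row):
    """Start-row keying: need the greatest start <= row, a backward-looking
    lookup (the role getClosestRowBefore plays in the description)."""
    starts = [start for start, _ in regions]
    return regions[bisect_right(starts, row) - 1]


def find_by_end_row(regions, row):
    """End-row keying: seek forward from the wanted row and take the first
    entry whose end row is greater than it, like the first row of a scan."""
    ends = [end for _, end in regions[:-1]]  # finite end rows only
    # Falling off the end of `ends` means the row is in the last region.
    return regions[bisect_right(ends, row)]


containing = find_by_end_row(REGIONS, b"k")
```

Both functions return the same region for any row; the point of the proposal is that the second shape matches a plain forward scan.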

