hbase-dev mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1249) Rearchitecting of server, client, API, key format, etc for 0.20
Date Wed, 18 Mar 2009 06:39:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682935#action_12682935 ]

Jonathan Gray commented on HBASE-1249:
--------------------------------------

We need to do some testing on that.  Scanning through the deletes in the memcache might be
pretty fast regardless.  However, I think it sounds like a good idea and a basis for some
more thoughts.

And yeah, there should probably be no such thing as a DeleteRow on the server.  And this is
especially the case with locality groups as you'd need to seek to the start of the row every
time before seeking down to your family.

But in thinking more about memcache deletes... when we flush the memcache, we can guarantee
that none of the values being flushed have been deleted (if we do as above, applying deletes
to the memcache).  So we have a list of deletes that apply to older store files.  Then we
start a new memcache.
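
A minimal sketch of that flush behaviour, assuming a toy in-memory model rather than any
real HBase class: the `Memcache` class, its method names, and the tuple layout
`(row, family, column, timestamp)` are all hypothetical.  It only shows that if a delete is
applied to the memcache as it arrives, the flushed values contain nothing deleted, and the
deletes carried into the new storefile matter only for older storefiles.

```python
from dataclasses import dataclass, field


@dataclass
class Memcache:
    # Hypothetical layout: (row, family, column, timestamp) -> value
    puts: dict = field(default_factory=dict)
    # Deletes that still have to be applied to older storefiles
    deletes: list = field(default_factory=list)

    def put(self, row, family, column, ts, value):
        self.puts[(row, family, column, ts)] = value

    def delete_column(self, row, family, column, ts):
        # Apply the delete to the memcache right away: drop every version <= ts.
        doomed = [k for k in self.puts
                  if k[:3] == (row, family, column) and k[3] <= ts]
        for k in doomed:
            del self.puts[k]
        # Remember it, since older storefiles may hold matching versions.
        self.deletes.append((row, family, column, ts))

    def flush(self):
        """Return (live values, deletes) for the new storefile, then start fresh."""
        flushed = (dict(sorted(self.puts.items())), list(self.deletes))
        self.puts, self.deletes = {}, []
        return flushed
```

After a put at stamp 1, a put at stamp 5, and a `delete_column` at stamp 3, `flush()` hands
back only the stamp-5 value plus the one delete, and the memcache is empty again.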

When we read the newest storefile, we know that we can process it without looking at any
deletes except those in the new memcache.  The deletes in this storefile aren't needed until
the second-newest storefile is looked at.  At that point we can read them in bulk from the
previous storefile, which is already open.  We can even compare the stamps of the deletes
against the storefile stamps and the possible query stamps to early out.  This is a far cry
from how things are now... deletes are interspersed and duplicated everywhere.
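
One way the stamp comparison could work, as a rough sketch only: it assumes every delete
masks versions <= its stamp, and that each storefile knows the oldest stamp it contains.
The function name and both parameters are made up for illustration.

```python
def deletes_worth_checking(deletes, file_min_ts, query_min_ts=0):
    """Keep only the deletes that could mask a cell the query might return.

    Each delete is a tuple ending in its stamp and is assumed to mask
    versions <= that stamp.  If the stamp is below both the oldest cell in
    the storefile and the oldest stamp the query asks for, the delete cannot
    change the result and can be skipped.
    """
    floor = max(file_min_ts, query_min_ts)
    return [d for d in deletes if d[-1] >= floor]
```

For example, with deletes stamped 3 and 12 and a storefile whose oldest cell is stamped 5,
only the delete at 12 is kept.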

It does seem to make sense to have the deletes ordered above where they apply, but then we
have to check those sections first before reading?  Well, come to think of it, what could make
sense is to order them below.  The only time we actually have deletes in a storefile is when
they need to be applied to the older storefiles.  So we can scan these deletes at the end:
once we have read past what we wanted, and still need to read additional storefiles, we can
scan and seek for deletes pertaining to this row/family/column, if there are any.

Those deletes are added to the in-memory deleteset for the remaining storefiles.
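
The read path described above might look roughly like this.  It is a sketch under
assumptions: storefiles are modelled as plain dicts with hypothetical `"values"` and
`"deletes"` entries, every delete masks versions <= its stamp, and `read_versions` is an
invented name.

```python
def read_versions(key, memcache_deletes, storefiles, max_versions=1):
    """Collect up to max_versions live versions of key = (row, family, column).

    storefiles is ordered newest first; each one is a dict of the form
    {"values": {(row, family, column, ts): value},
     "deletes": [(row, family, column, ts), ...]}.
    """
    # Deletes known so far: only the ones sitting in the current memcache.
    deleteset = [d for d in memcache_deletes if d[:3] == key]
    results = []
    for sf in storefiles:
        # Cells in this file are checked only against deletes from newer sources.
        hits = sorted(((k[3], v) for k, v in sf["values"].items() if k[:3] == key),
                      reverse=True)
        for ts, value in hits:
            if any(ts <= d[3] for d in deleteset):
                continue
            results.append((ts, value))
            if len(results) == max_versions:
                return results  # done, this file's deletes were never needed
        # Older files are still needed, so pick up this file's deletes now.
        deleteset.extend(d for d in sf["deletes"] if d[:3] == key)
    return results
```

Note that a read satisfied by the newest storefile returns before that file's deletes are
ever looked at, which is the point of ordering them below the data.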

Any rewriting of files must enforce deletions across them, and files must be sequential in
age if not all are combined.
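
A sketch of what rewriting a run of files could look like, using the same toy dict model of
a storefile (hypothetical `"values"` and `"deletes"` entries, deletes masking versions <=
their stamp).  The `includes_oldest` flag is an assumption added here: if older files remain
outside the run, the deletes presumably have to be kept so they can still be applied to them.

```python
def compact(storefiles, includes_oldest):
    """Merge a run of storefiles that are adjacent in age, newest first."""
    deleteset = []
    merged = {}
    for sf in storefiles:
        for key, value in sf["values"].items():
            # A cell is dropped if a delete from a newer file in the run masks it.
            masked = any(key[:3] == d[:3] and key[3] <= d[3] for d in deleteset)
            if not masked:
                merged.setdefault(key, value)  # newer file wins on exact duplicates
        deleteset.extend(sf["deletes"])
    # Assumption: deletes can be dropped only when nothing older is left.
    kept_deletes = [] if includes_oldest else deleteset
    return {"values": merged, "deletes": kept_deletes}
```

If the run skipped a file in the middle, a delete from the newest file would end up in an
output that no longer sits directly above the skipped file, which is why the files have to
be sequential in age.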

So, DeleteRow and DeleteFamily would take no time parameters, and would be stored with the
time of deletion.  Their KeyValue will sort at the end of the row, meaning you need to scan
to this spot any time you reach the end of what you're reading from that store's row and need
to read the next storefile.

DeleteColumn would use the current time by default, or you could specify a stamp and it would
delete everything <= that stamp.  This _could_ sort at the end of the column, but is there any
point?  It should probably be at the end of the row; this is where you have to seek to look
for a DeleteFamily anyway.

Delete would be the same thing.  Sorted at the end of the row.  Just need to get the deleteset
and comparators right so they can do the matching well for these different delete types against
different cell KeyValues.
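
To make the matching and ordering concrete, here is a small sketch.  `DeleteMarker`,
`masks`, and `sort_key` are hypothetical names, not the real KeyValue comparator, and it
assumes that row, family, and column deletes mask everything <= their stamp while a plain
Delete targets one exact version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeleteMarker:
    kind: str                      # "row", "family", "column", or "cell"
    row: str
    ts: int
    family: Optional[str] = None
    column: Optional[str] = None

    def masks(self, row, family, column, ts):
        """True if this delete hides the cell (row, family, column, ts)."""
        if row != self.row:
            return False
        if self.kind == "row":
            return ts <= self.ts
        if family != self.family:
            return False
        if self.kind == "family":
            return ts <= self.ts
        if column != self.column:
            return False
        if self.kind == "column":
            return ts <= self.ts
        return ts == self.ts       # "cell": assumed to hit one exact version


def sort_key(entry):
    """Order entries so every delete marker lands after all cells of its row."""
    if isinstance(entry, DeleteMarker):
        return (entry.row, 1, entry.family or "", entry.column or "", -entry.ts)
    row, family, column, ts = entry
    return (row, 0, family, column, -ts)
```

Sorting a mix of cells and markers with `sorted(entries, key=sort_key)` puts a row's cells
first (newest stamp first within a column), then that row's deletes, then the next row.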

It might make sense to have a DeleteRow in this case; it would be less work in the case of
locality groups.  But not a big deal either way, really.

> Rearchitecting of server, client, API, key format, etc for 0.20
> ---------------------------------------------------------------
>
>                 Key: HBASE-1249
>                 URL: https://issues.apache.org/jira/browse/HBASE-1249
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> To discuss all the new and potential issues coming out of the change in key format (HBASE-1234):
> zero-copy reads, client binary protocol, update of API (HBASE-880), server optimizations,
> etc...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

