hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12859) Major compaction completion tracker
Date Mon, 19 Jan 2015 18:25:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282814#comment-14282814 ]

Andrew Purtell commented on HBASE-12859:

I'll defer to Gary as to which option is more useful, but will note the following tradeoffs:
* ClusterStatus
** Pro: Only one RPC
** Con: Because this is the full cluster state, the RPC response can be *fat*: multiple megabytes for a cluster of 100+ servers
** Con: The master only updates this information on RS heartbeats, so it may be slightly out of date
* GetRegionInfo
** Pro: Up to date information from each RS
** Pro: Each RPC will have a small response
** Con: Must contact each RS. What happens when a region has relocated due to balancing or
RS failure? The current patch doesn't handle region relocation; it just propagates exceptions
to the caller, presumably expecting the caller to retry the whole operation until it succeeds.
That is a lot of unnecessary work if we have 100 RSes and queries to only 1 failed, or 1000
and calls to 5 failed, etc., since the successful queries are repeated too.
** Con: Must contact each RS. 100 RPCs for a cluster of 100 RSes is already a concern; what
about a cluster of 1000 RSes?
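
To make the relocation con concrete, here is a minimal Java sketch of a fan-out that retries only the regions whose call failed, after refreshing just their locations, instead of handing the exception back to the caller. Everything in it is hypothetical: `FanOutSketch`, `RegionServerCall`, `StaleLocationException` and `collect` are illustrative names, not HBase APIs, and the call is modelled per region for simplicity (an assumption about the patch's call granularity).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the per-RS approach; none of these names are HBase APIs.
class FanOutSketch {

    // Thrown by the simulated call when the region is no longer on that server.
    static class StaleLocationException extends Exception {
        StaleLocationException(String region) {
            super("stale location for region " + region);
        }
    }

    // Simulated RPC: last major compaction timestamp of a region on a given server.
    interface RegionServerCall {
        long lastMajorCompaction(String server, String region) throws StaleLocationException;
    }

    // Queries every region at its cached location. A region whose call fails is
    // re-located and retried on the next pass; regions that already answered are
    // never queried again.
    static Map<String, Long> collect(Map<String, String> cachedLocations,
                                     Function<String, String> relocate,
                                     RegionServerCall call,
                                     int maxPasses) {
        Map<String, String> locations = new HashMap<>(cachedLocations);
        Map<String, Long> results = new HashMap<>();
        List<String> pending = new ArrayList<>(locations.keySet());
        for (int pass = 0; pass < maxPasses && !pending.isEmpty(); pass++) {
            List<String> failed = new ArrayList<>();
            for (String region : pending) {
                try {
                    results.put(region, call.lastMajorCompaction(locations.get(region), region));
                } catch (StaleLocationException e) {
                    locations.put(region, relocate.apply(region)); // refresh just this entry
                    failed.add(region);
                }
            }
            pending = failed;
        }
        if (!pending.isEmpty()) {
            throw new IllegalStateException("unresolved after " + maxPasses + " passes: " + pending);
        }
        return results;
    }
}
```

With 100 regions and one stale location this costs 101 calls rather than a second full round of 100; the RPC count still grows with cluster size, which is the remaining con.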

I'm not happy with either alternative. How about we start adding master APIs that return
just the specific information we want from ClusterStatus, without sending all of the
ClusterStatus data?
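
A targeted master API along those lines might look like the following minimal Java sketch: the master keeps only the per-region value it needs as heartbeats arrive, and answers with a single long per table. `MasterCompactionTracker`, `onHeartbeat` and `oldestMajorCompaction` are hypothetical names, and returning 0 for a table with no reports is an assumption meaning "no guarantee".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical master-side tracker; class and method names are illustrative only.
class MasterCompactionTracker {
    // table -> (region -> last major compaction timestamp, ms since the epoch)
    private final Map<String, Map<String, Long>> byTable = new ConcurrentHashMap<>();

    // Invoked for each region reported in an RS heartbeat; keeps only the one
    // value this query needs instead of the full per-region load report.
    void onHeartbeat(String table, String region, long lastMajorCompactionTs) {
        byTable.computeIfAbsent(table, t -> new ConcurrentHashMap<>())
               .put(region, lastMajorCompactionTs);
    }

    // Body of a narrow master RPC: one long per table instead of the whole
    // ClusterStatus. Returns 0 (no guarantee) when nothing has been reported.
    long oldestMajorCompaction(String table) {
        Map<String, Long> regions = byTable.get(table);
        if (regions == null || regions.isEmpty()) {
            return 0L;
        }
        long oldest = Long.MAX_VALUE;
        for (long ts : regions.values()) {
            oldest = Math.min(oldest, ts);
        }
        return oldest;
    }
}
```

Because the map is keyed by region rather than by server, a region moving between RSes needs no special handling; the value is still only as fresh as the last heartbeat, the same staleness con listed for ClusterStatus.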

> Major compaction completion tracker
> -----------------------------------
>                 Key: HBASE-12859
>                 URL: https://issues.apache.org/jira/browse/HBASE-12859
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>         Attachments: 12859-v1.txt, 12859-v2.txt, 12859-v3.txt, 12859-wip-UNFINISHED.txt
> In various scenarios it is helpful to know a guaranteed timestamp up to which all data in a table was major compacted.
> We can do that by keeping a major compaction timestamp in META.
> A client can then iterate over all regions of a table and find a definite timestamp, which is the oldest compaction timestamp of any of the regions.
> [~apurtell], [~ghelmling], [~giacomotaylor].
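
The client-side computation the description outlines reduces to a minimum over the table's regions. A minimal Java sketch, assuming each META row exposes an optional last-major-compaction timestamp (`TableCompactionHorizon`, `RegionRow` and `guaranteedUpTo` are hypothetical names, not HBase types) and that a region with no recorded timestamp voids the guarantee:

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical client-side helper; RegionRow stands in for a META row and is not an HBase type.
class TableCompactionHorizon {

    static final class RegionRow {
        final String regionName;
        final OptionalLong lastMajorCompactionTs; // empty if never major compacted

        RegionRow(String regionName, OptionalLong lastMajorCompactionTs) {
            this.regionName = regionName;
            this.lastMajorCompactionTs = lastMajorCompactionTs;
        }
    }

    // Timestamp up to which all data in the table is guaranteed to be major
    // compacted: the oldest per-region timestamp. Assumption: a region with no
    // recorded timestamp (or an empty table) means no guarantee, returned as 0.
    static long guaranteedUpTo(List<RegionRow> metaRows) {
        if (metaRows.isEmpty()) {
            return 0L;
        }
        long oldest = Long.MAX_VALUE;
        for (RegionRow row : metaRows) {
            if (!row.lastMajorCompactionTs.isPresent()) {
                return 0L; // one never-compacted region voids the table-wide guarantee
            }
            oldest = Math.min(oldest, row.lastMajorCompactionTs.getAsLong());
        }
        return oldest;
    }
}
```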

This message was sent by Atlassian JIRA
