hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3290) Max Compaction Size
Date Thu, 02 Dec 2010 04:05:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965983#action_12965983 ]

HBase Review Board commented on HBASE-3290:
-------------------------------------------

Message from: "Nicolas" <nspiegelberg@facebook.com>


bq.  On 2010-12-01 10:49:59, stack wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 639
bq.  > <http://review.cloudera.org/r/1263/diff/1/?file=17923#file17923line639>
bq.  >
bq.  >     Is this right? We check all storefiles for references where before we only checked the subset of candidate compaction files for references?
bq.  >     
bq.  >     
bq.  >     (Hmm.. maybe the old stuff was wrong?)

references == split files.  We currently don't support splitting a split file again (into quarter pieces?), so before a region can split again we need to ensure none of its files are references.
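
The check described above can be sketched as follows. This is a hedged simplification, not the actual HBase API: `StoreFile` and `hasReferences()` here are hypothetical stand-ins for the real classes in `Store.java`.

```java
import java.util.ArrayList;
import java.util.List;

public class ReferenceCheck {
    // Hypothetical simplification of HBase's StoreFile.
    static class StoreFile {
        final boolean isReference; // true if this is a half-file left over from a split
        StoreFile(boolean isReference) { this.isReference = isReference; }
        boolean isReference() { return isReference; }
    }

    // A region may only split again once every reference file has been
    // rewritten by compaction, so we check ALL storefiles, not just the
    // subset of candidate compaction files.
    static boolean hasReferences(List<StoreFile> allFiles) {
        for (StoreFile sf : allFiles) {
            if (sf.isReference()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<StoreFile> files = new ArrayList<>();
        files.add(new StoreFile(false));
        files.add(new StoreFile(true));
        System.out.println(hasReferences(files)); // prints "true"
    }
}
```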


bq.  On 2010-12-01 10:49:59, stack wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 926
bq.  > <http://review.cloudera.org/r/1263/diff/1/?file=17923#file17923line926>
bq.  >
bq.  >     I don't grok this comment

references == split files.  The current algorithm splits a StoreFile, then immediately uses compaction after the split to break it into 2 StoreFiles.  If you don't compact reference files that are past the max threshold:

1) you won't be able to split the region again
2) you don't actually even know that the StoreFile is too large.  HalfStoreFileReader.length() returns the whole StoreFile's length, not the length of the portion of the StoreFile that belongs to your region
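
Point 2 above suggests a selection rule along these lines. A minimal sketch, assuming a simplified stand-in for the real selection code: `includeInCompaction`, its parameters, and `maxCompactSize` are hypothetical names, not the actual HBase API.

```java
public class ReferenceSizeRule {
    // HalfStoreFileReader.length() reports the parent file's full length,
    // roughly 2x the data this region actually owns, so size-based
    // exclusion can't be trusted for reference files: always include them.
    static boolean includeInCompaction(boolean isReference, long reportedLength,
                                       long maxCompactSize) {
        if (isReference) {
            return true; // must compact, or the region can never split again
        }
        return reportedLength <= maxCompactSize;
    }

    public static void main(String[] args) {
        long maxCompactSize = 1L << 30; // 1 GB, arbitrary example limit
        System.out.println(includeInCompaction(true, 4L << 30, maxCompactSize));  // prints "true"
        System.out.println(includeInCompaction(false, 4L << 30, maxCompactSize)); // prints "false"
    }
}
```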


bq.  On 2010-12-01 10:49:59, stack wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 954
bq.  > <http://review.cloudera.org/r/1263/diff/1/?file=17923#file17923line954>
bq.  >
bq.  >     So, its ok to mess w/ file order?  We won't get ourselves into trouble if we don't respect the order in which files were written?  We do a merge sort when we read all compaction candidates in so should be fine I suppose -- since its same as how scanner merges them...... 
bq.  >     
bq.  >     Just asking because in old days order was important but I suppose we let go of that a while back?

So, technically, order is important for optimizations like the timestamp filter.  Realistically, though, this isn't a problem, because our normal skew always decreases in file size over time.  The only place where the skew doesn't decrease is for recently flushed files, but all of those will be unconditionally compacted because they are smaller than "hbase.hstore.compaction.min.size".  
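
The unconditional-inclusion behavior described above can be sketched like this. Hedged simplification: `unconditionalCandidates` and the plain `long` sizes are hypothetical stand-ins for the real selection code; only the config name `hbase.hstore.compaction.min.size` comes from HBase.

```java
import java.util.ArrayList;
import java.util.List;

public class MinSizeSelection {
    // Files below minCompactSize (the value of
    // hbase.hstore.compaction.min.size, in bytes) are unconditional
    // candidates; larger files go through further selection (elided here).
    static List<Long> unconditionalCandidates(List<Long> fileSizes, long minCompactSize) {
        List<Long> out = new ArrayList<>();
        for (long size : fileSizes) {
            if (size < minCompactSize) {
                out.add(size);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three recent flushes (1, 4, 8 MB) plus one large file (256 MB),
        // with an example min size of 16 MB.
        List<Long> sizes = List.of(1L << 20, 4L << 20, 8L << 20, 256L << 20);
        System.out.println(unconditionalCandidates(sizes, 16L << 20).size()); // prints "3"
    }
}
```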

The sorting handles an interesting issue that popped up for us during migration: we're bulk loading StoreFiles of extremely variable size (are we migrating 1K users or 10M?), and they all appear at the end of the StoreFile list.  How do we determine when it is efficient to compact them?  The easiest option was to sort the compaction list and handle bulk files by relative size, instead of writing a custom compaction-selection algorithm just for bulk inclusion.  Any other company incrementally migrating data into HBase would likely hit the same issue.
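
The sort-by-size idea above can be sketched as follows: instead of a custom rule for bulk-loaded files appended at the end of the list, sort all candidates by descending size so a bulk file lands next to files of similar size and is judged by the same relative-size rule. `sortDescending` is a hypothetical helper for illustration, not the HBase code.

```java
import java.util.Arrays;

public class SortCandidates {
    // Sort candidate file sizes largest-first, so a bulk-loaded file is
    // placed among files of comparable size rather than at the list's end.
    static long[] sortDescending(long[] sizes) {
        return Arrays.stream(sizes)
            .boxed()
            .sorted((a, b) -> Long.compare(b, a)) // descending order
            .mapToLong(Long::longValue)
            .toArray();
    }

    public static void main(String[] args) {
        // Sizes in bytes: three aging flushes (200, 50, 10 MB) plus a
        // 120 MB bulk-loaded file that arrived at the end of the list.
        long[] sizes = {200L << 20, 50L << 20, 10L << 20, 120L << 20};
        long[] sorted = sortDescending(sizes);
        // The 120 MB bulk file now sits between the 200 MB and 50 MB files.
        System.out.println(Arrays.toString(sorted));
    }
}
```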


bq.  On 2010-12-01 10:49:59, stack wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 1024
bq.  > <http://review.cloudera.org/r/1263/diff/1/?file=17923#file17923line1024>
bq.  >
bq.  >     Is this a good name for this method?  We're compacting a Store, not Stores,
right?

True.  I mainly wanted a name distinct from the public compact() API, since I kept annoyingly clicking on the wrong function in Eclipse.  Do you want me to refactor it to compactFiles() right before commit?


- Nicolas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1263/#review2018
-----------------------------------------------------------





> Max Compaction Size
> -------------------
>
>                 Key: HBASE-3290
>                 URL: https://issues.apache.org/jira/browse/HBASE-3290
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>            Priority: Minor
>
> Add the ability to specify a maximum storefile size for compaction.  Files larger than this
> limit will not be included in compactions.  This is useful for large object stores and for
> clusters that pre-split regions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

