hbase-issues mailing list archives

From "Stephen Yuan Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18166) [AMv2] We are splitting already-split files
Date Tue, 06 Jun 2017 06:11:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038258#comment-16038258 ]

Stephen Yuan Jiang commented on HBASE-18166:

[~stack], when I implemented the SplitTableRegionProcedure, I copied the logic from SplitTransactionImpl.java:
  /**
   * Creates reference files for top and bottom half of the
   * @param hstoreFilesToSplit map of store files to create half file references for.
   * @return the number of reference files that were created.
   * @throws IOException
   */
  private Pair<Integer, Integer> splitStoreFiles(
      final Map<byte[], List<StoreFile>> hstoreFilesToSplit)
      throws IOException {
    if (hstoreFilesToSplit == null) {
      // Could be null because close didn't succeed -- for now consider it fatal
      throw new IOException("Close returned empty list of StoreFiles");
    }
    // The following code sets up a thread pool executor with as many slots as
    // there's files to split. It then fires up everything, waits for
    // completion and finally checks for any exception
    int nbFiles = 0;
    for (Map.Entry<byte[], List<StoreFile>> entry: hstoreFilesToSplit.entrySet()) {
        nbFiles += entry.getValue().size();  // ===> possible to have reference files
    }

I just wonder whether we should change the logic in SplitTransactionImpl in branch-1 to skip
splitting reference files (I checked HRegion#doClose() and did not see any logic on the
region server side to skip reference files).
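To make the suggestion concrete, here is a minimal sketch of what "skip reference files" could look like at the point where the files are counted. It is not the actual patch: FileStub is a hypothetical stand-in for StoreFile that exposes only an isReference() check, the map is keyed by String instead of byte[] to keep the example self-contained, and the names filesToSplit and countFiles are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SkipReferenceFiles {
  /** Hypothetical stand-in for a store file; carries only what the sketch needs. */
  static final class FileStub {
    final String name;
    final boolean reference;

    FileStub(String name, boolean reference) {
      this.name = name;
      this.reference = reference;
    }

    boolean isReference() {
      return reference;
    }
  }

  /** Returns, per column family, only the files that are not references. */
  static Map<String, List<FileStub>> filesToSplit(Map<String, List<FileStub>> byFamily) {
    Map<String, List<FileStub>> result = new TreeMap<>();
    for (Map.Entry<String, List<FileStub>> entry : byFamily.entrySet()) {
      List<FileStub> kept = new ArrayList<>();
      for (FileStub file : entry.getValue()) {
        if (!file.isReference()) {
          kept.add(file); // references are presumed to be awaiting archival
        }
      }
      result.put(entry.getKey(), kept);
    }
    return result;
  }

  /** Counts files across all families, as the nbFiles loop does. */
  static int countFiles(Map<String, List<FileStub>> byFamily) {
    int nbFiles = 0;
    for (List<FileStub> files : byFamily.values()) {
      nbFiles += files.size();
    }
    return nbFiles;
  }

  public static void main(String[] args) {
    Map<String, List<FileStub>> stores = new TreeMap<>();
    stores.put("cf1", Arrays.asList(new FileStub("a", false), new FileStub("b.parent", true)));
    stores.put("cf2", Arrays.asList(new FileStub("c", false)));
    System.out.println(countFiles(stores) + " -> " + countFiles(filesToSplit(stores)));
  }
}
```

Filtering before counting means the thread pool is sized for the files that will really be split, rather than skipping references later inside each worker.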

> [AMv2] We are splitting already-split files
> -------------------------------------------
>                 Key: HBASE-18166
>                 URL: https://issues.apache.org/jira/browse/HBASE-18166
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.0.0
>            Reporter: stack
>            Assignee: stack
>             Fix For: 2.0.0
>         Attachments: HBASE-18166.master.001.patch, HBASE-18166.master.002.patch
> Interesting issue. The below adds a lag cleaning up files after a compaction in case of on-going Scanners (for read replicas/offheap).
> HBASE-14970 Backport HBASE-13082 and its sub-jira to branch-1 - recommit (Ram)
> What the lag means is that now that split is run from the HMaster in master branch, when it goes to get a listing of the files to split, it can pick up files that are for archiving but that have not been archived yet.  When it does, it goes ahead and splits them... making references of references.
> It's a mess.
> I added asking the Region if it is splittable a while back. The Master calls this from SplitTableRegionProcedure during preparation. If the RegionServer asked for the split, it is sort of redundant work given the RS asks itself whether it still has any references; if it does, it'll wait before asking for a split. But if a user/client asks, then this isSplittable over RPC comes in handy.
> I was thinking that isSplittable could return the list of files....
> Or, easier, given we know a region is Splittable by the time we go to split the files, then I think master-side we can just skip any references found, presuming they are ready-for-archive.
> Will be back with a patch. Want to test on cluster first (Side-effect is regions are offline because file at end of the reference to a reference is removed ... and so the open
This message was sent by Atlassian JIRA
