hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12583) Allow creating reference files even the split row not lies in the storefile range if required
Date Wed, 26 Nov 2014 08:55:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225920#comment-14225920 ]

Anoop Sam John commented on HBASE-12583:
----------------------------------------

The RKs (row keys) in the actual data region (and its files) may look like
a, b, c, d, e, with the split on c.
The index region (and its files) will then hold rows like <colvalue>a, <colvalue>b, <colvalue>c,
<colvalue>d, <colvalue>e, etc.
We want to split the index region at c as well, so that the index child regions also cover a-c
and c-e.
But as you can see, the index row keys are not laid out that way: they also carry the column-value part.
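To make the mismatch concrete, here is a minimal plain-Java sketch (not HBase code) of the store file range check quoted in the issue description, simplified to compare whole rows in unsigned lexicographic byte order. The index keys use a hypothetical two-byte column value such as `v1` in front of the data row key; these values are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Row-level simplification of the store file range check in HRegionFileSystem#splitStoreFile. */
public class IndexRangeCheckDemo {
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Top reference is created unless the split row sorts after the file's last row. */
    static boolean createsTopReference(byte[] splitRow, byte[] lastRow) {
        if (lastRow == null) return false; // empty store file
        return Arrays.compareUnsigned(splitRow, lastRow) <= 0;
    }

    /** Bottom reference is created unless the split row sorts before the file's first row. */
    static boolean createsBottomReference(byte[] splitRow, byte[] firstRow) {
        if (firstRow == null) return false; // empty store file
        return Arrays.compareUnsigned(splitRow, firstRow) >= 0;
    }

    public static void main(String[] args) {
        byte[] split = bytes("c");
        // Data region file with rows a..e: both halves get a reference.
        System.out.println("data:  top=" + createsTopReference(split, bytes("e"))
                + " bottom=" + createsBottomReference(split, bytes("a")));
        // Index region file with rows v1a..v9e (hypothetical column values):
        // "c" sorts before "v1a", so the bottom reference is skipped even though
        // the index rows for data rows a and b belong in the bottom child.
        System.out.println("index: top=" + createsTopReference(split, bytes("v9e"))
                + " bottom=" + createsBottomReference(split, bytes("v1a")));
    }
}
```

For the data file both halves get a reference, while for the index file only the top one does, which is the behavior this comment is complaining about.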

For half store files we have coprocessor (CP) level hooks, and things work fine with the new IndexHalfStoreFileReader.
The only problem is that the range check stops the split from dividing the files into two reference files.
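The comment does not show IndexHalfStoreFileReader itself. As a rough sketch of the idea only, assuming a hypothetical index key layout of a fixed-length column value followed by the data row key, a reader could assign each index row to a child region by comparing just the data-row part with the split row:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch of suffix-based half assignment for index rows (hypothetical key layout). */
public class IndexHalfDemo {
    // Hypothetical: every index key starts with a fixed 2-byte column value.
    static final int COLVALUE_LEN = 2;

    /** Builds <colvalue><data row key>. */
    static byte[] indexKey(String colValue, String dataRow) {
        byte[] cv = colValue.getBytes(StandardCharsets.UTF_8);
        if (cv.length != COLVALUE_LEN) {
            throw new IllegalArgumentException("column value must be " + COLVALUE_LEN + " bytes");
        }
        byte[] row = dataRow.getBytes(StandardCharsets.UTF_8);
        byte[] key = Arrays.copyOf(cv, cv.length + row.length);
        System.arraycopy(row, 0, key, cv.length, row.length);
        return key;
    }

    /** Strips the column-value prefix, leaving the data row key. */
    static byte[] dataRowPart(byte[] indexKey) {
        return Arrays.copyOfRange(indexKey, COLVALUE_LEN, indexKey.length);
    }

    /** Top child holds data rows >= splitRow; bottom child holds rows < splitRow. */
    static boolean belongsToTop(byte[] indexKey, byte[] splitRow) {
        return Arrays.compareUnsigned(dataRowPart(indexKey), splitRow) >= 0;
    }

    public static void main(String[] args) {
        byte[] split = "c".getBytes(StandardCharsets.UTF_8);
        String[][] rows = {{"v1", "a"}, {"v9", "b"}, {"v5", "c"}, {"v2", "d"}, {"v7", "e"}};
        for (String[] r : rows) {
            byte[] key = indexKey(r[0], r[1]);
            String half = belongsToTop(key, split) ? "top" : "bottom";
            System.out.println(new String(key, StandardCharsets.UTF_8) + " -> " + half);
        }
    }
}
```

Running it prints `v1a -> bottom`, `v9b -> bottom`, `v5c -> top`, `v2d -> top` and `v7e -> top`: the split follows the data rows a-c and c-e regardless of the column value, which is why both reference files are needed.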

I agree it is ugly to ask for this range check to be turned off for some tables (the index tables).

> Allow creating reference files even the split row not lies in the storefile range if required
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-12583
>                 URL: https://issues.apache.org/jira/browse/HBASE-12583
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: rajeshbabu
>            Assignee: rajeshbabu
>              Labels: Phoenix
>             Fix For: 2.0.0, 0.98.9, 0.99.2
>
>
> Currently, HRegionFileSystem#splitStoreFile does not create reference files if the split row does not lie within the store file's range, since that means one of the child regions would get no data from that file.
> {code}
>     // Check whether the split row lies in the range of the store file.
>     // If it is outside the range, return directly.
>     if (top) {
>       // Check if it is larger than the last key.
>       KeyValue splitKey = KeyValueUtil.createFirstOnRow(splitRow);
>       byte[] lastKey = f.createReader().getLastKey();
>       // If lastKey is null, the store file is empty.
>       if (lastKey == null) return null;
>       if (f.getReader().getComparator().compareFlatKey(splitKey.getBuffer(),
>           splitKey.getKeyOffset(), splitKey.getKeyLength(), lastKey, 0, lastKey.length) > 0) {
>         return null;
>       }
>     } else {
>       // Check if it is smaller than the first key.
>       KeyValue splitKey = KeyValueUtil.createLastOnRow(splitRow);
>       byte[] firstKey = f.createReader().getFirstKey();
>       // If firstKey is null, the store file is empty.
>       if (firstKey == null) return null;
>       if (f.getReader().getComparator().compareFlatKey(splitKey.getBuffer(),
>           splitKey.getKeyOffset(), splitKey.getKeyLength(), firstKey, 0, firstKey.length) < 0) {
>         return null;
>       }
>     }
> {code}
> In some cases the split row should be compared with only part of the row key (in a composite row key), mainly for secondary index tables. There we need to create reference files even when the split row does not lie in the store file range, so that the files can be rewritten to their child regions by a custom half store file reader that compares the relevant part of the row key with the split row.
> The check that compares the split row with the store file range and returns directly could be skipped when a special boolean attribute in the table descriptor is set to true. Alternatively, we could add coprocessor hooks in which the references are created and the default behavior is bypassed.
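A minimal sketch of the first option above, assuming the table descriptor's key/value attributes are modeled as a plain `Map` and using a hypothetical attribute name (`SPLIT_SKIP_STOREFILE_RANGE_CHECK` is made up for this example and is not an HBase setting). It does not cover the coprocessor-hook alternative and does not claim to reflect how the issue was eventually resolved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Sketch: a table-level flag that bypasses the store file range check. */
public class SplitGateDemo {
    // Hypothetical attribute name for illustration; not an existing HBase setting.
    static final String SKIP_RANGE_CHECK = "SPLIT_SKIP_STOREFILE_RANGE_CHECK";

    /**
     * Returns true if a reference file should be created for one half of a store file.
     * tableAttrs stands in for the table descriptor's key/value attributes.
     */
    static boolean shouldCreateReference(Map<String, String> tableAttrs, boolean top,
                                         byte[] splitRow, byte[] firstRow, byte[] lastRow) {
        // An empty store file never gets a reference.
        if (firstRow == null || lastRow == null) return false;
        // Flag set (e.g. on an index table): always create the reference and let a
        // custom half store file reader decide which rows each child sees.
        if (Boolean.parseBoolean(tableAttrs.getOrDefault(SKIP_RANGE_CHECK, "false"))) {
            return true;
        }
        // Default behaviour, simplified to whole-row comparison.
        return top ? Arrays.compareUnsigned(splitRow, lastRow) <= 0
                   : Arrays.compareUnsigned(splitRow, firstRow) >= 0;
    }

    public static void main(String[] args) {
        byte[] split = "c".getBytes(StandardCharsets.UTF_8);
        byte[] first = "v1a".getBytes(StandardCharsets.UTF_8);
        byte[] last = "v9e".getBytes(StandardCharsets.UTF_8);
        Map<String, String> plainTable = new HashMap<>();
        Map<String, String> indexTable = new HashMap<>();
        indexTable.put(SKIP_RANGE_CHECK, "true");
        System.out.println("flag unset, bottom: "
                + shouldCreateReference(plainTable, false, split, first, last));
        System.out.println("flag set, bottom: "
                + shouldCreateReference(indexTable, false, split, first, last));
    }
}
```

The flag is checked after the empty-file test, so empty store files still never get references; only the range comparison is bypassed.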



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
