hbase-issues mailing list archives

From "chunhui shen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-8980) Assistant Store ----------- An Index Store of HRegion
Date Fri, 19 Jul 2013 06:00:53 GMT

     [ https://issues.apache.org/jira/browse/HBASE-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-8980:
--------------------------------

    Description: 
*Background*
a. Generally, we would like several organizations of the same data; e.g. a secondary index sorts the data by a non-primary key.
b. Currently, scanning data in HBase with a condition such as a ValueFilter is inefficient, because every row in the scan range is still read and tested against the filter.
c. We could create an Assistant Store that stores the data of an HRegion in another organization.

*Assistant Store*
a. It is a store of an HRegion, like HStore, and can be created by the user by adding a column family.

b. Data in the Assistant Store is a copy of the data in the HRegion, organized differently. The exceptions are that its row may fall outside the range of the HRegion, and its value is the row of the original KeyValue (row and value are swapped).
For example, the region (range 'row001'~'row999') includes the following KVs in store cf:
row001/cf:q1/val001
row002/cf:q1/val002
row003/cf:q1/val003
We could create an Assistant Store (named as) for the region, which includes the following KVs:
val001/cf:q1/row001
val002/cf:q1/row002
val003/cf:q1/row003

c. We could use a local region transaction to ensure atomicity and consistency between the original data and the Assistant Store.

d. The RegionServer puts data into the Assistant Store automatically, but users must read the data from the Assistant Store themselves.
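The write path described in this section can be modelled in a few lines of plain Java. This is a minimal in-memory sketch, not the attached 8980-94.patch: the class `AssistantRegionSketch`, its `put` and `toAssistant` methods, and the use of one `ReentrantLock` as a stand-in for the local region transaction are all hypothetical. Each user put is stored as-is and, inverted (row and value swapped, family replaced by the Assistant Store's family), in the Assistant Store, with both writes made under the same lock. Including the value in the sort order is an assumption of this sketch, so that several rows sharing one value stay distinct.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;

public class AssistantRegionSketch {
  /** Simplified KeyValue, printed as row/family:qualifier/value like the examples. */
  record Cell(String row, String family, String qualifier, String value) {
    @Override
    public String toString() {
      return row + "/" + family + ":" + qualifier + "/" + value;
    }
  }

  // Sort by row, family, qualifier, then value, so that entries such as
  // v2/c2:q1/r2 and v2/c2:q1/r4 are kept as distinct cells in this model.
  static final Comparator<Cell> ORDER = Comparator.comparing(Cell::row)
      .thenComparing(Cell::family)
      .thenComparing(Cell::qualifier)
      .thenComparing(Cell::value);

  final NavigableSet<Cell> originalStore = new TreeSet<>(ORDER);
  final NavigableSet<Cell> assistantStore = new TreeSet<>(ORDER);
  private final String assistantFamily;
  // Hypothetical stand-in for the local region transaction in the proposal.
  private final ReentrantLock regionLock = new ReentrantLock();

  AssistantRegionSketch(String assistantFamily) {
    this.assistantFamily = assistantFamily;
  }

  /** Inverts a cell: its value becomes the row, and its original row becomes the value. */
  Cell toAssistant(Cell c) {
    return new Cell(c.value(), assistantFamily, c.qualifier(), c.row());
  }

  /** The user only supplies the original cell; the inverted copy is written here. */
  void put(Cell c) {
    regionLock.lock();
    try {
      originalStore.add(c);
      assistantStore.add(toAssistant(c));
    } finally {
      regionLock.unlock();
    }
  }

  public static void main(String[] args) {
    AssistantRegionSketch region = new AssistantRegionSketch("c2");
    String[][] puts = {
        {"r1", "v1"}, {"r2", "v2"}, {"r3", "v1"}, {"r4", "v2"}, {"r5", "v1"}, {"r6", "v2"}};
    for (String[] p : puts) {
      region.put(new Cell(p[0], "c1", "q1", p[1]));
    }
    region.originalStore.forEach(System.out::println);
    region.assistantStore.forEach(System.out::println);
  }
}
```

Fed the t1 puts from the example that follows, `main` prints the same twelve cells, in the same order, that the example lists as the region's data after the puts.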


*Example of Using Assistant Store*
a. Suppose there is an empty table named t1 with a column family named c1, and it has only one region (the region's range is from EMPTY_START_ROW to EMPTY_END_ROW).

b. Add an Assistant Store to the table by adding a new column family named c2.

c. The user puts the following data into the table:
r1/c1:q1/v1
r2/c1:q1/v2
r3/c1:q1/v1
r4/c1:q1/v2
r5/c1:q1/v1
r6/c1:q1/v2

d. Then the region has the following data:
r1/c1:q1/v1
r2/c1:q1/v2
r3/c1:q1/v1
r4/c1:q1/v2
r5/c1:q1/v1
r6/c1:q1/v2

v1/c2:q1/r1
v1/c2:q1/r3
v1/c2:q1/r5
v2/c2:q1/r2 (generated by the RegionServer and stored in the Assistant Store)
v2/c2:q1/r4
v2/c2:q1/r6

e. Split the region into daughter_a and daughter_b at the split point 'r4'.

Then daughter_a has the following data:
r1/c1:q1/v1
r2/c1:q1/v2
r3/c1:q1/v1

v1/c2:q1/r1
v1/c2:q1/r3  (Data in Assistant Store)
v2/c2:q1/r2

daughter_b has the following data:

r4/c1:q1/v2
r5/c1:q1/v1
r6/c1:q1/v2

v1/c2:q1/r5
v2/c2:q1/r4 (Data in Assistant Store)
v2/c2:q1/r6


f. From the above, we can see that the data in the Assistant Store always corresponds to the original data in the region, even across a split; it is maintained by the RegionServer.

g. How do we use the data in the Assistant Store?
Suppose we want to scan from 'r1' to 'r7' with a ValueFilter matching value 'v2'.
Without an Assistant Store, we must scan the whole table, and the filter is evaluated against every row.
With the Assistant Store, we can speed up the scan instead:
scan the Assistant Store from 'v2' to 'v2+' (the smallest key after 'v2', i.e. 'v2' followed by a 0x00 byte), which returns the following result:
v2/c2:q1/r2
v2/c2:q1/r4
v2/c2:q1/r6

Unfortunately, the scan result may be ordered neither by row nor by value, although it can be made to be ordered by value.

From the code point of view, I designed the scan on the Assistant Store as follows:
{code}
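import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Limit the scan range by the original row
Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("r1"));
scan.setStopRow(Bytes.toBytes("r7"));

// Scan the Assistant Store for exactly the value 'v2':
// the (exclusive) stop row is 'v2' followed by a 0x00 byte
Scan assistantScan = new Scan();
assistantScan.setStartRow(Bytes.toBytes("v2"));
assistantScan.setStopRow(Bytes.add(Bytes.toBytes("v2"), new byte[] { 0x00 }));
// setAssistantScan() is the method proposed in this issue (not in stock HBase);
// after setting it, the region runs the scan through the Assistant Store
scan.setAssistantScan(assistantScan);

// htable is an HTable for t1
ResultScanner scanner = htable.getScanner(scan);
try {
  for (Result result : scanner) {
    // Per this design, the results are the cells
    // v2/c2:q1/r2, v2/c2:q1/r4 and v2/c2:q1/r6
    System.out.println(result);
  }
} finally {
  scanner.close();
}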
{code}
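The scan in g. and the ordering caveat can be reproduced with a small in-memory model. This is a sketch under stated assumptions, not the proposed `setAssistantScan` implementation: `AssistantScanSketch`, `scanRegion` and `mergeByValue` are hypothetical, each region's Assistant Store is a sorted set of (value, original row) entries, and Java strings stand in for byte arrays (a 0x00 char plays the role of the 0x00 byte). `scanRegion` reads only the index keys in [start, stop) and applies the outer row range to the original row; `mergeByValue` is one possible way, a k-way merge, to make the combined per-region results ordered by value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.PriorityQueue;
import java.util.TreeSet;

public class AssistantScanSketch {
  /** Assistant Store entry: row key = indexed value, cell value = original row. */
  record Entry(String value, String originalRow) {
    @Override
    public String toString() {
      return value + "/c2:q1/" + originalRow; // family:qualifier fixed as in the example
    }
  }

  static final Comparator<Entry> BY_VALUE_THEN_ROW =
      Comparator.comparing(Entry::value).thenComparing(Entry::originalRow);

  /** Reads values in [startValue, stopValue) from one region's Assistant Store and keeps
   *  entries whose original row lies in the outer scan's [startRow, stopRow). */
  static List<Entry> scanRegion(NavigableSet<Entry> store, String startValue, String stopValue,
                                String startRow, String stopRow) {
    List<Entry> out = new ArrayList<>();
    // "" sorts before any original row, so these bounds cover whole value keys.
    for (Entry e : store.subSet(new Entry(startValue, ""), true, new Entry(stopValue, ""), false)) {
      String row = e.originalRow();
      if (row.compareTo(startRow) >= 0 && row.compareTo(stopRow) < 0) {
        out.add(e);
      }
    }
    return out;
  }

  /** k-way merge of per-region results (each already sorted by value). */
  static List<Entry> mergeByValue(List<List<Entry>> perRegion) {
    // Each heap element is a cursor {regionIndex, position}, ordered by the entry it points at.
    PriorityQueue<int[]> heap = new PriorityQueue<>(
        Comparator.comparing((int[] c) -> perRegion.get(c[0]).get(c[1]), BY_VALUE_THEN_ROW));
    for (int i = 0; i < perRegion.size(); i++) {
      if (!perRegion.get(i).isEmpty()) {
        heap.add(new int[] {i, 0});
      }
    }
    List<Entry> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] cursor = heap.poll();
      List<Entry> region = perRegion.get(cursor[0]);
      merged.add(region.get(cursor[1]));
      if (cursor[1] + 1 < region.size()) {
        heap.add(new int[] {cursor[0], cursor[1] + 1});
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    // Assistant Stores of daughter_a and daughter_b after the split at 'r4'.
    NavigableSet<Entry> daughterA = new TreeSet<>(BY_VALUE_THEN_ROW);
    daughterA.addAll(List.of(new Entry("v1", "r1"), new Entry("v1", "r3"), new Entry("v2", "r2")));
    NavigableSet<Entry> daughterB = new TreeSet<>(BY_VALUE_THEN_ROW);
    daughterB.addAll(List.of(new Entry("v1", "r5"), new Entry("v2", "r4"), new Entry("v2", "r6")));

    // Exact match on 'v2': the exclusive stop key is 'v2' followed by a 0x00 char.
    String v2Stop = "v2" + (char) 0;
    List<Entry> exact = new ArrayList<>();
    for (NavigableSet<Entry> region : List.of(daughterA, daughterB)) {
      exact.addAll(scanRegion(region, "v2", v2Stop, "r1", "r7"));
    }
    System.out.println("value = v2: " + exact);

    // A value range [v1, v3): concatenating regions is ordered neither by row nor by value.
    List<List<Entry>> perRegion = List.of(scanRegion(daughterA, "v1", "v3", "r1", "r7"),
                                          scanRegion(daughterB, "v1", "v3", "r1", "r7"));
    System.out.println("per region: " + perRegion);
    System.out.println("merged:     " + mergeByValue(perRegion));
  }
}
```

With the daughter regions from step e., the exact-match scan returns r2, r4 and r6 as in the example, while the [v1, v3) range shows concatenated results interleaving v1 and v2 entries until they are merged.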


*Implementation Dependency*
a. Split StoreFiles by value (currently, a StoreFile is only split by row).
b. Support multi-row transactions within a region (already implemented).
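Dependency a. is what lets the split in step e. keep each daughter's index consistent: an Assistant Store entry must follow its value (the original row), not its own row key. The following is a minimal in-memory sketch, assuming entries are held in a sorted set; `AssistantSplitSketch` and `splitByOriginalRow` are hypothetical names, not part of HBase or the attached patch, and a real implementation would operate on StoreFiles rather than copying entries.

```java
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class AssistantSplitSketch {
  /** Assistant Store entry: row key = indexed value, cell value = original row. */
  record Entry(String value, String originalRow) {
    @Override
    public String toString() {
      return value + "/c2:q1/" + originalRow; // family:qualifier fixed as in the example
    }
  }

  static final Comparator<Entry> ORDER =
      Comparator.comparing(Entry::value).thenComparing(Entry::originalRow);

  /**
   * Splits an Assistant Store at splitRow. A normal store is split by its own row key;
   * here the decision uses the entry's value (the original row), so each daughter keeps
   * exactly the index entries for the rows it owns.
   */
  static List<NavigableSet<Entry>> splitByOriginalRow(NavigableSet<Entry> store, String splitRow) {
    NavigableSet<Entry> daughterA = new TreeSet<>(ORDER);
    NavigableSet<Entry> daughterB = new TreeSet<>(ORDER);
    for (Entry e : store) {
      // As in a region split, rows before splitRow go to daughter_a, the rest to daughter_b.
      if (e.originalRow().compareTo(splitRow) < 0) {
        daughterA.add(e);
      } else {
        daughterB.add(e);
      }
    }
    return List.of(daughterA, daughterB);
  }

  public static void main(String[] args) {
    // Assistant Store of the single t1 region before the split.
    NavigableSet<Entry> store = new TreeSet<>(ORDER);
    String[][] puts = {
        {"r1", "v1"}, {"r2", "v2"}, {"r3", "v1"}, {"r4", "v2"}, {"r5", "v1"}, {"r6", "v2"}};
    for (String[] p : puts) {
      store.add(new Entry(p[1], p[0]));
    }
    List<NavigableSet<Entry>> daughters = splitByOriginalRow(store, "r4");
    // Prints: daughter_a: [v1/c2:q1/r1, v1/c2:q1/r3, v2/c2:q1/r2]
    System.out.println("daughter_a: " + daughters.get(0));
    // Prints: daughter_b: [v1/c2:q1/r5, v2/c2:q1/r4, v2/c2:q1/r6]
    System.out.println("daughter_b: " + daughters.get(1));
  }
}
```

Note that both daughters end up holding keys v1 and v2: the Assistant Store's key space is not partitioned by the split, only its entries are.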

I am providing an initial patch against version 0.94.
What do you think about such a store?

> Assistant Store ----------- An Index Store of HRegion
> -----------------------------------------------------
>
>                 Key: HBASE-8980
>                 URL: https://issues.apache.org/jira/browse/HBASE-8980
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 8980-94.patch
>
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
