hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-6942) Endpoint implementation for bulk deletion of data
Date Wed, 24 Oct 2012 03:36:12 GMT

     [ https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-6942:
--------------------------

    Release Note: 
This issue gives an example Endpoint implementation for efficiently deleting bulk data from tables. Which data is deleted is controlled by a Scan object passed to the endpoint.
We can delete rows, column families, column qualifiers, or cell versions, depending on the delete type passed.
Optionally, a timestamp can also be passed. When a timestamp is passed for the ROW, FAMILY, or COLUMN delete types, all versions with a timestamp up to and including the given time are deleted.
When the type is VERSION and a timestamp is passed, only the single version with exactly that timestamp is deleted from each cell the Scan selected. When no timestamp is passed for a VERSION delete, every cell version the Scan selected is deleted. By using an appropriate Scan (with a TimeRange, etc.), the user can control which versions are deleted.
The API returns the number of rows deleted (for types other than ROW, a counted row is not necessarily deleted in its entirety) and, when the type is VERSION, also the total number of versions deleted.
The Scan can be created with a row key range, filters, a TimeRange, etc., depending on the delete use case.
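To make the delete-type and timestamp rules above concrete, here is a minimal sketch in Python. It is a hypothetical in-memory model, not the HBase Endpoint API: a table is a set of (row, family, qualifier, timestamp) tuples, the Scan is modeled as a predicate over cells, and the names `bulk_delete`, `DeleteType` and `BulkDeleteResult` are invented for illustration. It also assumes that `rows_deleted` counts the distinct rows the scan selected, which is one reading of "number of rows deleted".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Set, Tuple

# Hypothetical model (not HBase code): one cell version = (row, family, qualifier, timestamp).
Cell = Tuple[str, str, str, int]


class DeleteType(Enum):
    ROW = "ROW"
    FAMILY = "FAMILY"
    COLUMN = "COLUMN"
    VERSION = "VERSION"


@dataclass
class BulkDeleteResult:
    rows_deleted: int                # assumption: distinct rows the scan selected
    versions_deleted: Optional[int]  # reported only for DeleteType.VERSION


def bulk_delete(table: Set[Cell],
                scan: Callable[[Cell], bool],
                delete_type: DeleteType,
                timestamp: Optional[int] = None) -> BulkDeleteResult:
    """Apply the release-note delete rules to `table` in place (hypothetical helper)."""
    selected = [c for c in table if scan(c)]

    if delete_type is DeleteType.VERSION:
        if timestamp is None:
            # No timestamp: delete exactly the cell versions the scan selected.
            doomed = set(selected)
        else:
            # Timestamp given: from every column the scan selected,
            # delete only the version whose timestamp equals it.
            columns = {c[:3] for c in selected}
            doomed = {c for c in table if c[:3] in columns and c[3] == timestamp}
    else:
        # ROW / FAMILY / COLUMN: widen each selected cell to its whole row,
        # (row, family) or (row, family, qualifier), then delete every version
        # in that scope with ts <= timestamp (all versions if none is given).
        width = {DeleteType.ROW: 1, DeleteType.FAMILY: 2, DeleteType.COLUMN: 3}[delete_type]
        scopes = {c[:width] for c in selected}
        doomed = {c for c in table
                  if c[:width] in scopes and (timestamp is None or c[3] <= timestamp)}

    table.difference_update(doomed)
    rows = len({c[0] for c in selected})
    versions = len(doomed) if delete_type is DeleteType.VERSION else None
    return BulkDeleteResult(rows, versions)
```

For example, calling `bulk_delete(table, lambda c: c[0] == "r1" and c[2] == "a", DeleteType.COLUMN, timestamp=1)` removes only the versions of column `cf:a` in row `r1` with timestamp 1 or earlier. The sketch widens ROW, FAMILY and COLUMN deletes beyond the exact cells the scan returned, while VERSION deletes stay limited to the selected columns or versions, mirroring how the release note distinguishes the types.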



  was:
This issue gives an Endpoint implementation for efficiently deleting bulk data from tables. Which data is deleted is controlled by a Scan passed to the endpoint.
We can delete rows, column families, column qualifiers, or cell versions, depending on the delete type passed.
Optionally, a timestamp can also be passed. When a timestamp is passed for the ROW, FAMILY, or COLUMN delete types, all versions with a timestamp up to and including the given time are deleted.
When the type is VERSION and a timestamp is passed, only the single version with exactly that timestamp is deleted from each cell the Scan selected. When no timestamp is passed for a VERSION delete, every cell version the Scan selected is deleted.
By using an appropriate Scan (with a TimeRange, etc.), the user can control which versions are deleted.
The API returns the number of rows deleted (for types other than ROW, a counted row is not necessarily deleted in its entirety) and, when the type is VERSION, also the total number of versions deleted.
The Scan can be created with a row key range, filters, a TimeRange, etc., depending on the delete use case.



    
> Endpoint implementation for bulk deletion of data
> -------------------------------------------------
>
>                 Key: HBASE-6942
>                 URL: https://issues.apache.org/jira/browse/HBASE-6942
>             Project: HBase
>          Issue Type: Improvement
>          Components: Coprocessors, Performance
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: HBASE-6942_94-V8.patch, HBASE-6942_DeleteTemplate.patch, HBASE-6942.patch,
>                      HBASE-6942_Trunk.patch, HBASE-6942_Trunk-V2.patch, HBASE-6942_V2.patch, HBASE-6942_V3.patch,
>                      HBASE-6942_V4.patch, HBASE-6942_V5.patch, HBASE-6942_V6.patch, HBASE-6942_V7.patch
>
>
> We can provide an endpoint implementation for doing a bulk deletion of data (based on a scan)
> at the server side. This can reduce the time taken for such an operation, because right now
> the data must be scanned back to the client, which then issues Delete(s) using the row keys.
> A query like: delete from table1 where...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
