accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-571) MergeClone/BulkImport from existing table
Date Wed, 10 Apr 2013 19:39:17 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628160#comment-13628160 ]

Keith Turner commented on ACCUMULO-571:
---------------------------------------

If you want to merge the data in tableA into tableB, then I think you can take the split points
in tableA and create them in tableB.  Then copy the file pointers from tableA to the corresponding
tablets in tableB.  Also, take the max logical time.  The reason that the split points in
tableA need to be added to tableB is so that stale data in tableA's files that falls outside of a
tablet's range does not come back.

A merge on tableB would need to be prevented while this operation takes place.  A table write
lock could be acquired to prevent this.
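
The split-point reasoning above can be illustrated with a small toy model. This is only a
sketch under stated assumptions, not Accumulo code: tables are plain dicts mapping a tablet
range (prev_end, end] to a list of "files" (sets of keys), and every function name here
(tablet_ranges, add_splits, copy_file_pointers, and so on) is hypothetical. It assumes a
tablet only exposes keys from its files that fall inside its own range, which is why a file
shared across a split can hold stale keys without them being visible.

```python
def tablet_ranges(splits):
    """Turn split points into (prev_end, end] ranges; None means unbounded."""
    points = sorted(splits)
    return list(zip([None] + points, points + [None]))


def in_range(key, rng):
    prev_end, end = rng
    return (prev_end is None or key > prev_end) and (end is None or key <= end)


def overlaps(a, b):
    """True if ranges a and b, both (prev_end, end], share at least one key."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    return ((a_hi is None or b_lo is None or b_lo < a_hi)
            and (b_hi is None or a_lo is None or a_lo < b_hi))


def visible_keys(table):
    """A tablet exposes only the keys of its files that fall inside its range."""
    return {k for rng, files in table.items()
            for f in files for k in f if in_range(k, rng)}


def add_splits(table, splits):
    """Re-split a table; each new tablet keeps the files of tablets it overlaps."""
    old_splits = {end for _, end in table if end is not None}
    return {rng: [f for old, files in table.items() if overlaps(rng, old)
                  for f in files]
            for rng in tablet_ranges(old_splits | set(splits))}


def copy_file_pointers(src, dst):
    """Copy each source tablet's file pointers to every overlapping dst tablet."""
    merged = {rng: list(files) for rng, files in dst.items()}
    for src_rng, files in src.items():
        for dst_rng in merged:
            if overlaps(src_rng, dst_rng):
                merged[dst_rng].extend(files)
    return merged


# tableA was split at "m". The low tablet still points at old_file, which
# contains "p"; the high tablet was compacted into new_file after "p" was
# deleted, so "p" is stale and invisible in tableA.
old_file = frozenset({"a", "k", "p", "z"})
new_file = frozenset({"z"})
table_a = {(None, "m"): [old_file], ("m", None): [new_file]}
table_b = {(None, None): []}  # one tablet covering everything

# Copying pointers without creating tableA's splits in tableB first.
naive = copy_file_pointers(table_a, table_b)

# Creating tableA's split points in tableB first, then copying pointers.
safe = copy_file_pointers(table_a, add_splits(table_b, ["m"]))

print(sorted(visible_keys(table_a)))
print(sorted(visible_keys(naive)))
print(sorted(visible_keys(safe)))
```

In this model the naive copy exposes the stale key "p" because tableB's single tablet covers
the whole of old_file, while the copy made after adding the split shows the same keys as
tableA. Logical time and locking are left out of the sketch.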

                
> MergeClone/BulkImport from existing table
> -----------------------------------------
>
>                 Key: ACCUMULO-571
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-571
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client, tserver
>            Reporter: John Vines
>            Assignee: John Vines
>
> This is an idea that was recently brought to my attention. The use case is a user who wants
> to essentially clone a subset of a table into an existing table. Currently cloning does not
> allow this. The current option is to copy the files in hdfs and then bulk import, since bulk
> import moves the files. This is pretty wasteful. Under the hood, the system can handle the
> cross-linking between files like that. We just need a mechanism to provide the ability to
> assign a subset of data to another region.
> Potential uses include the above mentioned, as well as the potential for users to bring
> fresh data into a table which was cloned and modified. There may be other cases, but I haven't
> fully thought out this problem space.
> The biggest problem with this is that it puts the onus on the user for ensuring that data
> in the in-memory maps is flushed before moving, as well as for handling the possibility of
> duplicate data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
