accumulo-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Vines" <vi...@apache.org>
Subject Re: Review Request 27198: ACCUMULO-3236 introducing cloneInto feature
Date Tue, 23 Dec 2014 00:05:36 GMT


> On Dec. 22, 2014, 11:42 p.m., Christopher Tubbs wrote:
> > I'm curious about the expected behavior for when the source data contains deletes.
It seems that those deletes will be merged into the destination, clobbering any non-deleted
data in the destination. Clobbering (or at least, some sensible merge, based on timestamp)
is probably expected behavior for non-deleted data from the source, but deleted data is tricky,
because the user has no insight into the current state of deletes in the source. They can
"cloneInto" in one instant and have deletes clobber their destination, and then do it again
without any clobbering (because the source has since compacted).
> > 
> > To prevent this behavior, it seems you'd need to either 1) force a compaction, or
2) add metadata to the value of the file entry instructing the tserver to ignore deleted data
(delete markers and any data it is hiding) coming from that particular file, before merging
with the other files, similar to how we re-write timestamps when reading data from bulk-imported
files.
> > 
> > This unexpected behavior can occur even if all other parts of the cloneInto feature
were working perfectly. It gets even more complicated and unexpected when we begin considering
that the user's view of the data is different than what the underlying data actually is in
other ways, as the result of iterators. It may not be obvious that they are injecting the
underlying data, and not their view of it, into another table. How do you propose to address
that? With additional permissions to ensure not just everybody can perform this action? With
a check to verify that the same iterators exist on the destination table that existed on the
source table?
> 
> John Vines wrote:
>     I see your concerns no different as those for a standard bulk import. And these concerns
are currently handled by the user doing the bulk import needs to be aware of the data they're
calling the command on.
> 
> Christopher Tubbs wrote:
>     I agree that they aren't much different than bulk import. However, this is not a
"bulkImportFrom" feature, it is a "cloneInto" feature, and it's *very* different behavior
than clone. With bulk import, also, we have a special permission, that can be reserved to
those who understand its particulars. It's also the case that typical bulk import use cases
involve writing data that will not have deletes, or will not come from a source with a different
view than the underlying files.
>     
>     I'm just concerned about user expectations, and how we address them.
> 
> John Vines wrote:
>     What if we require the source table to be compacted down to 1 file per tablet to
ease concerns about deletes?
>     
>     And I totally agree with the permissions. Would you be okay with requiring bulk import
on the destination, or would you rather it's own permission?

Actually, compacting down to 1 file is expensive and may not be required if there are no deletes
in play. I think this is best left to user's of the system to quiesce to their needs.


- John


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27198/#review65843
-----------------------------------------------------------


On Dec. 22, 2014, 7:59 p.m., John Vines wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27198/
> -----------------------------------------------------------
> 
> (Updated Dec. 22, 2014, 7:59 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-3236
>     https://issues.apache.org/jira/browse/ACCUMULO-3236
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Includes all code to support feature, including thrift changes
> Includes minor code cleanup to TableLocator and items in the Bulk path to remove signature
items that are unused (arguments & exceptions)
> Includes renaming of some bulk import functions to clarify their purpose (because they're
now multi-purpose)
> 
> Patch is based on 1.6, but we can choose to make it target only 1.7 if we choose (this
conversation should be taken up on jira, not in RB)
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/client/admin/TableOperations.java 97f538d

>   core/src/main/java/org/apache/accumulo/core/client/impl/RootTabletLocator.java 97d476b

>   core/src/main/java/org/apache/accumulo/core/client/impl/TableOperationsImpl.java 07df1bd

>   core/src/main/java/org/apache/accumulo/core/client/impl/TabletLocator.java e396d82

>   core/src/main/java/org/apache/accumulo/core/client/impl/TabletLocatorImpl.java c550f15

>   core/src/main/java/org/apache/accumulo/core/client/impl/TimeoutTabletLocator.java bcbe561

>   core/src/main/java/org/apache/accumulo/core/client/impl/thrift/TableOperation.java
7716823 
>   core/src/main/java/org/apache/accumulo/core/client/mock/MockTableOperationsImpl.java
de19137 
>   core/src/main/java/org/apache/accumulo/core/client/mock/impl/MockTabletLocator.java
35f160f 
>   core/src/main/java/org/apache/accumulo/core/master/thrift/FateOperation.java f65f552

>   core/src/main/java/org/apache/accumulo/core/tabletserver/thrift/TabletClientService.java
2ba7674 
>   core/src/main/thrift/client.thrift 38a8076 
>   core/src/main/thrift/master.thrift 38e9227 
>   core/src/main/thrift/tabletserver.thrift 25e0b10 
>   core/src/test/java/org/apache/accumulo/core/client/admin/TableOperationsHelperTest.java
1d91574 
>   core/src/test/java/org/apache/accumulo/core/client/impl/TableOperationsHelperTest.java
02838ed 
>   server/base/src/main/java/org/apache/accumulo/server/client/BulkImporter.java 4cc13a9

>   server/base/src/main/java/org/apache/accumulo/server/client/ClientServiceHandler.java
fe17a62 
>   server/base/src/main/java/org/apache/accumulo/server/security/SecurityOperation.java
258080c 
>   server/base/src/test/java/org/apache/accumulo/server/client/BulkImporterTest.java 3680341

>   server/master/src/main/java/org/apache/accumulo/master/FateServiceHandler.java 5818da3

>   server/master/src/main/java/org/apache/accumulo/master/tableOps/CloneIntoTable.java
PRE-CREATION 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/Tablet.java 9a07a4a 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java 3f594cc

>   test/src/main/java/org/apache/accumulo/test/performance/thrift/NullTserver.java 0591b19

>   test/src/test/java/org/apache/accumulo/test/functional/CloneIntoIT.java PRE-CREATION

> 
> Diff: https://reviews.apache.org/r/27198/diff/
> 
> 
> Testing
> -------
> 
> Includes CloneIntoIT, which exercises all permutations of the flags. Existing BulkIT
still functions as intended for validation of no feature loss in refactoring exiting code
for multi-purposing.
> 
> 
> Thanks,
> 
> John Vines
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message