accumulo-dev mailing list archives

From keith-turner <...@git.apache.org>
Subject [GitHub] accumulo issue #224: ACCUMULO-4500 ACCUMULO-96 Added summarization
Date Mon, 06 Mar 2017 16:50:19 GMT
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/224
  
    > @keith-turner we talked about this yesterday, but I wanted to post it here. What
would happen if a file is deleted, like maybe compacted and gc'd, after the file list is grabbed?
    
    @mjwall I had not thought of this case and currently have no handling for it.  Yet another
win for code reviews.
    
    I think the best solution to this problem is to introduce a new inaccuracy counter called
`deleted`.  There are already a few inaccuracy counters reported when gathering summary
information.  I will add another comment that shows where these can be found.
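    A minimal sketch of the proposed behavior follows. It assumes a simplified model, not Accumulo's actual code: a summary is a map from statistic name to count, the file list is captured once up front, and a file that is no longer present when it is read (for example, compacted away and garbage collected) only increments a `deleted` counter. The class, method, and field names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of gathering summaries over a file list captured up front.
public class SummaryGatherSketch {

    /** Merged statistics plus the inaccuracy counters reported alongside them. */
    static final class Gathered {
        final Map<String, Long> stats = new HashMap<>();
        int total;    // files in the snapshot taken at the start
        int deleted;  // files that disappeared before they could be read
    }

    /**
     * Merges per-file summaries for the given snapshot. liveFiles maps a file
     * name to its summary; a file missing from it is treated as deleted.
     */
    static Gathered gather(List<String> snapshot, Map<String, Map<String, Long>> liveFiles) {
        Gathered g = new Gathered();
        for (String file : snapshot) {
            g.total++;
            Map<String, Long> summary = liveFiles.get(file);
            if (summary == null) {
                // Do not look for a replacement file; just report the gap.
                g.deleted++;
                continue;
            }
            summary.forEach((k, v) -> g.stats.merge(k, v, Long::sum));
        }
        return g;
    }

    public static void main(String[] args) {
        List<String> snapshot = List.of("F1.rf", "F2.rf", "F3.rf");
        Map<String, Map<String, Long>> live = new HashMap<>();
        live.put("F1.rf", Map.of("rows", 10L));
        live.put("F3.rf", Map.of("rows", 5L));
        // F2.rf was compacted and gc'd after the snapshot was taken.
        Gathered g = gather(snapshot, live);
        System.out.println("rows=" + g.stats.get("rows") + " total=" + g.total
                + " deleted=" + g.deleted); // rows=15 total=3 deleted=1
    }
}
```

    The caller sees partial statistics together with a nonzero `deleted` count, so it can decide whether the result is accurate enough.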
    
    At first I thought I could circle back and use the file that replaced a missing file.
However, this approach has a problem.  Multiple deleted files could have been compacted into
the replacement file, and for some of those deleted files we may have already gathered and
merged summary information.  Trying to avoid this problem would make gathering summaries more
expensive.  In order to keep gathering summaries fast, I think it would be best to just report
the problem.  If someone really wants to avoid this problem, they can clone the table and
make the request against the clone.  I can put this avoidance strategy in the javadoc for
`deleted`.
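    To make the double-counting problem concrete, here is a small sketch of the rejected approach under a hypothetical timeline: files are read in order, and a compaction that merges them into one output happens after the first `readBeforeCompaction` files have been read. A missing file is naively replaced by its compaction output. All names and numbers are illustrative, not taken from Accumulo.

```java
import java.util.List;
import java.util.Map;

// Hypothetical demo of why substituting the replacement file over-counts.
public class ReplacementDoubleCount {

    /**
     * Sums a "rows" statistic over the snapshot. Files at index below
     * readBeforeCompaction are read from filesBefore, the rest from filesAfter.
     * A missing file is replaced by its compaction output (naive substitution).
     */
    static long gatherSubstituting(List<String> snapshot, int readBeforeCompaction,
                                   Map<String, Long> filesBefore, Map<String, Long> filesAfter,
                                   Map<String, String> replacedBy) {
        long rows = 0;
        for (int i = 0; i < snapshot.size(); i++) {
            String file = snapshot.get(i);
            Map<String, Long> live = i < readBeforeCompaction ? filesBefore : filesAfter;
            Long count = live.get(file);
            if (count == null) {
                // Naive substitution: this re-counts inputs that were already merged.
                count = live.get(replacedBy.get(file));
            }
            rows += count;
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> snapshot = List.of("F1.rf", "F2.rf");
        Map<String, Long> before = Map.of("F1.rf", 10L, "F2.rf", 5L);
        Map<String, Long> after = Map.of("F4.rf", 15L); // F1 and F2 compacted into F4
        Map<String, String> replacedBy = Map.of("F1.rf", "F4.rf", "F2.rf", "F4.rf");
        // F1 was read before the compaction, F2 after it: F1's rows are counted twice.
        System.out.println(gatherSubstituting(snapshot, 1, before, after, replacedBy)); // 25, not 15
    }
}
```

    Correcting this would require knowing which original files each replacement covers and which of those were already merged, which is the extra cost the comment above wants to avoid.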
