hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-15779) S3guard: add inconsistency detection metrics
Date Fri, 21 Sep 2018 00:20:00 GMT
Aaron Fabbri created HADOOP-15779:

             Summary: S3guard: add inconsistency detection metrics
                 Key: HADOOP-15779
                 URL: https://issues.apache.org/jira/browse/HADOOP-15779
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.2.0
            Reporter: Aaron Fabbri

S3Guard uses a trailing log of metadata changes made to an S3 bucket to add consistency to
the eventually-consistent AWS S3 service. We should add some metrics that are incremented
when we detect inconsistency (eventual consistency) in S3.

I'm thinking of at least two counters: (1) getFileStatus() (HEAD) inconsistency detected, and
(2) listing inconsistency detected. We may want to further separate these into categories
(present / not present, etc.).
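
A minimal sketch of what such counters could look like, assuming one counter per (operation, category) pair. The class name, the enum constants, and the "present / deleted in store only" categories are all hypothetical; none of them are existing S3A statistics, and the real change would presumably hook into S3A's own instrumentation instead.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counter holder; none of these names exist in S3A today. */
final class InconsistencyCounters {

    /** Assumed categories: which operation disagreed, and in which direction. */
    enum Kind {
        HEAD_PRESENT_IN_STORE_ONLY,  // MetadataStore has the path, S3 HEAD does not
        HEAD_DELETED_IN_STORE_ONLY,  // MetadataStore says deleted, S3 HEAD still finds it
        LIST_PRESENT_IN_STORE_ONLY,  // child known to MetadataStore, missing from S3 listing
        LIST_DELETED_IN_STORE_ONLY   // child deleted in MetadataStore, still in S3 listing
    }

    // The map is filled once in the constructor and never modified afterwards;
    // only the AtomicLong values change, so concurrent increments are safe.
    private final Map<Kind, AtomicLong> counts = new EnumMap<>(Kind.class);

    InconsistencyCounters() {
        for (Kind kind : Kind.values()) {
            counts.put(kind, new AtomicLong());
        }
    }

    /** Record one detected inconsistency and return the new count for that kind. */
    long increment(Kind kind) {
        return counts.get(kind).incrementAndGet();
    }

    long get(Kind kind) {
        return counts.get(kind).get();
    }

    /** Counter (1): all getFileStatus()/HEAD inconsistencies. */
    long headTotal() {
        return get(Kind.HEAD_PRESENT_IN_STORE_ONLY) + get(Kind.HEAD_DELETED_IN_STORE_ONLY);
    }

    /** Counter (2): all listing inconsistencies. */
    long listTotal() {
        return get(Kind.LIST_PRESENT_IN_STORE_ONLY) + get(Kind.LIST_DELETED_IN_STORE_ONLY);
    }
}
```

Keeping the finer categories as separate counters and deriving the two top-level totals from them means the split can be added or dropped later without changing callers.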

This is related to Auth. Mode and TTL work that is ongoing, so let me outline how I think
this should all evolve:

This should happen after HADOOP-15621 (TTL for dynamo MetadataStore), since that will change
*when* we query both S3 and the MetadataStore (e.g. Dynamo) for metadata. There I suggest:
 # Prune time is different from TTL. Prune time: "how long until inconsistency is no longer
a problem". TTL: "how long a MetadataStore entry is considered authoritative before refresh".
 # Prune expired: entries are deleted (when the hadoop CLI prune command is run). TTL expired:
entries become non-authoritative.
 # Prune is implemented in each MetadataStore, but TTL filtering happens in S3A.
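
To make the distinction between the two ages concrete, here is a small sketch. The class and method names are hypothetical, and it assumes each MetadataStore entry carries a last-updated timestamp; the actual design is whatever HADOOP-15621 settles on.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper contrasting TTL expiry with prune eligibility. */
final class EntryAge {

    private EntryAge() {
    }

    /**
     * TTL check, which per the proposal would live in S3A: once an entry is
     * older than the TTL it is no longer treated as authoritative, but it
     * stays in the MetadataStore.
     */
    static boolean isTtlExpired(Instant lastUpdated, Instant now, Duration ttl) {
        return Duration.between(lastUpdated, now).compareTo(ttl) > 0;
    }

    /**
     * Prune check, which would live in each MetadataStore: once an entry is
     * older than the prune age, a run of the prune command may delete it.
     */
    static boolean isPrunable(Instant lastUpdated, Instant now, Duration pruneAge) {
        return Duration.between(lastUpdated, now).compareTo(pruneAge) > 0;
    }
}
```

For example, with an assumed TTL of 10 minutes and a prune age of 1 day, a 30-minute-old entry is past its TTL (S3A would stop trusting it on its own) but is not yet eligible for pruning.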

Once we have this, S3A will consult both S3 and the MetadataStore, depending on configuration
and/or the age of the entry in the MetadataStore. Today HEAD/getFileStatus() always short-circuits
(it skips S3 if the MetadataStore returns results). I think S3A should consult both when the entry
is past its TTL, and invoke a callback on inconsistency that increments the new metrics. For
listing, we already compare both sources of truth (except when S3A auth mode is on and a directory
is marked authoritative in the MetadataStore), so it would be pretty simple to invoke a callback on
inconsistency and bump some metrics.
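
A rough sketch of the control flow described above, not S3A code. Everything here is an assumption made for illustration: the callback interface, the use of a file length as a stand-in for a full FileStatus, lookups modelled as plain functions, and the choice to keep returning the MetadataStore's answer after a mismatch. For listing, the sketch only flags children the MetadataStore knows about that the S3 listing does not yet show.

```java
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical sketch of HEAD and listing checks with an inconsistency callback. */
final class ConsistencyChecks {

    /** Hypothetical callback; an implementation would bump the new metrics. */
    interface InconsistencyCallback {
        void onInconsistency(String operation, String path);
    }

    private ConsistencyChecks() {
    }

    /** HEAD: short-circuit on a fresh entry, consult both sources once the TTL has expired. */
    static Optional<Long> head(String path,
                               Function<String, Optional<Long>> metadataStore,
                               Function<String, Optional<Long>> s3,
                               boolean ttlExpired,
                               InconsistencyCallback callback) {
        Optional<Long> fromStore = metadataStore.apply(path);
        if (!fromStore.isPresent()) {
            return s3.apply(path);          // nothing cached: S3 is the only source
        }
        if (!ttlExpired) {
            return fromStore;               // today's behaviour: skip S3 entirely
        }
        Optional<Long> fromS3 = s3.apply(path);
        if (!fromStore.equals(fromS3)) {
            callback.onInconsistency("HEAD", path);
        }
        return fromStore;                   // assumed policy; the proposal does not fix one
    }

    /** Listing: merge both views and report children S3 has not surfaced yet. */
    static Set<String> list(Set<String> fromStore,
                            Set<String> fromS3,
                            InconsistencyCallback callback) {
        Set<String> merged = new TreeSet<>(fromS3);
        for (String child : fromStore) {
            if (merged.add(child)) {        // add() returns true only if S3 lacked it
                callback.onInconsistency("LIST", child);
            }
        }
        return merged;
    }
}
```

Passing the callback in, rather than touching counters directly, keeps the comparison logic independent of how the metrics are eventually named or categorized.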

Comments / suggestions / questions welcome.

This message was sent by Atlassian JIRA

