hadoop-common-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14971) Merge S3A committers into trunk
Date Fri, 20 Oct 2017 21:53:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213302#comment-16213302 ]

ASF GitHub Bot commented on HADOOP-14971:

GitHub user steveloughran opened a pull request:


    HADOOP-14971 Merge S3A committers into trunk

    HADOOP-13786 & MAPREDUCE-6823 code as a PR for better review

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/hadoop s3guard/HADOOP-13786-committer

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #282
commit 70e2a84547936cdfa65c58a2482c498eabbce889
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-06T17:18:15Z

    HADOOP-13786: apply the HADOOP-13796-on-branch-2 patch to trunk, whitespace fix

commit 738b0c045603182b38d1ce08d97f60393043f565
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-06T18:28:37Z

    HADOOP-13786 fixing docs to avoid doxia bug on level 4 entries

commit 6d99b815eb33ccc0d0514e6330eb255f77d29372
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-12T19:57:02Z

    HADOOP-13786 HADOOP-14303 error handling: lambda-expression wrappers around core metadata ops

commit d9f72547212f7bc5c47b4aab949581e5e0d448ee
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-12T19:57:31Z

    HADOOP-13786 TestStagingCommitter  -> java 8 closures

commit 09249b354c9c0043d2690cd4d3fbb124e09eb2d8
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-13T15:57:33Z

    HADOOP-13786 HADOOP-14531 lambda wrapper around all production s3 calls
    * all invocations of s3 calls are wrapped where appropriate, either with once() (which
does the translation), retry() or retryUntranslated()
    * javadocs state retry policy; this is propagated to give callers an idea of what retries
    * commit tests -> java 8 lambdas too
    * test json serdeser in hadoop common
    * checkstyle
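
For readers following the review, the wrapping pattern described in this commit can be sketched roughly as below. This is a minimal illustration only, not the code in the PR: the names once() and retry() come from the commit message, but the class name RetrySketch, the Operation interface, the signatures and the fixed attempt count are all assumptions, and any backoff between attempts is omitted.

```java
import java.io.IOException;

/** Hypothetical sketch; not the actual S3A implementation. */
final class RetrySketch {

  /** An operation that may throw, so lambdas can raise checked exceptions. */
  @FunctionalInterface
  interface Operation<T> {
    T execute() throws Exception;
  }

  /** Run the operation once, translating any failure into an IOException. */
  static <T> T once(String action, Operation<T> op) throws IOException {
    try {
      return op.execute();
    } catch (IOException e) {
      // already in the form callers expect
      throw e;
    } catch (Exception e) {
      // "translation": wrap the raw failure, keeping it as the cause
      throw new IOException(action + " failed: " + e, e);
    }
  }

  /** Retry the operation up to maxAttempts times, translating each failure. */
  static <T> T retry(String action, int maxAttempts, Operation<T> op)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return once(action, op);
      } catch (IOException e) {
        // remember the failure and try again (no sleep in this sketch)
        last = e;
      }
    }
    throw last;
  }
}
```

A caller would then write something like `RetrySketch.retry("GET object", 3, () -> client.fetch(key))`, where `client.fetch` stands in for whatever store call is being wrapped.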

commit 5814f22aeab22d5a3bacb27bb456d665530c3d94
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-15T10:30:57Z

    HADOOP-13786 HADOOP-14531
    * new @Retries annotation for the s3a classes to use to make their retry policy more visible
in the source. This is a source-only annotation, unused anywhere else, but it does make the policy visible.
You can't call a non-retrying method and be retrying yourself unless you add your own retry
    * fault injecting AWSS3 client is better about knowing when it is good to fail (i.e. not so aggressively
on listing operations)
    * callback interface for before/during retries unified
    * and logging cut back so only the first failure gets logged in a retry loop. Maybe that could
be tuned to remember the previous failure & log if it's a different class
    * all integration tests excluding rename() ones are now working when tested with a high
(25-50%) throttle rate.
    * DDB logs of capacity limit failures
commit 2bd385361bda4ddd1590dfed7c3377bee1ffa739
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-15T10:31:19Z

    HADOOP-13786 turn off false alarm in findbugs

commit e8039d3d7734b607c0d0e093ea6d573672490753
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-19T10:38:32Z

    HADOOP-13786 MAPREDUCE-6823 FileOutputFormat uses the committer factory, with tests

commit 1e61b94490fbf3f75330ceea3b5d3b863f5efbe6
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-19T13:47:44Z

    * s/DefaultPutTracker/PutTracker/. Yes, it is the default one, but it's misleading as
a type.
    * move to l-expressions in block output stream callables & the committers. Exception:
Tasks.runParallel() whose closure is complex enough that the IDE was warning about its size.
Maybe best to refactor as a method invoked as this::exec
    * Adding new statistic, {{committer_bytes_uploaded}}, set when a stream is closed to the number of
bytes PUT.
    * S3A FS implements {{StreamCapabilities}}, dynamically declares if it is magic by returning
true on {{hasCapability("fs.s3a.magic.enabled")}} when it is.
    * S3ABlockOutputStream implements {{StreamCapabilities}}; dynamically declares if its
output has delayed visibility. Also: that it doesn't do hsync/hflush, obviously.
    * {{CommitOperations}}: Experimented with replacing {{MaybeIOE}} with Java 8 Optional<>
type. Doesn't work as {{maybeThrow}} can't be implemented as {{Optional<IOException>.map((e)
-> {throw e;})}}; Java's checked exceptions make maps fairly useless for the Hadoop IOE-throwing
APIs. Outcome: {{MaybeIOE}} unchanged.
    * Minor cleanup of production & test code
    * starting to write end user documentation. Needs more clarity on directory vs partitioned
output on staging committer, including examples

commit 798e0a3e2ed9ad0185ca003151489ff18acdacfb
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-21T10:30:38Z

    HADOOP-13786 MAPREDUCE-6823 adding public getOutputPath to PathOutputCommitter API, as
some callers currently scan the JobConf settings to find this value

commit 91611c32e19ab3fb59ebc1c99b8d3855c50de56b
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-21T10:38:18Z

    HADOOP-13786 altering s3a committer code to track MAPREDUCE-6823,

commit 91bc628638f65dab3b5f8bdad3e89bcc0c874af0
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-22T10:46:42Z

    HADOOP-13786 HADOOP-14531: 443 response goes to NoResponseException, treat as retryable
for non-idempotent calls only

commit d0d36abc95b4108f1c2e7fb3825a4353b47351ec
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-22T18:49:00Z

    HADOOP-13786 downgrade startup log about magic from info to debug. s3guard bucket-info
should show its status though. Also, move another anon class to a lambda expression

commit 77f9fb212d1d83868b85d5689f3cd7ecd7165eec
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-26T14:55:37Z

    HADOOP-13786 HADOOP-14531 DDB throttling events are logged as a quantile/rate metric (Hz)
rather than just total count.

commit c98b1421ca131406a2059f2a6659d86377eaf971
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-26T19:10:00Z

    HADOOP-13786 javadocs of Retries

commit 78f85138a521a800d99ec2a257ad5cf1c8e6e445
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-27T16:08:25Z

    HADOOP-13786 HADOOP-14531 rework retry logic, including Ewan's feedback. New names, less
logging. Also, exceptions are translated before the event handler is called, even if the operation
is untranslated. This means the event handler doesn't need to worry about whether the incoming
event is raw vs. translated

commit 51d4d519efde7412ea12df930c69e54c3a5432e0
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-27T18:27:55Z

    HADOOP-13786 checkstyle and bucket-info gains a "-magic" command to verify that magic
support is turned on

commit 48566c512b6faa7cadc4b7f5b8709ca01a9a9c03
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-28T18:25:14Z

    HADOOP-13786 MAPREDUCE-6823 more test on the commit factories

commit 272e32a0e42d3c798e1def10197997f8ebb5b342
Author: Steve Loughran <stevel@apache.org>
Date:   2017-09-28T18:26:23Z

    HADOOP-13786 more on commit algorithms themselves, turning docs and commit/abort code
to match

commit 34aee058cdd8d691fd61592e353e3b717b145a94
Author: Steve Loughran <stevel@apache.org>
Date:   2017-10-03T15:00:54Z

    HADOOP-13786 paste in code from how the MR AM creates a committer, to verify that it works
without spinning up the whole cluster
    Change-Id: I6841877fde593d6dffa1ba6065a2dc7564ab3329
    (cherry picked from commit 3634f5a20c76c0b28c4f9b4f7e39af4db5fc8c68)

commit d1b072c4faea798106378416479f5916b7d3d325
Author: Steve Loughran <stevel@apache.org>
Date:   2017-10-03T17:03:59Z

    HADOOP-13786 MAPREDUCE-6823  improving commentary on committer factory; clean up tests
    Change-Id: Ie468a243b23e389122b1e1c7281f76671d567167

commit d775a149b45377b31e612212df8485f9aa564f2a
Author: Steve Loughran <stevel@apache.org>
Date:   2017-10-03T18:25:35Z

    HADOOP-13786 setting up for testing of partitioning merge strategies. I understand what
it is trying to do now
    Change-Id: Ia1e4834e5793a9a768e4f373b7dafb39e195af4e

commit f2e0701b81c180e93464d5734e20a3e65509aedb
Author: Steve Loughran <stevel@apache.org>
Date:   2017-10-04T19:38:45Z

    HADOOP-13786 partitioned committer work (+some java 8 bits)
    * move lambda map/flatmap/apply ops on located file status iterator into S3AUtils from
TestUtils, use in staging committer & commit operations;
    * document what partitioned committer does, with notes (needs verification)
    * testing of Paths.addUUID() and fix failures
    Change-Id: I7329a45668f272162d836a2bbbf2cf3e71c56e56

commit 40204f1169a515f10f9a0d0c9283b27efb8c2653
Author: Steve Loughran <stevel@apache.org>
Date:   2017-10-04T19:40:28Z

    HADOOP-13786 revert back to java-7 logic in CommitOperations: cute but
    overcomplex here.
    Change-Id: I6f5a176e360cc6071a0f35cbb324f50fb335b233

commit fa2860c7505d6ae1ac8360b3998aa0034ecce448
Author: Steve Loughran <stevel@apache.org>
Date:   2017-10-09T17:14:57Z

    HADOOP-13786 MAPREDUCE-6823 remove createCommitter(JobContext) as the only place a FileOutputCommitter
is created off a job context is in the code bridging from the v1 to v2 APIs of FileOutputFormat.
The new factory model doesn't support v1 MR, so it's not needed. This simplifies testing and
allows for code cutbacks in the s3a implementations & downstream.
    Change-Id: Ifb51c1465a359f7f2cdafb16fe6e21dd143cadbf

commit 8f696e74d0d1d4eb0c3737c21705bc61f06087e8
Author: Steve Loughran <stevel@apache.org>
Date:   2017-10-09T17:19:13Z

    HADOOP-13786 S3A committers don't need to support a JobContext in the constructors or
factories: remove, clean up tests. Where tests do need to create a Committer with nothing
but a JobConf, use the same code which MR itself does for this, now statically exported from
    Change-Id: I79ab5acd9e4c15f4c1b9b520cf18258a97b7dbdc

commit 4cfa70bb1479fb7e938597b5ff0f278ee22fd9f3
Author: Steve Loughran <stevel@apache.org>
Date:   2017-10-09T18:46:05Z

    HADOOP-13786: Success marker: Should we delete this when a job starts?
    Yes: its presence marks the completion of a job
    No: if it contains metadata, that data may be valid until the new data is present
    Change-Id: I359cb943745f6b7b58667f7462bfcb7c0b0313e7

commit ac091be5eb2e975fd89b250f6900fadf4e84351e
Author: Steve Loughran <stevel@apache.org>
Date:   2017-10-10T18:23:59Z

    HADOOP-13786 MAPREDUCE-6823 There's now a "BindingPathOutputCommitter" which can be instantiated
and which relays its invocations to the factory. This is useful to work with code which takes
a committer classname to know what to instantiate: it allows you to delegate to the factory
for dynamic binding on a per-destination basis.
    Change-Id: I0472c60df98a54e5272b221c650c2a09e3d46fa1
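
The idea of a committer that can be named as a class yet binds to a factory per destination can be sketched as follows. Everything here is hypothetical: the Committer and CommitterFactory interfaces, the scheme-keyed registry and the returned strings are stand-ins chosen to keep the example self-contained, and none of it is the MapReduce API from the PR.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-destination committer binding. */
final class BindingSketch {

  interface Committer {
    String commitJob();
  }

  interface CommitterFactory {
    Committer create(String outputPath);
  }

  /** Assumed registry: one factory per filesystem scheme. */
  static final Map<String, CommitterFactory> FACTORIES = new HashMap<>();

  static {
    FACTORIES.put("s3a", path -> () -> "s3a commit of " + path);
    FACTORIES.put("file", path -> () -> "rename commit of " + path);
  }

  /**
   * A concrete class which callers can instantiate by name; it looks up
   * the factory for the destination and relays every call to the
   * committer that the factory created.
   */
  static final class BindingCommitter implements Committer {
    private final Committer delegate;

    BindingCommitter(String outputPath) {
      String scheme = URI.create(outputPath).getScheme();
      // fall back to the "file" factory for unknown or missing schemes
      CommitterFactory factory =
          FACTORIES.getOrDefault(scheme, FACTORIES.get("file"));
      this.delegate = factory.create(outputPath);
    }

    @Override
    public String commitJob() {
      return delegate.commitJob();
    }
  }
}
```

Code which only knows the class name BindingCommitter still ends up with the s3a-specific behaviour when the output path is an s3a URL, which is the dynamic binding the commit message describes.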

commit c74a599bc89db43b9df7b76478e54d6d5666cb11
Author: Steve Loughran <stevel@apache.org>
Date:   2017-10-10T18:25:32Z

    HADOOP-13786 MAPREDUCE-6823 static method to combine creating factory & committer
in one go; turns out to be a useful operation downstream, so merits simplification. Tests
    Change-Id: Ie5173141132ba41bad5af9f97fd67056428e7f2b

commit 84fab155a549c53ee46760ec390c87b8a54b13f4
Author: Steve Loughran <stevel@apache.org>
Date:   2017-10-12T20:28:37Z

    * WriteOperationHelper no longer takes a key in its constructor, caller must supply on
the relevant ops
    * _SUCCESS file includes a name field which is validated on load; goal is to identify
other formats/versions and reject.
    * big code review of tests, including renaming, cleanup, IDE-suggested cleanup
    * tests also verify that hasCapability() returns true for the magic option
on a magic write, false for a non-magic one, even on a magic FS.
    Change-Id: Ia2de777e98c73819d44c2b755fb57be4be5e4a34


> Merge S3A committers into trunk
> -------------------------------
>                 Key: HADOOP-14971
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14971
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
> Merge the HADOOP-13786 committer into trunk. This branch is being set up as a github
> PR for review there & to keep it out of the mailboxes of the watchers on the main JIRA

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
