mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Till Toenshoff" <toensh...@me.com>
Subject Re: Review Request 30774: Fetcher Cache
Date Thu, 26 Feb 2015 00:37:06 GMT


> On Feb. 25, 2015, 10:20 p.m., Till Toenshoff wrote:
> > src/slave/containerizer/fetcher.cpp, lines 459-460
> > <https://reviews.apache.org/r/30774/diff/18/?file=875547#file875547line459>
> >
> >     So a force removes from cache -- can you explain why this is needed?

Ow I got it :) ... a force is a "remove -> re/download -> cache -> extract / copy"


- Till


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review73462
-----------------------------------------------------------


On Feb. 25, 2015, 5:54 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated Feb. 25, 2015, 5:54 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from CommandInfo::URIs
can now be cached in a cache directory designated by a slave flag. This only happens when
asked for by an extra flag in the URI and is thus backwards-compatible. The cache has a size
limit also given by a new slave flag. Cache-resident files are evicted as necessary to make
space for newly fetched ones. Concurrent attempts to cache the same URI leads to only one
download. The fetcher program remains external for safety reasons, but is now augmented with
more elaborate parameters packed into a JSON object to implement specific fetch actions for
all of the above. Additional testing includes fetching from (mock) HDFS and coverage of the
new features.
> 
> 
> Diffs
> -----
> 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto facb87b92bf3194516f636dcc348e136af537721 
>   include/mesos/mesos.proto 507845c493f65e154214fc7e562206e452990469 
>   src/Makefile.am d372404d84c9ac0bc49da7407ad366701b9586a6 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 5b2d86d1867b25bf71461fd73df91b089002325b 
>   src/slave/constants.hpp 12d6e92b814076fd423a7738e030dc6ce6954761 
>   src/slave/containerizer/docker.hpp 70114dcef7a416ae904e95eb9ce365bcd866aebc 
>   src/slave/containerizer/docker.cpp 813ddf8c98622748b2da1b739bf122387abc4c79 
>   src/slave/containerizer/fetcher.hpp bfd98dbe16e2bd5df3e2c8e9b10e303654f33446 
>   src/slave/containerizer/fetcher.cpp 6e6bce08d76bb8a5813c905e3ffeff9b2411fd6d 
>   src/slave/containerizer/mesos/containerizer.hpp 074a2d82dcd882e52f8cd62ed7493295596acb26

>   src/slave/containerizer/mesos/containerizer.cpp d5b90d12d63becfeb4c3efa9c6f5d826417f582c

>   src/slave/flags.hpp ddb32593566b71bcf0b66adeeb889e43ef911084 
>   src/slave/slave.cpp ec7ec1356e745bb07484ae1755c9183b038043b3 
>   src/tests/docker_containerizer_tests.cpp 8b212d4e6ed5a179ebadce1bdbbf3edde87d706c 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp fcbf7ad912f0b1b3d106a75a9dcb8213a0c69c07 
>   src/tests/mesos.hpp 60c70043c8266a422ffa03ab5a949da0bc822124 
>   src/tests/mesos.cpp d76ed8c837da162d782de84e935fe8e11e01d6b0 
>   src/tests/mock_hadoop.sh PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> -Replaces all other reviews for the fetcher cache except those related to stout: 30006,
30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 30618, 30621, 30626. See descriptions
of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more when the fetcher
uses JSON in a single env var as a parameter. They never tested anything that won't be covered
by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same. Instead of calling the
run method of the fetcher directly, calling through fetch(). Also removes all uses of I/O
redirection, which is not really needed for debugging, and thus the next patch can refactor
fetch() and run(). (The latter comes in two varieties, which complicates matters without much
benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field that will
later cause fetcher cache actions. Also introduces the notion of a cache directory to the
fetcher info protobuf. And then propagates these additions throughout the rest of the code
base where applicable. This includes passing the slave ID all the way down to the place where
the cache dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly bypassing the
cache, fetch through the cache, retrieve from the cache). Switches the basis for dealing with
uris to "items", which contain the uri, the action, and potentially a cache file name. Refactors
fetch() and run(), so there is only one of each. Introduces about half of the actual cache
logic, including a hashmap of cache file objects for bookkeeping and basic operations on it.

> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Reuse of download results
in the cache when multiple fetcher runs occur concurrently. 
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not broken HDFS
fetching. Adds a test that exercises the C++ code paths in Mesos and mesos-fetcher related
to fetching from HDFS. Uses a mock HDFS client written in bash that acts just like a real
"hadoop" command if used in the right limited way.
> 
> 30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. This implements
recovery in an acceptable, yet most simple way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a test fixture
and tests to find out if the fetcher cache works with a variety of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that contains
the file size for the URI passed as argument. This is needed to determine the size of a file
on HDFS before downloading it to the fetcher cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type separation
code (distinguishes http, hdfs, local copying, etc.) from mesos-fetcher to the fetcher process/actor,
since it is going to be reused by download size queries when we introduce fetcher cache management.
Also factored out URI validation, which will be used the same way by mesos-fetcher and the
fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have enough space
to accomodate upcoming downloads to the cache. Necessary provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If it fails,
the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All math included
:-)
> - To find out how much space is needed, downloading has a prelude in which we query the
download size from the URI. This works for all URI types that mesos-fetcher currently supports,
including http and hdfs.
> - Size-determination requests are now synchronized, too. Only one per URI in play happens.
> - There is cleanup code for all kinds of error situations. At the very end of the fetch
attempt, each list is processed for undoing things like space reservations and eviction disabling.
> - Eviction gets disabled for URIs that are currently in use, i.e. the related cache files
are. We use reference counting for this, since there may be concurrent fetch attempts using
the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message