spark-reviews mailing list archives

From andrewor14 <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK_6066] Make event log format easier to p...
Date Fri, 27 Feb 2015 23:38:28 GMT
GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/4821

    [SPARK_6066] Make event log format easier to parse

    The previous event log format was difficult to parse:
    ```
    sparkVersion = 1.3.0
    compressionCodec = org.apache.spark.io.LZFCompressionCodec
    === LOG_HEADER_END ===
    // actual events, could be compressed bytes
    ```
    For instance, when compression is turned on, the metadata header stays uncompressed
    while the rest of the log is compressed. The metadata cannot be compressed because it
    names the compression codec, which a reader needs before it can open the log at all.
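
To illustrate why the old layout is awkward, here is a minimal Python sketch of a reader for it. It assumes the header is newline-terminated UTF-8 `key = value` text ending at the marker line, as the snippet above suggests; the function name `read_old_header` is hypothetical and not part of Spark.

```python
import io

# Marker line that separates the plain-text header from the event payload.
HEADER_END = b"=== LOG_HEADER_END ==="


def read_old_header(stream):
    """Read 'key = value' header lines from a binary stream.

    Returns the metadata as a dict and leaves the stream positioned at the
    first byte after the marker line, where the (possibly compressed)
    events begin. The exact byte layout is an assumption for illustration.
    """
    metadata = {}
    while True:
        line = stream.readline()
        if not line:
            raise ValueError("header end marker not found")
        line = line.rstrip(b"\r\n")
        if line == HEADER_END:
            return metadata
        key, sep, value = line.decode("utf-8").partition("=")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        metadata[key.strip()] = value.strip()


# Usage: the header is text, but everything after it may be raw codec output.
raw = (
    b"sparkVersion = 1.3.0\n"
    b"compressionCodec = org.apache.spark.io.LZFCompressionCodec\n"
    b"=== LOG_HEADER_END ===\n"
    b"\x00\x01 stand-in for compressed bytes"
)
stream = io.BytesIO(raw)
header = read_old_header(stream)
payload = stream.read()  # must be handed to the codec named in the header
```

The reader has to switch from line-oriented text parsing to byte-oriented decompression in the middle of one file, which is the difficulty described above.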
    
    The new format puts the compression codec and the Spark version in the log file name
    instead. It also represents the metadata as JSON in the first line of the event log,
    which is easy for third-party applications to parse:
    ```
    {"Event": "SparkListenerMetadataIdentifier", "SPARK_VERSION":"1.3.0", "COMPRESSION_CODEC":"..."}
    // actual events. If compression is turned on, the whole file, including the metadata,
    // is compressed.
    ```
    The file name looks something like:
    ```
    EVENT_LOG_app_123_SPARK_VERSION_1.3.1
    EVENT_LOG_app_123_SPARK_VERSION_1.3.1_COMPRESSION_CODEC_{...}
    ```
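
As a rough sketch of how a third-party tool might consume the new format, the Python below extracts the metadata from a file name and from the first JSON line. The regular expression is inferred only from the two example names above, so the real naming rules may differ; the function names and the codec value `lzf` used in the usage lines are hypothetical.

```python
import json
import re

# Pattern inferred from the example file names; an assumption, not a spec.
_NAME_RE = re.compile(
    r"^EVENT_LOG_(?P<app_id>.+?)"
    r"_SPARK_VERSION_(?P<spark_version>.+?)"
    r"(?:_COMPRESSION_CODEC_(?P<compression_codec>.+))?$"
)


def parse_event_log_name(name):
    """Split an event log file name into app id, Spark version and codec.

    The codec is None when the name has no COMPRESSION_CODEC suffix.
    """
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError("not an event log file name: %r" % name)
    return match.groupdict()


def parse_metadata_line(line):
    """Parse the first line of a (already decompressed) event log."""
    record = json.loads(line)
    if record.get("Event") != "SparkListenerMetadataIdentifier":
        raise ValueError("first line is not a metadata event")
    # Everything except the event type is metadata.
    return {key: value for key, value in record.items() if key != "Event"}


# Usage with the examples from this description ("lzf" is a made-up value).
plain = parse_event_log_name("EVENT_LOG_app_123_SPARK_VERSION_1.3.1")
packed = parse_event_log_name(
    "EVENT_LOG_app_123_SPARK_VERSION_1.3.1_COMPRESSION_CODEC_lzf"
)
meta = parse_metadata_line(
    '{"Event": "SparkListenerMetadataIdentifier", '
    '"SPARK_VERSION":"1.3.0", "COMPRESSION_CODEC":"..."}'
)
```

Because the codec is known from the name alone, a reader can pick the decompressor before opening the file and then treat every line, including the first, as ordinary JSON.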
    
    I tested this with and without compression, using different compression codecs and
    event logging directories. I verified that both the `Master` and the `HistoryServer`
    can render both compressed and uncompressed logs as before.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark event-log-format

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4821.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4821
    
----
commit 8db5a06d108d8a2ddb8460e48e3509f46cc4fc2f
Author: Andrew Or <andrew@databricks.com>
Date:   2015-02-27T23:29:26Z

    Embed metadata in the event log file name instead
    
    This makes the event logs much easier to parse than before.
    As of this commit the whole file is either entirely compressed
    or not compressed, but not somewhere in between.

----


