hive-issues mailing list archives

From "Harish Jaiprakash (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17019) Add support to download debugging information as an archive.
Date Wed, 12 Jul 2017 10:33:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083774#comment-16083774 ]

Harish Jaiprakash commented on HIVE-17019:
------------------------------------------

Thanks [~sseth].

- Change the top level package from llap-debug to tez-debug? (Works with both, I believe.) [~ashutoshc],
[~thejas] - any recommendations on whether the code gets a top level module, or goes under
an existing module? It allows downloading various debug artifacts for a tez job: logs,
metrics for llap, hiveserver2 logs (soon), tez am logs, and ATS data for the query (hive and tez).

Will change the directory.

- In the new pom.xml, dependency on hive-llap-server. 1) Is it required? 2) Will need to
exclude some dependent artifacts; see the llap-server dependency handling in service/pom.xml.

The llap status is fetched using LlapStatusServiceDriver, which is part of hive-llap-server.

- LogDownloadServlet - Should this throw an error as soon as the filename pattern validation
fails?

The filename check is there to prevent injection attacks via the file name/HTTP header, not
to validate the id.
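A minimal sketch of such a sanity check (the allowed character set here is a hypothetical example, not the pattern from the patch): reject anything that could carry a path separator or a CR/LF into a header.

```java
import java.util.regex.Pattern;

public class FileNameCheck {
    // Hypothetical whitelist: alphanumerics, '_', '-' and '.' only, which
    // keeps path separators and CR/LF out of file paths and HTTP headers.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9_.-]+");

    static boolean isSafe(String name) {
        return SAFE_NAME.matcher(name).matches(); // matches() anchors the whole string
    }

    public static void main(String[] args) {
        System.out.println(isSafe("dag_1499_0001"));        // accepted
        System.out.println(isSafe("../etc/passwd"));        // rejected: path traversal
        System.out.println(isSafe("x\r\nContent-Type: y")); // rejected: header injection
    }
}
```

A malformed-but-safe id simply retrieves nothing from ATS, which matches the behaviour described below.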

- LogDownloadServlet - change to dagId/queryId validation instead

Can do, but it will be sensitive to changes in the id format. Currently it's passed down to
ATS, and nothing will be retrieved for an invalid id.

- LogDownloadServlet - thread being created inside of the request handler? This should be
limited outside of the request, so that only a controlled number of parallel artifact downloads
can run.

I'll create a shared executor. Alternatively, does it make sense to use Guava's direct
executor, which schedules the task in the current thread?
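A shared, bounded pool along these lines might look as follows (the pool and queue sizes are made-up placeholders, and buildArchive stands in for the real aggregation):

```java
import java.util.concurrent.*;

public class DownloadExecutor {
    // Hypothetical sizing: at most 4 concurrent archive builds and 16 queued
    // requests; anything beyond that is rejected instead of spawning a
    // thread per servlet request.
    static ExecutorService newSharedPool() {
        return new ThreadPoolExecutor(4, 4, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(16),
                new ThreadPoolExecutor.AbortPolicy());
    }

    // Stand-in for the real artifact aggregation work.
    static String buildArchive(String queryId) {
        return "archive-for-" + queryId;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newSharedPool();
        Future<String> f = pool.submit(() -> buildArchive("query_1"));
        System.out.println(f.get());
        pool.shutdown();
    }
}
```

With AbortPolicy, a request arriving while the pool and queue are full fails fast with RejectedExecutionException, which the servlet can map to an HTTP error rather than queueing unboundedly.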

- LogDownloadServlet - what happens in case of aggregator failure? Exception back to the user?

Jetty will handle the exception and return a 500 to the user. Not sure if the exception
trace is part of it; will try and see.

- LogDownloadServlet - seems to be generating the file to disk and then streaming it over.
Can this be streamed over directly instead. Otherwise there's the possibility of leaking files.
(Artifact.downloadIntoStream or some such?) Guessing this is complicated further by the multi-threaded
artifact downloader.
Alternately need to have a cleanup mechanism.

Streaming directly would not be possible because of the multithreading. If it's single
threaded, then I can use a ZipOutputStream and add one entry at a time.

Oops, sorry - the finally got moved down since the aggregator had to be closed before streaming
the file. I'll handle the cleanup using a try-finally.
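The cleanup fix could be sketched like this, assuming the archive is still built in a temp file first (the entry name and contents are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ArchiveCleanup {
    // Build the archive into a temp file, stream it to the response, and
    // delete the temp file in a finally block so failures cannot leak files.
    static void writeAndStream(OutputStream out) throws IOException {
        Path tmp = Files.createTempFile("debug-bundle", ".zip");
        try {
            try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(tmp))) {
                zip.putNextEntry(new ZipEntry("am.log")); // one entry per artifact
                zip.write("log contents".getBytes());
                zip.closeEntry();
            }
            Files.copy(tmp, out); // stream the finished archive to the client
        } finally {
            Files.deleteIfExists(tmp); // runs on success and on failure
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeAndStream(buf);
        System.out.println(buf.size() > 0);
    }
}
```

In the single-threaded variant mentioned above, the ZipOutputStream could wrap the servlet response stream directly and the temp file disappears entirely.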

- Timeout on the tests

Setting timeouts on tests.

- Apache header needs to be added to files where it is missing.

Sorry, will add the license header to all files.

- Main - Please rename to something more indicative of what the tool does.

I was planning to remove this and integrate it with the hive cli as --service <download_logs>.
That does not work without a lot of classpath fixes, or I'll have to create a script to add
the hive jars.

- Main - Likely a follow up jira - parse using a standard library, instead of trying to parse
the arguments to main directly.

Will check a few libs; apache commons' OptionBuilder uses a static instance in its builder.
That should be ok for a cli-based, invoke-once app, but I'll look for something better along
the lines of Python's argparse.

- Server - Enabling the artifact should be controlled via a config. Does not always need to
be hosted in HS2 (Default disabled, at least till security can be sorted out)

I'll add a config.

- Is it possible to support a timeout on the downloads? (Can be a follow up jira)

Sure, will do. Global or per download or both?
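A per-download timeout could be layered on the shared executor roughly like this (the method and names are hypothetical; a global timeout would wrap the whole aggregation instead):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DownloadTimeout {
    // Cancel the artifact task if it does not finish within the deadline,
    // interrupting the stuck download thread.
    static String fetchWithTimeout(ExecutorService pool, Callable<String> task,
                                   long seconds) throws Exception {
        Future<String> f = pool.submit(task);
        try {
            return f.get(seconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            f.cancel(true); // true -> interrupt the running task
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        System.out.println(fetchWithTimeout(pool, () -> "am.log", 5));
        pool.shutdown();
    }
}
```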

- ArtifactAggregator - I believe this does 2 stages of dependent artifacts / downloads? Stage 1
- download whatever it can. Information from this should be adequate for stage 2 downloads?

It could be more stages. Ex: given a dag_id:
stage 1: fetch tez ats info, which is used to extract the hive id and the task container/node list.
stage 2: fetch hive ats info and the tez container log list.
stage 3: fetch the llap container log list and the tez task logs.
stage 4: fetch the llap container logs.

The aggregator iterates through the list of sources and finds those which can download using
the info in the params. It schedules those sources, waits for all of them to complete, and
then repeats. It stops when no new sources could download or all sources are exhausted.
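The loop described above could be sketched as follows (the interface and names are hypothetical, and this version runs each pass sequentially, whereas the real aggregator schedules a pass's sources in parallel and waits before repeating):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class StagedAggregator {
    // Each source declares whether it can run with the params gathered so
    // far; downloading may add new params (e.g. dag_id -> container id),
    // enabling more sources on the next pass.
    interface Source {
        boolean canRun(Map<String, String> params);
        void download(Map<String, String> params);
    }

    static void aggregate(List<Source> sources, Map<String, String> params) {
        List<Source> remaining = new ArrayList<>(sources);
        boolean progressed = true;
        while (progressed && !remaining.isEmpty()) {
            progressed = false;
            for (Iterator<Source> it = remaining.iterator(); it.hasNext(); ) {
                Source s = it.next();
                if (s.canRun(params)) {
                    s.download(params);
                    it.remove();
                    progressed = true;
                }
            }
            // stop once a full pass runs nothing: no new info can appear
        }
    }

    // Toy demo: container logs can only run after tez ats adds the container id.
    static List<String> demo() {
        List<String> order = new ArrayList<>();
        Source tezAts = new Source() {
            public boolean canRun(Map<String, String> p) { return p.containsKey("dag_id"); }
            public void download(Map<String, String> p) { order.add("tez-ats"); p.put("container", "c_1"); }
        };
        Source containerLogs = new Source() {
            public boolean canRun(Map<String, String> p) { return p.containsKey("container"); }
            public void download(Map<String, String> p) { order.add("container-logs"); }
        };
        Map<String, String> params = new HashMap<>();
        params.put("dag_id", "dag_1");
        aggregate(Arrays.asList(containerLogs, tezAts), params);
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // tez-ats runs first, container-logs in pass 2
    }
}
```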


- For the ones not implemented yet (DummyArtifact) - think it's better to just comment out
the code, instead of invoking the DummyArtifacts downloader

Sorry, will do.

- Security - ACL enforcement required on secure clusters to make sure users can only download
what they have access to. This is a must fix before this can be enabled by default.

Working on this.

- Security - this can work around yarn restrictions on log downloads, since the files are
being accessed by the hive user.

Yes this should work.

- Could you please add some details on cluster testing?

I'll add another comment with the details of testing.


> Add support to download debugging information as an archive.
> ------------------------------------------------------------
>
>                 Key: HIVE-17019
>                 URL: https://issues.apache.org/jira/browse/HIVE-17019
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Harish Jaiprakash
>            Assignee: Harish Jaiprakash
>         Attachments: HIVE-17019.01.patch
>
>
> Given a queryId or dagId, get all information related to it: tez am and task logs,
hive ats data, tez ats data, slider am status, etc. Package it into an archive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
