hive-issues mailing list archives

From "Bharathkrishna Guruvayoor Murali (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-19525) Spark task logs print PLAN PATH excessive number of times
Date Sat, 02 Jun 2018 16:46:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-19525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharathkrishna Guruvayoor Murali updated HIVE-19525:
----------------------------------------------------
    Attachment: HIVE-19525.2.patch

> Spark task logs print PLAN PATH excessive number of times
> ---------------------------------------------------------
>
>                 Key: HIVE-19525
>                 URL: https://issues.apache.org/jira/browse/HIVE-19525
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Bharathkrishna Guruvayoor Murali
>            Priority: Major
>         Attachments: HIVE-19525.1.patch, HIVE-19525.2.patch
>
>
> A ton of logs with this {{Utilities - PLAN PATH = hdfs://localhost:59527/.../apache-hive/itests/qtest-spark/target/tmp/scratchdir/stakiar/6ebceb49-7a76-4159-9082-5bba44391e30/hive_2018-05-14_07-28-44_672_8205774950452575544-1/-mr-10006/bf14c0b5-a014-4ee8-8ddf-fdb7453eb0f0/map.xml}}
> It seems to print multiple times per task exception. I'm not sure where it is coming from, but
> it's too verbose. It should be changed to DEBUG level. Furthermore, given that we use
> {{Utilities#getBaseWork}} any time we need to access a {{MapWork}} or {{ReduceWork}} object,
> we should make the method slightly more efficient. Right now it borrows a {{Kryo}} from a
> pool and performs several steps to set the classloader, and only then checks the cache to see
> whether the work object has already been created. It should check the cache before doing any of that.
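
The reordering requested above can be illustrated with a minimal sketch. This is not Hive's code and not the attached patch: the class name, the cache map, the load counter, and the placeholder deserialize step are all hypothetical, and java.util.logging at FINE level stands in for a DEBUG-level log call. It only shows the idea of consulting the cache first, so that the expensive path (in the real method, borrowing a Kryo and setting the classloader) runs only on a miss.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the lookup described in the issue; not Hive's actual code.
public class CacheFirstWorkLoader {
    private static final Logger LOG =
            Logger.getLogger(CacheFirstWorkLoader.class.getName());

    // Plan path -> work object that has already been created.
    private final Map<String, Object> workCache = new ConcurrentHashMap<>();

    // Counts how often the expensive path runs (for illustration only).
    private final AtomicInteger expensiveLoads = new AtomicInteger();

    public Object getBaseWork(String planPath) {
        // 1. Check the cache before doing any serializer or classloader setup.
        Object cached = workCache.get(planPath);
        if (cached != null) {
            return cached;
        }

        // 2. Cache miss only: log at a debug-equivalent level, then load.
        LOG.log(Level.FINE, "PLAN PATH = {0}", planPath);
        Object loaded = deserialize(planPath);

        // Keep whichever instance reached the cache first if two threads raced.
        Object raced = workCache.putIfAbsent(planPath, loaded);
        return raced != null ? raced : loaded;
    }

    // Placeholder for the expensive step (assumed, not taken from Hive).
    private Object deserialize(String planPath) {
        expensiveLoads.incrementAndGet();
        return new Object();
    }

    public int expensiveLoadCount() {
        return expensiveLoads.get();
    }
}
```

With this ordering, repeated lookups of the same plan path return the cached object without repeating the setup work or emitting another log line.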



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
