hive-user mailing list archives

From "Lonikar, Kiran" <kloni...@informatica.com>
Subject RE: Hive over JDBC: Retrieving job Id and status
Date Sat, 30 May 2015 12:39:20 GMT
Hi Hari,

Thanks for your prompt replies. I went through JIRAs 7615 and 4629. It appears that the log
retrieval also happens over JDBC. Could you confirm this, and also whether it works when
HS2 runs in HTTP mode? The JIRAs discuss changes to the Thrift interface, but do not
mention HTTP.

If log retrieval does not happen over JDBC, or does not work in HS2 HTTP mode, it will not
work over Knox. I want a solution that works over Knox.

If it doesn’t, I will go with my solution of giving a unique job name to my query and then
querying for the id using job client.

-Kiran

From: Hari Subramaniyan [mailto:hsubramaniyan@hortonworks.com]
Sent: Friday, May 29, 2015 5:14 PM
To: Lonikar, Kiran; user@hive.apache.org
Subject: Re: Hive over JDBC: Retrieving job Id and status


Hi Kiran,



For Async calls, see https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingAPIWithMr.java#L83

The client in this case uses MiniHS2.getClient(), which uses JDBC internally. The suggested
method involves log scraping. I am not sure if there is a direct API to retrieve the mapred
job IDs associated with Hive queries run via HiveServer2, other than parsing the logs.
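
The log parsing mentioned above could look roughly like the following. This is a minimal sketch, not code from the Hive test: the class name JobIdScraper is hypothetical, and it assumes the operation log lines have already been fetched as strings by whatever means the driver offers. It relies only on the standard Hadoop job ID shape, job_<clusterStartTime>_<sequence>, so local-mode IDs such as job_local... would not match.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls MapReduce job IDs out of already-fetched log lines.
public class JobIdScraper {
    // Cluster job IDs have the form job_<digits>_<digits>.
    private static final Pattern JOB_ID = Pattern.compile("\\bjob_\\d+_\\d+\\b");

    // Returns the distinct job IDs in the order they first appear.
    public static Set<String> extractJobIds(List<String> logLines) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = JOB_ID.matcher(line);
            while (m.find()) {
                ids.add(m.group());
            }
        }
        return ids;
    }
}
```

A query that compiles to several MapReduce stages would yield several IDs, which is why the sketch returns a set rather than a single value.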



Thanks

Hari



________________________________
From: Lonikar, Kiran <klonikar@informatica.com>
Sent: Friday, May 29, 2015 3:17 AM
To: user@hive.apache.org
Subject: RE: Hive over JDBC: Retrieving job Id and status

Hi Hari,

I am using Hive 0.13 and above. Thanks for the info. The example you provided uses cli and
does not seem to use JDBC. The JDBC calls are blocking (there is no API like executeStatementAsync).
Does the class org.apache.hive.service.cli.CLIServiceClient use JDBC internally?

Does it involve log scraping? I would prefer a programmatic interface.

-Kiran

From: Hari Subramaniyan [mailto:hsubramaniyan@hortonworks.com]
Sent: Friday, May 29, 2015 2:32 PM
To: user@hive.apache.org
Subject: Re: Hive over JDBC: Retrieving job Id and status


Hi Kiran,



Which version of Hive are you using?



In the 1.2 release, there is an option to set session-level logging from the client via
hive.server2.logging.operation.level. Setting this parameter to EXECUTION should provide the
map-reduce job information associated with the query on the client side, which you should be
able to retrieve in a parallel thread while the query is running. This idea is demonstrated
in the following hive-unit test:

https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingAPIWithMr.java



More information about the related parameter can be found here :

https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.logging.operation.level



For the above parameter to work, HiveServer2 must have operation logging enabled, i.e.
hive.server2.logging.operation.enabled should be set to true (the default) when you start HiveServer2.
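
The "retrieve in a parallel thread" idea can be sketched as below. This is only an illustration under stated assumptions: LogPoller and its logSource parameter are hypothetical names, and logSource stands in for whichever log-retrieval call your Hive JDBC driver version provides (check the driver for the actual method). The session setting itself would be issued beforehand on the same connection, e.g. with a statement such as "set hive.server2.logging.operation.level=EXECUTION".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical poller: collects log lines while a blocking query runs in another thread.
public class LogPoller implements Runnable {
    // Assumption: each call returns the log lines produced since the previous call.
    private final Supplier<List<String>> logSource;
    // Set to true by the query thread once statement.execute(...) has returned.
    private final AtomicBoolean queryDone;
    private final long pollIntervalMillis;
    private final List<String> collected = Collections.synchronizedList(new ArrayList<>());

    public LogPoller(Supplier<List<String>> logSource, AtomicBoolean queryDone,
                     long pollIntervalMillis) {
        this.logSource = logSource;
        this.queryDone = queryDone;
        this.pollIntervalMillis = pollIntervalMillis;
    }

    @Override
    public void run() {
        while (true) {
            // Read the flag before fetching, so one final fetch happens after completion.
            boolean done = queryDone.get();
            collected.addAll(logSource.get());
            if (done) {
                break;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
    }

    // Returns a snapshot of the lines collected so far.
    public List<String> getCollected() {
        synchronized (collected) {
            return new ArrayList<>(collected);
        }
    }
}
```

In use, the caller would start a Thread around the poller, run the blocking statement.execute(hiveQuery) on the main thread, set queryDone to true afterwards, and join the poller thread before reading the collected lines.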



Thanks

Hari

________________________________
From: Lonikar, Kiran <klonikar@informatica.com>
Sent: Thursday, May 28, 2015 9:23 PM
To: user@hive.apache.org
Subject: Hive over JDBC: Retrieving job Id and status

Hi,

When a hive query is submitted to hiveserver2 over JDBC, is there a way to get the Hadoop
job id (and status) for the hive query?

The JDBC call “statement.execute(hiveQuery)” is a blocking call. Specifically, is there
any way to execute a query on the same JDBC connection from another thread to find out the
job ID?

For now, I am following this approach: Before submitting the actual query, I execute the following
on the same statement:
set mapred.job.name=myjob.<pid>.<currentTime>

Here <pid> is the process id of the submitting java process and <currentTime>
is obtained using System.currentTimeMillis().

This sets the job name for the subsequent queries. I can then query the job Id for this job
name using the JobClient and then I can monitor the job status using this job Id.
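
The naming step described above can be sketched as follows. This is a minimal illustration, not a tested integration: the class and method names are hypothetical, ProcessHandle requires Java 9 or later, and the subsequent lookup by name through the JobClient is not shown.

```java
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: tags a Hive session with a unique MapReduce job name.
public class JobNameTagger {
    // Builds a name of the form myjob.<pid>.<currentTime>.
    public static String buildJobName(long pid, long currentTimeMillis) {
        return "myjob." + pid + "." + currentTimeMillis;
    }

    // Builds the Hive SET command that applies the name to subsequent queries.
    public static String buildSetCommand(String jobName) {
        return "set mapred.job.name=" + jobName;
    }

    // Issues the SET command on the given statement and returns the name used,
    // so a monitoring thread can receive it before the blocking query starts.
    public static String tag(Statement statement) throws SQLException {
        String jobName = buildJobName(ProcessHandle.current().pid(),
                                      System.currentTimeMillis());
        statement.execute(buildSetCommand(jobName));
        return jobName;
    }
}
```

Calling tag(statement) first and only then statement.execute(hiveQuery) keeps the name available to a second thread while the query blocks.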

Let me know if there is a better way to proceed.

-Kiran
