hive-dev mailing list archives

From "MIS (JIRA)" <>
Subject [jira] Commented: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
Date Fri, 18 Mar 2011 05:10:29 GMT


MIS commented on HIVE-2051:

Yes, the executor must be shut down once jobs have been submitted to it, even if all of the
submitted jobs have already completed.

However, what we need not do here is wait for termination after the executor has been shut
down, since that is redundant: every job submitted to the executor will already have completed
by the time we shut it down. Calling result.get() on each submitted job is what ensures this.
In other words, the following piece of code is not required:
+      do {
+        try {
+          executor.awaitTermination(Integer.MAX_VALUE, TimeUnit.SECONDS);
+          executorDone = true;
+        } catch (InterruptedException e) {
+        }
+      } while (!executorDone);
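
For illustration, here is a minimal sketch of the pattern described above. It is not the code
from the HIVE-2051 patch; the class name ParallelSummarySketch and the helpers
fakeContentLength() and totalLength() are hypothetical, and fakeContentLength() merely stands in
for a call such as FileSystem.getContentSummary(). The point it tries to show is that once
get() has returned for every submitted job, a plain shutdown() is enough to release the pool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSummarySketch {

    // Hypothetical stand-in for the per-path work (e.g. a content summary lookup).
    static long fakeContentLength(String path) {
        return path.length();
    }

    static long totalLength(List<String> paths, int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);
        try {
            // Submit one job per path so the lookups can run in parallel.
            List<Future<Long>> results = new ArrayList<Future<Long>>();
            for (final String path : paths) {
                results.add(executor.submit(new Callable<Long>() {
                    @Override
                    public Long call() {
                        return fakeContentLength(path);
                    }
                }));
            }
            // get() blocks until the corresponding job has finished, so after
            // this loop every submitted job has completed.
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } finally {
            // Still required so the worker threads are released; no
            // awaitTermination() loop follows, because no job is left running.
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(totalLength(Arrays.asList("/a", "/bb", "/ccc"), 2));
    }
}
```

Placing shutdown() in a finally block also covers the case where one of the get() calls throws.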

> getInputSummary() to call FileSystem.getContentSummary() in parallel
> --------------------------------------------------------------------
>                 Key: HIVE-2051
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>            Priority: Minor
>         Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, HIVE-2051.4.patch
> getInputSummary() now calls FileSystem.getContentSummary() one by one, which can be extremely
slow when the number of input paths is huge. By calling those functions in parallel, we can
cut latency in most cases.

This message is automatically generated by JIRA.