Date: Mon, 20 Apr 2015 18:42:00 +0000 (UTC)
From: "Eugene Koifman (JIRA)"
To: dev@hive.apache.org
Subject: [jira] [Created] (HIVE-10404) hive.exec.parallel=true causes "out of sequence response" and SocketTimeoutException: Read timed out

Eugene Koifman created HIVE-10404:
-------------------------------------

             Summary: hive.exec.parallel=true causes "out of sequence response" and SocketTimeoutException: Read timed out
                 Key: HIVE-10404
                 URL: https://issues.apache.org/jira/browse/HIVE-10404
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Eugene Koifman


With hive.exec.parallel=true, Driver.launchTask() calls Task.initialize() on several Tasks from a single thread. It then starts new threads to run those tasks.
Task.initialize() gets an instance of Hive and holds on to it.
Hive.java internally uses ThreadLocal to hand out instances, but since Task.initialize() is called by a single thread from the Driver, multiple tasks share one instance of Hive.

Each Hive instance has a single instance of MetaStoreClient; the latter is not thread safe.
Because different threads actually execute the tasks when hive.exec.parallel=true, those threads end up sharing the same MetaStoreClient.

If you make 2 concurrent calls, for example Hive.getTable(String), the Thrift responses may return to the wrong caller.
Thus the first caller gets "out of sequence response", drops this message and reconnects. If the timing is right, it will consume the other's response, but the other caller will block for hive.metastore.client.socket.timeout since its response message has now been lost.
This is just one concrete example.

One possible fix is to make Task.db use ThreadLocal.

This could be related to HIVE-6893
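
To illustrate the sharing described above, here is a minimal, self-contained sketch. It does not use Hive's classes: FakeClient, SketchTask, EagerTask, LazyTask and distinctClients are hypothetical names made up for this example, and the lazy variant is only one way the "make Task.db use ThreadLocal" idea might look, not the actual patch. EagerTask resolves its per-thread client in initialize() (the thread that sets tasks up), while LazyTask resolves it in run() (the thread that executes the task).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class TaskDbSketch {
    // Hypothetical stand-in for a client object that is not thread safe.
    static final class FakeClient {}

    // Hands out one client per thread, analogous to the ThreadLocal described for Hive.java.
    static final ThreadLocal<FakeClient> PER_THREAD = ThreadLocal.withInitial(FakeClient::new);

    // Hypothetical task shape: initialize() on the setup thread, run() on a worker thread.
    interface SketchTask {
        void initialize();
        FakeClient run();
    }

    // Problem pattern: the client is looked up on the initializing thread and cached.
    static final class EagerTask implements SketchTask {
        private FakeClient db;
        public void initialize() { db = PER_THREAD.get(); }
        public FakeClient run() { return db; }
    }

    // Possible fix pattern: the client is looked up on whichever thread runs the task.
    static final class LazyTask implements SketchTask {
        public void initialize() { }
        public FakeClient run() { return PER_THREAD.get(); }
    }

    // Initializes every task on the calling thread, runs each on its own new thread,
    // and returns how many distinct client objects the tasks ended up using.
    static int distinctClients(Supplier<SketchTask> factory, int taskCount)
            throws InterruptedException {
        Set<FakeClient> seen = ConcurrentHashMap.newKeySet();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            SketchTask task = factory.get();
            task.initialize();
            workers.add(new Thread(() -> seen.add(task.run())));
        }
        for (Thread w : workers) w.start();
        for (Thread w : workers) w.join();
        return seen.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("eager: " + distinctClients(EagerTask::new, 4)); // prints "eager: 1"
        System.out.println("lazy: " + distinctClients(LazyTask::new, 4));   // prints "lazy: 4"
    }
}
```

In the eager case all four tasks hold the one client that belongs to the initializing thread, which is the situation where concurrent calls can read each other's responses. In the lazy case each worker thread gets its own client.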