hive-user mailing list archives

From: Nathalie Blais
Subject: RE: Hiveserver2 crash with RStudio (using RJDBC)
Date: Mon, 06 Oct 2014 17:49:17 GMT
Hello Vaibhav,

Sorry for the delay in getting back to you on this.  We now have an up and running test cluster
with a hive server I can “crash at will”.  I have been able to reproduce the crash on
this new server by following the steps mentioned below; I will now try to grab a heap dump.

In the meantime, I have observed that HiveServer2 crashes *after* the Map/Reduce job has
completed successfully.  Something in the “gymnastics” of returning the rows to RStudio
through RJDBC makes it crash.  The crash does not happen with other JDBC clients; I have
tried several: SQuirreL, SQL Workbench/J, Aqua Data Studio, etc.  They all work fine with
HiveServer2 through JDBC.

Again, thank you very much for your patience and for your collaboration; I’ll return shortly
with the heap dump.

Best regards,

-- Nathalie

From: Nathalie Blais
Sent: 25 September 2014 13:57
Subject: RE: Hiveserver2 crash with RStudio (using RJDBC)

Hello Vaibhav,

Thanks a lot for your quick response!

I will grab a heapdump as soon as I have “the ok to crash the server” and attach it to
this thread.  In the meantime, regarding our metastore, it looks like it is remote (excerpt
from our hive-site.xml below):


On a side note, the forum might have received my inquiry several times.  I had a bit of trouble
sending it and I retried a few times; please disregard any dupes of this request.


-- Nathalie

From: Vaibhav Gumashta
Sent: 25 September 2014 03:52
Subject: Re: Hiveserver2 crash with RStudio (using RJDBC)


Can you grab a heap dump at the time the server crashes? To do so, export this to your
environment: HADOOP_CLIENT_OPTS="-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=<give-your-path-here> $HADOOP_CLIENT_OPTS". Also, what type of metastore
are you using with HiveServer2: embedded (if you specify -hiveconf hive.metastore.uris="
" in the HiveServer2 startup command, it uses the embedded metastore) or remote?

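The export described above could be sketched as follows. This is only an illustration under stated assumptions: the variable name HEAP_DUMP_DIR and the /tmp/hs2-heapdumps directory are hypothetical stand-ins for the path placeholder in the message, and HiveServer2 would need to be restarted from this same environment for the flags to take effect. Note that these JVM flags write a dump only when the JVM throws an OutOfMemoryError.

```shell
# Hypothetical dump directory (an assumption); the original message
# leaves the path as a placeholder to be filled in.
HEAP_DUMP_DIR="${HEAP_DUMP_DIR:-/tmp/hs2-heapdumps}"
mkdir -p "$HEAP_DUMP_DIR"

# Add the heap-dump flags while preserving any options already set.
export HADOOP_CLIENT_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${HEAP_DUMP_DIR} ${HADOOP_CLIENT_OPTS:-}"

# Show the resulting options so they can be checked before restarting.
echo "$HADOOP_CLIENT_OPTS"
```

Pointing HeapDumpPath at a directory rather than a file lets the JVM pick a per-process file name, so a second crash should not collide with the first dump.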

On Mon, Sep 22, 2014 at 10:55 AM, Nathalie Blais wrote:

We are currently experiencing a severe reproducible hiveserver2 crash when using the RJDBC
connector in RStudio (please refer to the description below for the detailed test case). 
We have a hard time pinpointing the source of the problem and we are wondering whether this
is a known issue or we have a glitch in our configuration; we would sincerely appreciate your
input on this case.

Severe Hiveserver2 crash when returning “a certain” volume of data (really not that big)
to RStudio through RJDBC

Config Versions
Hadoop Distribution: Cloudera – cdh5.0.1p0.47
HiveServer2: 0.12
RStudio: 0.98.1056
RJDBC: 0.2-4

How to Reproduce

1.       In a SQL client application (Aqua Data Studio was used for the purpose of this example),
create Hive test table

a.       create table test_table_connection_crash(col1 string);

2.       Load data into table (data file attached)

a.       LOAD DATA INPATH '/user/test/testFile.txt' INTO TABLE test_table_connection_crash;

3.       Verify row count

a.       select count(*) nbRows from test_table_connection_crash;

b.      720 000 rows

4.       Display all rows

a.       select * from test_table_connection_crash order by col1 desc

b.      All the rows are returned by the Map/Reduce to the client and displayed properly in
the interface

5.       Open RStudio

6.       Create connection to Hive

a.       library(RJDBC)

b.      drv <- JDBC(driverClass="org.apache.hive.jdbc.HiveDriver", classPath=list.files("D:/myJavaDriversFolderFromClusterInstall/",
pattern="jar$", full.names=T), identifier.quote="`")

c.       conn <- dbConnect(drv, "jdbc:hive2://server_name:10000/default;ssl=true;sslTrustStore=C:/Progra~1/Java/jdk1.7.0_60/jre/lib/security/cacerts;trustStorePassword=pswd",
"user", "password")

7.       Verify connection with a small query

a.       r <- dbGetQuery(conn, "select * from test_table_connection_crash order by col1 desc limit 100")

b.      print(r)

c.       100 rows are returned to RStudio and properly displayed in the console interface

8.       Remove the limit and try the original query (as performed in the SQL client application)

a.       r <- dbGetQuery(conn, "select * from test_table_connection_crash order by col1 desc")

b.      Query starts running

c.       *** Cluster crash ***

At worst, if the RStudio desktop client cannot handle this amount of data, we would expect
the desktop application to crash, not the whole HiveServer2.

Please let us know whether or not you are aware of any issues of the kind.  Also, please do
not hesitate to request any configuration file you might need to examine.

Thank you very much!

Best regards,



Nathalie Blais
B.I. Developer | Technology Group
Ubisoft Montreal
