hive-user mailing list archives

From Steven Wong <sw...@netflix.com>
Subject RE: Tables not accessible after restarting Amazon EC2 instances
Date Wed, 12 Oct 2011 23:59:38 GMT
Why not change the old hostnames in the metadata? You can't do that via Hive DDL; you'd have
to change them in the metadata store directly.

It would be interesting to know whether that fixes the rest of the problem.
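
A rough sketch of what "changing the old hostnames" could look like, in Python. It only
shows the string rewrite for a single stored location URI; where those URIs live is an
assumption here (in many metastore schemas the table location sits in a column such as
SDS.LOCATION, but check your own schema and back up the metastore first). The function
name rewrite_location and the hostnames in the usage note are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_location(location, old_host, new_host):
    """Swap old_host for new_host in the authority part of a location URI.

    Hypothetical helper: the scheme, port, and path are kept as they are,
    and URIs that point at a different host are returned unchanged.
    """
    parts = urlsplit(location)
    # urlsplit lowercases the hostname, so compare case-insensitively.
    if parts.hostname is None or parts.hostname != old_host.lower():
        return location
    # Preserve an explicit port if the stored URI had one.
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

For example, rewrite_location("hdfs://old-host:9000/user/hive/warehouse/mytable",
"old-host", "new-host") keeps the port and path and replaces only the host. Each
rewritten value would then be written back to the metastore with an ordinary SQL UPDATE
against whatever database backs it.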


From: Agarwal, Ravindra (ASG) [mailto:Ravindra_Agarwal@syntelinc.com]
Sent: Tuesday, October 11, 2011 7:35 AM
To: user@hive.apache.org
Subject: Tables not accessible after restarting Amazon EC2 instances

Setup: I have set up a Hadoop cluster of 3 machines on Amazon EC2. Hive is running on top of
it.

Problem: Tables created on Hive are not accessible after Amazon EC2 instances are restarted
(in other words - after the hosts in the cluster get renamed).

Details:

A table is created on the above Hive setup and loaded with data. After that, the Amazon EC2
instances are stopped at the end of the day. The next day, when the instances are restarted and
the table is queried (something like select * from mytable), the query processor tries to
connect to the HDFS nodes and retrieve data using the old hostnames (please note that Amazon
instances acquire a new hostname and IP address when they are restarted). Since the hostname
has changed, it fails to connect and throws an error.

I update the masters and slaves files with the new hostnames, edit mapred-site.xml and
core-site.xml for the new hostnames, and then run the start-all.sh script to start all the
Hadoop processes. It seems Hive stores the old hostname somewhere in the table metadata, so it
tries to read from that old hostname and throws an error when the hostname is not found.
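
One way to check the suspicion above is to compare the host in a table's stored location
with the host currently configured in core-site.xml. The Python sketch below assumes the
default filesystem is set through the fs.default.name property and that the table
location is available as a URI string (for example, copied from the output of
DESCRIBE EXTENDED). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def default_fs_host(core_site_xml):
    """Return the host named by fs.default.name in core-site.xml text, or None."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "fs.default.name":
            value = (prop.findtext("value") or "").strip()
            return urlsplit(value).hostname
    return None


def is_stale(location, core_site_xml):
    """True if the stored location points at a host other than the configured one."""
    current = default_fs_host(core_site_xml)
    stored = urlsplit(location).hostname
    # Only report a mismatch when both hosts could actually be determined.
    return current is not None and stored is not None and stored != current
```

If is_stale returns True for a table's location, the metadata still refers to a hostname
from before the restart.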

Kindly let me know if there is any solution to this problem or any Hive patch available to
fix it.

Regards,
Ravi


Confidential: This electronic message and all contents contain information from Syntel, Inc.
which may be privileged, confidential or otherwise protected from disclosure. The information
is intended to be for the addressee only. If you are not the addressee, any disclosure, copy,
distribution or use of the contents of this message is prohibited. If you have received this
electronic message in error, please notify the sender immediately and destroy the original
message and all copies.
