hive-user mailing list archives

From 王锋 <wfeng1...@163.com>
Subject Re:Re: Re: Re: Re:Re: hiveserver usage
Date Mon, 12 Dec 2011 09:52:50 GMT
The hiveserver will throw an OOM error after several hours.

At 2011-12-12 17:39:21, "alo alt" <wget.null@googlemail.com> wrote:
What happens when you set -Xmx2048m or similar? Did that have any negative effects on running queries?


2011/12/12 王锋 <wfeng1982@163.com>

I have modified the hive JVM args.
The new args are -Xmx15000m -XX:NewRatio=1 -Xms2000m.


But the memory used by hiveserver is still large.


 




At 2011-12-12 16:20:54, "Aaron Sun" <aaron.sun82@gmail.com> wrote:

Not from the running jobs. What I am saying is that the heap size of the Hadoop namenode really depends on the number of files and directories on HDFS. Removing old files periodically or merging small files would bring some performance boost.
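
For example, a scheduled task could ask Hive to merge its small output files before they land on HDFS. This is only a sketch, assuming an already-open java.sql.Statement named stmt (a hypothetical name; see the connection sketch further down in this message), and the property values are just examples, not recommendations:

// Assumes stmt is a java.sql.Statement obtained from a Hive JDBC connection.
stmt.execute("set hive.merge.mapfiles=true");            // merge small files produced by map-only jobs
stmt.execute("set hive.merge.mapredfiles=true");         // merge small files produced by map-reduce jobs
stmt.execute("set hive.merge.size.per.task=256000000");  // rough target size of the merged files, in bytes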


On the Hive end, the memory consumed also depends on the queries that are executed. Monitor the reducers of the Hadoop job; my experience is that the reduce part can be the bottleneck here.


It's totally okay to host multiple Hive servers on one machine. 


2011/12/12 王锋 <wfeng1982@163.com>

Are the files you mentioned the files from jobs our system has run? They can't be that large.


Why would the namenode be the cause? What is the hiveserver doing when it uses so much memory?


How do you use Hive? Is our way of using hiveserver correct?

Thanks.


At 2011-12-12 14:27:09, "Aaron Sun" <aaron.sun82@gmail.com> wrote:

Not sure if this is because of the number of files, since the namenode has to track every file, directory, and block.
See this one: http://www.cloudera.com/blog/2009/02/the-small-files-problem/


Please correct me if I am wrong, because this seems to be more of an HDFS problem that is actually unrelated to Hive.


Thanks
Aaron


2011/12/11 王锋 <wfeng1982@163.com>


I want to know why the hiveserver uses so much memory, and where the memory is being used.


At 2011-12-12 10:02:44, "王锋" <wfeng1982@163.com> wrote:




The namenode summary: (screenshot)

The MR summary: (screenshot)

And the hiveserver: (screenshot)

hiveserver JVM args:
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"


Right now we are using 3 hiveservers on the same machine.




At 2011-12-12 09:54:29, "Aaron Sun" <aaron.sun82@gmail.com> wrote:
What does the data look like? And what's the size of the cluster?


2011/12/11 王锋 <wfeng1982@163.com>

Hi,


    I'm an engineer at sina.com. We have been using Hive and hiveserver for several months. We have our own task scheduling system, which schedules tasks that run against hiveserver over JDBC.
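
(For reference, a minimal sketch of what one of these JDBC tasks looks like. The host, port, database, table, and query are placeholders, and the driver class is the HiveServer1 driver that Hive shipped at the time:)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ScheduledHiveTask {
    public static void main(String[] args) throws Exception {
        // HiveServer1 JDBC driver
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        // Placeholder URL: one of the hiveserver instances on this machine
        Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();
        // Placeholder query standing in for a real scheduled task
        ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM some_table");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        con.close();
    }
}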


    But the hiveserver uses a very large amount of memory, usually more than 10 GB. We have 5-minute tasks that run every 5 minutes, plus hourly tasks; the total number of tasks is 40. And we start 3 hiveservers on one Linux server and connect to them in a cycle (round robin).
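
("Cycle connected" means roughly the following: each task picks the next of the three hiveserver JDBC URLs in turn. The ports below are placeholders, so this is only a sketch of the idea:)

import java.util.concurrent.atomic.AtomicInteger;

class HiveserverPool {
    // Placeholder URLs for the 3 hiveserver instances on the same machine
    private static final String[] URLS = {
        "jdbc:hive://localhost:10000/default",
        "jdbc:hive://localhost:10001/default",
        "jdbc:hive://localhost:10002/default"
    };
    private static final AtomicInteger next = new AtomicInteger(0);

    // Hand out the URLs in a round-robin cycle
    static String nextUrl() {
        int i = (next.getAndIncrement() & Integer.MAX_VALUE) % URLS.length;
        return URLS[i];
    }
}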


    So why is the memory usage of hiveserver so large, and what should we do? Do you have any suggestions?


Thanks and Best Regards!


Royce Wang

--

Alexander Lorenz
http://mapredit.blogspot.com


Think of the environment: please don't print this email unless you really need to.


