hive-user mailing list archives

From alo alt <wget.n...@googlemail.com>
Subject Re: Re: Re: Re: Re: Re: Re: hiveserver usage
Date Mon, 12 Dec 2011 10:48:19 GMT
You can identify threads with "top -H", then pick one process (pid)
and use jstack:
jstack PID

I don't think it is really possible to filter for a single task
(please correct me if I'm wrong). For this you need a long-running task.
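
For example, a minimal sketch (the hiveserver PID 15511 and the thread
id are placeholders, taken from the ps example further down the thread):

  # list per-thread CPU usage of the hiveserver process
  top -H -p 15511

  # convert the busiest thread id (LWP) reported by top to hex,
  # e.g. 15623 -> 0x3d07
  printf '0x%x\n' 15623

  # dump all Java stacks and locate that thread by its nid
  jstack 15511 | grep -A 20 'nid=0x3d07'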

- Alex

2011/12/12 王锋 <wfeng1982@163.com>:
>
> How about watching one Hive job's stacks? Can it be watched by jobId?
>
> Using  ps -Lf hiveserverPID | wc -l ,
> one hiveserver has 132 threads:
> [root@d048049 logs]# ps -Lf 15511|wc -l
> 132
> [root@d048049 logs]#
>
> If every thread stack is 10m, the memory would be 1320M, over 1g.
>
> So hiveserver's minimum memory is about 1g?
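>
> A rough sketch of that estimate (the PID and the stack-size lookup are
> assumptions, not the exact setup here):
>
>   # count the threads of the hiveserver process (PID assumed 15511)
>   THREADS=$(ps -Lf -p 15511 | tail -n +2 | wc -l)
>   # read the JVM's default thread stack size (reported in KB)
>   XSS_KB=$(java -XX:+PrintFlagsFinal -version 2>/dev/null | awk '$2 == "ThreadStackSize" {print $4}')
>   echo "$THREADS threads x ${XSS_KB} KB stack = $((THREADS * XSS_KB / 1024)) MB"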
>
> On 2011-12-12 17:59:52, "alo alt" <wget.null@googlemail.com> wrote:
>>When you start a high-load Hive query, can you watch the stack traces?
>>It's possible via the web interface:
>>http://jobtracker:50030/stacks
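>>
>>For example (assuming the JobTracker web UI is reachable under that
>>hostname), the stacks page can be polled from the shell:
>>
>>  watch -n 10 "curl -s http://jobtracker:50030/stacks | tail -n 50"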
>>
>>- Alex
>>
>>
>>2011/12/12 王锋 <wfeng1982@163.com>
>>>
>>> hiveserver will throw an OOM after several hours.
>>>
>>>
>>> At 2011-12-12 17:39:21,"alo alt" <wget.null@googlemail.com> wrote:
>>>
>>> What happens when you set -Xmx2048m or similar? Did that have any negative
>>> effects on running queries?
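>>>
>>> For example, a minimal way to try that (hive --service hiveserver starts
>>> HiveServer1; the 2048m value is just the figure discussed above):
>>>
>>>   # give the hiveserver JVM a 2 GB heap cap before starting it
>>>   export HADOOP_OPTS="$HADOOP_OPTS -Xmx2048m"
>>>   hive --service hiveserver &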
>>>
>>> 2011/12/12 王锋 <wfeng1982@163.com>
>>>>
>>>> I have modified the hive JVM args.
>>>> The new args are -Xmx15000m -XX:NewRatio=1 -Xms2000m .
>>>>
>>>> But the memory used by hiveserver is still large.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> At 2011-12-12 16:20:54,"Aaron Sun" <aaron.sun82@gmail.com> wrote:
>>>>
>>>> Not from the running jobs. What I am saying is that the heap size of the
>>>> Hadoop namenode really depends on the number of files and directories on
>>>> HDFS. Removing old files periodically or merging small files would bring
>>>> some performance boost.
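>>>>
>>>> As a sketch of the small-files side (the table name t is hypothetical;
>>>> the hive.merge.* settings are Hive's standard merge options):
>>>>
>>>>   # rewrite a table over itself so Hive merges its small output files
>>>>   hive --hiveconf hive.merge.mapfiles=true --hiveconf hive.merge.mapredfiles=true -e "INSERT OVERWRITE TABLE t SELECT * FROM t;"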
>>>>
>>>> On the Hive end, the memory consumed also depends on the queries that are
>>>> executed. Monitor the reducers of the Hadoop job; in my experience the
>>>> reduce part could be the bottleneck here.
>>>>
>>>> It's totally okay to host multiple Hive servers on one machine.
>>>>
>>>> 2011/12/12 王锋 <wfeng1982@163.com>
>>>>>
>>>>> Are the files you mentioned the files from the jobs that have run on our
>>>>> system? They can't be that large.
>>>>>
>>>>> Why would the namenode be the cause? What is hiveserver doing when it uses
>>>>> so much memory?
>>>>>
>>>>> How do you use Hive? Is our way of using hiveserver correct?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On 2011-12-12 14:27:09, "Aaron Sun" <aaron.sun82@gmail.com> wrote:
>>>>>
>>>>> Not sure if this is because of the number of files, since the namenode
>>>>> tracks each file, directory, and block.
>>>>> See this one: http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>
>>>>> Please correct me if I am wrong, because this seems to be more of an
>>>>> HDFS problem, which is actually unrelated to Hive.
>>>>>
>>>>> Thanks
>>>>> Aaron
>>>>>
>>>>> 2011/12/11 王锋 <wfeng1982@163.com>
>>>>>>
>>>>>>
>>>>>> I want to know why the hiveserver uses so much memory, and where the
>>>>>> memory is being used.
>>>>>>
>>>>>> On 2011-12-12 10:02:44, "王锋" <wfeng1982@163.com> wrote:
>>>>>>
>>>>>>
>>>>>> The namenode summary:
>>>>>>
>>>>>> The MR summary:
>>>>>>
>>>>>> And the hiveserver:
>>>>>>
>>>>>> hiveserver jvm args:
>>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>>>>>>
>>>>>> Now we are using 3 hiveservers on the same machine.
>>>>>>
>>>>>>
>>>>>> On 2011-12-12 09:54:29, "Aaron Sun" <aaron.sun82@gmail.com> wrote:
>>>>>>
>>>>>> What does the data look like? And what's the size of the cluster?
>>>>>>
>>>>>> 2011/12/11 王锋 <wfeng1982@163.com>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>     I'm one of the engineers at sina.com. We have been using Hive and
>>>>>>> hiveserver for several months. We have our own task scheduling system,
>>>>>>> which schedules tasks to run against hiveserver via JDBC.
>>>>>>>
>>>>>>>     But the hiveserver uses a very large amount of memory, usually more
>>>>>>> than 10g. We have 5-minute tasks which run every 5 minutes, and hourly
>>>>>>> tasks; the total number of tasks is 40. And we start 3 hiveservers on
>>>>>>> one Linux server, connected to in rotation.
>>>>>>>
>>>>>>>     So why does hiveserver use so much memory, and what should we do?
>>>>>>> Any suggestions from you?
>>>>>>>
>>>>>>> Thanks and Best Regards!
>>>>>>>
>>>>>>> Royce Wang
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Alexander Lorenz
>>> http://mapredit.blogspot.com
>>>
>>> Think of the environment: please don't print this email unless you really need to.
>>>
>>>
>>>
>>>
>>
>>
>>
>>--
>>Alexander Lorenz
>>http://mapredit.blogspot.com
>>
>>Think of the environment: please don't print this email unless you really need to.
>
>
>



-- 
Alexander Lorenz
http://mapredit.blogspot.com

Think of the environment: please don't print this email unless you really need to.
