hive-user mailing list archives

From 王锋 <wfeng1...@163.com>
Subject Re:Re:Re: Re: Re: Re: Re: Re:Re: hiveserver usage
Date Tue, 13 Dec 2011 06:25:56 GMT
I have figured out the cause of the hiveserver's large memory usage.


Previously, the JVM args were:
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms2000m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15
-XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit
-XX:MaxTenuringThreshold=8 -XX:PermSize=800M -XX:MaxPermSize=800M -XX:GCTimeRatio=19 -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"


The -XX:NewRatio=1 parameter did not take effect: the young generation stayed
at its default size of 1 GB, with an eden space of 800 MB.
So every time tasks came in, part of the new objects was stored directly in the
old generation. Young GCs ran, but full GC did not reclaim the space, so the
hiveserver heap grew very large.
I don't know why -XX:NewRatio did not work; if you know, please tell me.
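
One way to check which generation sizes the VM actually settled on (a sketch:
<hiveserver-pid> is a placeholder for the real process id, and the grep
pattern is only illustrative):

# Print the flag values the JVM resolves at startup (the version banner goes to stderr):
java $HADOOP_OPTS -XX:+PrintFlagsFinal -version 2>/dev/null | grep -E 'NewRatio|NewSize'

# Inspect the live generation sizes of a running hiveserver:
jmap -heap <hiveserver-pid>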


So I modified the config:
export HADOOP_OPTS="$HADOOP_OPTS -Xms5000m -Xmn4000m -XX:MaxNewSize=4000m
-Xss128k -XX:MaxHeapFreeRatio=80 -XX:MinHeapFreeRatio=40 -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0
-XX:-UseGCOverheadLimit -XX:MaxTenuringThreshold=8 -XX:PermSize=800M
-XX:MaxPermSize=800M -XX:GCTimeRatio=19 -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps"



-Xmn4000m makes sure the new generation is large enough, so each young GC can
clean up the short-lived data before it is promoted.
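
To confirm that young collections are keeping up, the generation occupancies
can be watched while the tasks run (a sketch; the pid is a placeholder):

# Sample GC utilization every 5000 ms; E and O are eden and old-gen occupancy in %.
# If the larger -Xmn is working, O should stay roughly flat between collections:
jstat -gcutil <hiveserver-pid> 5000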



On 2011-12-12 19:20:35, "王锋" <wfeng1982@163.com> wrote:


Yes, we are using JDK 1.6.0_26.


[hdfs@d048049 conf]$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)


I will read the document at that URL, thanks very much!



On 2011-12-12 19:08:37, "alo alt" <wget.null@googlemail.com> wrote:
>Argh, increase! Sorry, typing too fast.
>
>2011/12/12 alo alt <wget.null@googlemail.com>:
>> Did you update your JDK recently? A Java dev told me there could be
>> an issue in JDK _26
>> (https://forums.oracle.com/forums/thread.jspa?threadID=2309872); some
>> devs report a memory decrease when they use GC flags. I'm not quite
>> sure; it sounds a bit far-fetched to me.
>>
>> The stacks show a lot of waiting threads, but I see nothing special.
>>
>> - Alex
>>
>> 2011/12/12 王锋 <wfeng1982@163.com>:
>>>
>>> The hive log:
>>>
>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121840_767713480.txt
>>> 8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)]
>>> 9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00,
>>> real=0.08 secs]
>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121841_451939518.txt
>>> 8219.455: [GC [PSYoungGen: 1823477K->608K(2106752K)]
>>> 8999046K->7176707K(9786752K), 0.0719450 secs] [Times: user=0.66 sys=0.01,
>>> real=0.07 secs]
>>> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121842_1930999319.txt
>>>
>>> Now we have 3 hiveservers and I set the concurrent job num to 4, but the
>>> memory is still so large. I'm going mad, God.
>>>
>>> Do you have any other suggestions?
>>>
>>> On 2011-12-12 17:59:52, "alo alt" <wget.null@googlemail.com> wrote:
>>>>When you start a high-load Hive query, can you watch the stack traces?
>>>>It's possible over the web interface:
>>>>http://jobtracker:50030/stacks
>>>>
>>>>- Alex
>>>>
>>>>
>>>>2011/12/12 王锋 <wfeng1982@163.com>
>>>>>
>>>>> The hiveserver will throw an OOM after several hours.
>>>>>
>>>>>
>>>>> At 2011-12-12 17:39:21,"alo alt" <wget.null@googlemail.com> wrote:
>>>>>
>>>>> What happens when you set xmx=2048m or similar? Does that have any
>>>>> negative effects on running queries?
>>>>>
>>>>> 2011/12/12 王锋 <wfeng1982@163.com>
>>>>>>
>>>>>> I have modified the Hive JVM args.
>>>>>> The new args are -Xmx15000m -XX:NewRatio=1 -Xms2000m.
>>>>>>
>>>>>> But the memory used by the hiveserver is still large.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> At 2011-12-12 16:20:54, "Aaron Sun" <aaron.sun82@gmail.com> wrote:
>>>>>>
>>>>>> Not from the running jobs; what I am saying is that the heap size of
>>>>>> Hadoop really depends on the number of files and directories on HDFS.
>>>>>> Removing old files periodically or merging small files would bring some
>>>>>> performance boost.
>>>>>>
>>>>>> On the Hive end, the memory consumed also depends on the queries that
>>>>>> are executed. Monitor the reducers of the Hadoop job; my experience is
>>>>>> that the reduce part can be the bottleneck here.
>>>>>>
>>>>>> It's totally okay to host multiple Hive servers on one machine.
>>>>>>
>>>>>> 2011/12/12 王锋 <wfeng1982@163.com>
>>>>>>>
>>>>>>> Are the files you mentioned the files from jobs our system has run?
>>>>>>> They can't be that large.
>>>>>>>
>>>>>>> Why would the namenode be the cause? What is the hiveserver doing when
>>>>>>> it uses so much memory?
>>>>>>>
>>>>>>> How do you use Hive? Is our way of using the hiveserver correct?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> On 2011-12-12 14:27:09, "Aaron Sun" <aaron.sun82@gmail.com> wrote:
>>>>>>>
>>>>>>> Not sure if this is because of the number of files, since the namenode
>>>>>>> tracks every file, directory, and block.
>>>>>>> See this one. http://www.cloudera.com/blog/2009/02/the-small-files-problem/
>>>>>>>
>>>>>>> Please correct me if I am wrong, because this seems to be more like an
>>>>>>> HDFS problem, which is actually unrelated to Hive.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Aaron
>>>>>>>
>>>>>>> 2011/12/11 王锋 <wfeng1982@163.com>
>>>>>>>>
>>>>>>>>
>>>>>>>> I want to know why the hiveserver uses such a large amount of memory,
>>>>>>>> and where the memory is being used.
>>>>>>>>
>>>>>>>> On 2011-12-12 10:02:44, "王锋" <wfeng1982@163.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> The namenode summary:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The MR summary:
>>>>>>>>
>>>>>>>>
>>>>>>>> And the hiveserver:
>>>>>>>>
>>>>>>>>
>>>>>>>> hiveserver jvm args:
>>>>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m
>>>>>>>> -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC
>>>>>>>> -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit
>>>>>>>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
>>>>>>>>
>>>>>>>> Now we are running 3 hiveservers on the same machine.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2011-12-12 09:54:29, "Aaron Sun" <aaron.sun82@gmail.com> wrote:
>>>>>>>>
>>>>>>>> What does the data look like, and what's the size of the cluster?
>>>>>>>>
>>>>>>>> 2011/12/11 王锋 <wfeng1982@163.com>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>     I'm an engineer at sina.com. We have used Hive and hiveserver for
>>>>>>>>> several months. We have our own task scheduling system, which schedules
>>>>>>>>> tasks to run against the hiveserver over JDBC.
>>>>>>>>>
>>>>>>>>>     But the hiveserver uses a very large amount of memory, usually more
>>>>>>>>> than 10 GB. We have 5-minute tasks which run every 5 minutes, and hourly
>>>>>>>>> tasks; the total number of tasks is 40. And we start 3 hiveservers on
>>>>>>>>> one Linux server and connect to them in a round-robin cycle.
>>>>>>>>>
>>>>>>>>>     So why is the hiveserver's memory usage so large, and what should
>>>>>>>>> we do? Do you have any suggestions?
>>>>>>>>>
>>>>>>>>> Thanks and Best Regards!
>>>>>>>>>
>>>>>>>>> Royce Wang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Alexander Lorenz
>>>>> http://mapredit.blogspot.com
>>>>>
>>>>> Think of the environment: please don't print this email unless you
really need to.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>--
>>>>Alexander Lorenz
>>>>http://mapredit.blogspot.com
>>>>
>>>>Think of the environment: please don't print this email unless you
>>>>really need to.
>>>
>>>
>>>
>>
>>
>>
>> --
>> Alexander Lorenz
>> http://mapredit.blogspot.com
>>
>> Think of the environment: please don't print this email unless you
>> really need to.
>
>
>
>-- 
>Alexander Lorenz
>http://mapredit.blogspot.com
>
>Think of the environment: please don't print this email unless you
>really need to.


