hawq-user mailing list archives

From Sebastião Gonella <sebastiao.gone...@gmail.com>
Subject Re: Error - failed to load queue and user definition
Date Thu, 13 Apr 2017 13:39:01 GMT
Thanks Shubham,

I followed the steps and everything returned to normal.

I really thank everyone for the support and attention. It was a
learning experience.

[]'s,
Sebastião M. P. Gonella

Mobile: 61-984021512
Email: sebastiao.gonella@gmail.com
Skype: segonella


2017-04-13 3:39 GMT-03:00 Yi Jin <yjin@pivotal.io>:

> Jira is created.
>
> https://issues.apache.org/jira/browse/HAWQ-1433
>
> Best,
> Yi
>
> On Thu, Apr 13, 2017 at 4:10 PM, Yi Jin <yjin@pivotal.io> wrote:
>
>> Thanks Shubham,
>>
>> This looks like a bug; I will open a JIRA to have it fixed ASAP.
>>
>> Best,
>> Yi
>>
>> On Thu, Apr 13, 2017 at 2:20 PM, Shubham Sharma <
>> topologicalqubit@gmail.com> wrote:
>>
>>> Description of what might be happening here:
>>>
>>> If a user updates the CORE_LIMIT_CLUSTER/MEMORY_LIMIT_CLUSTER parameter of
>>> a resource queue with a wrong value (without a percent sign), the ALTER
>>> statement succeeds, but on restart the HAWQ cluster throws an error and
>>> becomes unusable. If you then try to revert to the older settings, even
>>> the ALTER statement fails on the cluster.
>>>
>>> On Wed, Apr 12, 2017 at 9:11 PM, Shubham Sharma <
>>> topologicalqubit@gmail.com> wrote:
>>>
>>>> Hello Sebastião, I think you have encountered the following issue:
>>>>
>>>> *1 - Problem -  alter resource queue pg_default with
>>>> (CORE_LIMIT_CLUSTER/MEMORY_LIMIT_CLUSTER=90);*
>>>>
>>>> gpadmin=# select * from pg_resqueue;
>>>>   rsqname   | parentoid | activestats | memorylimit | corelimit | resovercommit | allocpolicy | vsegresourcequota | nvsegupperlimit | nvseglowerlimit | nvsegupperlimitperseg | nvseglowerlimitperseg | creationtime |          updatetime           | status
>>>> ------------+-----------+-------------+-------------+-----------+---------------+-------------+-------------------+-----------------+-----------------+-----------------------+-----------------------+--------------+-------------------------------+--------
>>>>  pg_root    |         0 |          -1 | 100%        | 100%      |             2 | even        |                   |               0 |               0 |                     0 |                     0 |              |                               | branch
>>>>  pg_default |      9800 |          20 | 50%         | 50%       |             2 | even        | mem:256mb         |               0 |               0 |                     0 |                     0 |              | 2017-04-12 22:45:55.056102+01 |
>>>> (2 rows)
>>>>
>>>> gpadmin=# alter resource queue pg_default with (CORE_LIMIT_CLUSTER=90);
>>>> ALTER QUEUE
>>>>
>>>> gpadmin=# select * from test;
>>>>  a
>>>> ---
>>>> (0 rows)
>>>> gpadmin=# \q
>>>>
>>>> 2 - restart hawq cluster
>>>>
>>>> 3 - ERROR
>>>>
>>>> [gpadmin@hdp3 ~]$ psql
>>>> psql (8.2.15)
>>>> Type "help" for help.
>>>> gpadmin=# select * from test;
>>>> WARNING:  FD 31 having errors raised. errno 104
>>>> ERROR:  failed to register in resource manager, failed to receive
>>>> content (pquery.c:787)
>>>>
>>>> 3 (continued) - Switching back with alter resource queue pg_default with
>>>> (*CORE_LIMIT_CLUSTER/MEMORY_LIMIT_CLUSTER*=50%); is not allowed:
>>>> alter resource queue pg_default with (CORE_LIMIT_CLUSTER=50%);
>>>> WARNING:  FD 33 having errors raised. errno 104
>>>> ERROR:  failed to register in resource manager, failed to receive
>>>> content (resqueuecommand.c:364)
>>>>
>>>> 4 -  How to fix - Please be extra careful while using this.
>>>> gpadmin=# begin;
>>>> BEGIN
>>>> gpadmin=# set allow_system_table_mods='dml';
>>>> SET
>>>> gpadmin=# select * from pg_resqueue where corelimit=90;
>>>>   rsqname   | parentoid | activestats | memorylimit | corelimit | resovercommit | allocpolicy | vsegresourcequota | nvsegupperlimit | nvseglowerlimit | nvsegupperlimitperseg | nvseglowerlimitperseg | creationtime |          updatetime           | status
>>>> ------------+-----------+-------------+-------------+-----------+---------------+-------------+-------------------+-----------------+-----------------+-----------------------+-----------------------+--------------+-------------------------------+--------
>>>>  pg_default |      9800 |          20 | 50%         | 90        |             2 | even        | mem:256mb         |               0 |               0 |                     0 |                     0 |              | 2017-04-12 22:59:30.092823+01 |
>>>> (1 row)
>>>> gpadmin=# update pg_resqueue set corelimit='50%' where corelimit=90;
>>>> UPDATE 1
>>>> gpadmin=# commit;
>>>> COMMIT
>>>>
>>>> *5 - System should be back to normal*
>>>>
>>>> gpadmin=# select * from test;
>>>>  a
>>>> ---
>>>> (0 rows)
>>>>
>>>>
>>>> Regards,
>>>> Shubh
>>>>
>>>> On Wed, Apr 12, 2017 at 9:07 PM, Yi Jin <yjin@pivotal.io> wrote:
>>>>
>>>>> Hi *Sebastião*
>>>>>
>>>>> I am curious what's the result of this statement.
>>>>>
>>>>> alter resource queue pg_default with (MEMORY_LIMIT_CLUSTER=90);
>>>>>
>>>>> I guess this should report an error, as currently MEMORY_LIMIT_CLUSTER
>>>>> cannot differ from CORE_LIMIT_CLUSTER.
>>>>>
>>>>> From your log, I think there may be something wrong in pg_resqueue.
>>>>> Can you add the following setting to hawq-site.xml, restart HAWQ,
>>>>> and mail me the complete log for investigation?
>>>>>
>>>>> <property>
>>>>>     <name>log_min_messages</name>
>>>>>     <value>DEBUG3</value>
>>>>> </property>
>>>>>
>>>>>
>>>>> Best,
>>>>> Yi
>>>>>
>>>>> On Thu, Apr 13, 2017 at 1:39 PM, Xiang Sheng <xsheng@pivotal.io>
>>>>> wrote:
>>>>>
>>>>>> Use "select * from gp_segment_configuration;" to check why your 3
>>>>>> segments are down.
>>>>>>
>>>>>> Also, your SQL to alter the resource queue is not correct.
>>>>>> The log says "memorylimit and corelimit must use the same formats to
>>>>>> express resource limit".
>>>>>> Please refer to the docs: ALTER-RESOURCE-QUEUE
>>>>>> <http://hawq.incubator.apache.org/docs/userguide/2.1.0.0-incubating/reference/sql/ALTER-RESOURCE-QUEUE.html>
>>>>>> and Checking Existing Resource Queues
>>>>>> <http://hawq.incubator.apache.org/docs/userguide/2.1.0.0-incubating/resourcemgmt/ResourceQueues.html#topic_lqy_gls_zt>
>>>>>>
>>>>>> On Thu, Apr 13, 2017 at 11:14 AM, Sebastião Gonella <
>>>>>> sebastiao.gonella@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Thanks in advance for any support you can provide. We have HAWQ
>>>>>>> version 2.0, which we use to manage a few billion records. To
>>>>>>> improve query performance we tuned some parameters in HAWQ, such as
>>>>>>> the pg_default queue. Now HAWQ is not working as expected and the
>>>>>>> data in the segments is no longer available.
>>>>>>>
>>>>>>> I probably need to create a new user and another queue, but first I
>>>>>>> have to fix this, and I have no idea how to do it. Please, I need
>>>>>>> help.
>>>>>>>
>>>>>>> The modifications made in the queue were:
>>>>>>>
>>>>>>> stn_bi=# alter resource queue pg_default with
>>>>>>> (vseg_resource_quota='mem:8gb');
>>>>>>> stn_bi=# alter resource queue pg_default with
>>>>>>> (MEMORY_LIMIT_CLUSTER=90);
>>>>>>>
>>>>>>> Now when I start the HAWQ cluster, both the master and the segments
>>>>>>> start successfully, but the hawq state command returns the following:
>>>>>>>
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--HAWQ instance status summary
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:------------------------------------------------------
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:-- Master instance =
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:-- No Standby master defined
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:-- Total segment instance count from config file = 3
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:------------------------------------------------------
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:-- Segment Status
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:------------------------------------------------------
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:-- Total segments count from catalog = 0
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:-- Total segment valid (at master) = 0
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:-- Total segment failures (at master) = 3
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:-- Total number of postmaster.pid files missing = 0
>>>>>>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:-- Total number of postmaster.pid files found = 3
>>>>>>>
>>>>>>> In the pg_log files belonging to the master I have the following
>>>>>>> output errors:
>>>>>>>
>>>>>>> 2017-04-12 17:41:51.096671 BRT,,,p17360,th-1671837248,,,,0,con6702,,seg-10000,,,,,"LOG","00000","Clean up handler in message server is called.",,,,,,,0,,"rmcomm_MessageServer.c",105,
>>>>>>> 2017-04-12 17:41:51.099905 BRT,,,p8534,th-1671837248,,,,0,,,seg-10000,,,,,"LOG","00000","resourcemanager process (PID 17360) exited with exit code 1",,,,,,,0,,"postmaster.c",4726,
>>>>>>> 2017-04-12 17:41:51.105210 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"LOG","00000","Resource manager starts accepting resource request. Listening normal socket port 5437. Total listened 1 FDs.",,,,,,,0,,"resourcemanager.c",2495,
>>>>>>> 2017-04-12 17:41:51.105454 BRT,,,p8534,th-1671837248,,,,0,,,seg-10000,,,,,"LOG","00000","Wait for HAWQ RM -1",,,,,,,0,,"resourcemanager.c",421,
>>>>>>> 2017-04-12 17:41:51.105583 BRT,,,p8534,th-1671837248,,,,0,,,seg-10000,,,,,"LOG","00000","HAWQ :: Received signal notification that HAWQ RM works now.",,,,,,,0,,"resourcemanager.c",429,
>>>>>>> 2017-04-12 17:41:51.175205 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"LOG","00000","Cleanup segment configuration catalog table successfully!",,,,,,,0,,"resourcepool.c",460,
>>>>>>> 2017-04-12 17:41:51.183070 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"LOG","00000","Cleanup segment configuration history catalog table successfully, keep period: recent 365 days.",,,,,,,0,,"resourcepool.c",530,
>>>>>>> 2017-04-12 17:41:51.189397 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"LOG","00000","Add a new row into segment configuration catalog table,registration order:0, role:m, status:u, port:5432, hostname:big08-hadoop-master.stn.intra, address:big08-hadoop-master.stn.intra, description:",,,,,,,0,,"resourcepool.c",879,
>>>>>>> 2017-04-12 17:41:51.195123 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"LOG","00000","Resource manager successfully loaded role specifications.",,,,,,,0,,"resourcemanager.c",1275,
>>>>>>> 2017-04-12 17:41:51.200825 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"LOG","00000","Resource manger successfully loaded resource queue specifications",,,,,,,0,,"resourcemanager.c",1585,
>>>>>>> 2017-04-12 17:41:51.200858 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"WARNING","01000","memorylimit and corelimit must use the same formats to express resource limit",,,,,,,0,,"resqueuemanager.c",708,
>>>>>>> 2017-04-12 17:41:51.200874 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"WARNING","01000","Resource manager cannot create resource queue with its attributes because memorylimit and corelimit must use the same formats to express resource limit",,,,,,,0,,"resourcemanager.c",1828,
>>>>>>> 2017-04-12 17:41:51.200890 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"LOG","00000","Resource manager created resource queue instance : RESQUEUE:ID=9800,Name=pg_root,PARENT=0,LIMIT(MEM=100.000000%,CORE=100.000000%),RATIO=0 MBPCORE,INUSE(0 MB, 0.000000 CORE),CONN=0,INQUEUE=0.",,,,,,,0,,"resourcemanager.c",1974,
>>>>>>> 2017-04-12 17:41:51.200910 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"WARNING","01000","resource queue cannot parse role attribute, cannot find target resource queue '6055'",,,,,,,0,,"resqueuemanager.c",2668,
>>>>>>> 2017-04-12 17:41:51.200924 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"WARNING","01000","cannot create user with its attributes because cannot find target resource queue '6055'",,,,,,,0,,"resourcemanager.c",1995,
>>>>>>> 2017-04-12 17:41:51.200938 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"LOG","00000","failed to load queue and user definition.",,,,,,,0,,"resourcemanager.c",1130,
>>>>>>> 2017-04-12 17:41:51.215416 BRT,,,p17367,th-1671837248,,,,0,con6708,,seg-10000,,,,,"FATAL","XX000","failed to load queue and user definition. (resourcemanager.c:496)",,,,,,,0,,"resourcemanager.c",496,"Stack trace:
>>>>>>> 1 0x8b7038 postgres errstart + 0x288
>>>>>>> 2 0x8b8dbb postgres elog_finish + 0xab
>>>>>>> 3 0x951799 postgres ResManagerMainServer2ndPhase + 0x1d9
>>>>>>> 4 0x951e04 postgres ResManagerMain + 0x534
>>>>>>> 5 0x952151 postgres ResManagerProcessStartup + 0x171
>>>>>>> 6 0x78fecb postgres <symbol not found> + 0x78fecb
>>>>>>> 7 0x792939 postgres PostmasterMain + 0x759
>>>>>>> 8 0x4a15af postgres main + 0x50f
>>>>>>> 9 0x7f3b99553b15 libc.so.6 __libc_start_main + 0xf5
>>>>>>> 10 0x4a162d postgres <symbol not found> + 0x4a162d
>>>>>>>
>>>>>>> When I try to access data in the database, I get the following
>>>>>>> error message:
>>>>>>>
>>>>>>> stn_bi=# select * from wd_documento_emissao limit 1;
>>>>>>> WARNING: FD 28 having errors raised. errno 104
>>>>>>> ERROR: failed to register in resource manager, failed to receive
>>>>>>> content (pquery.c:787)
>>>>>>>
>>>>>>> I'm not able to restore the previous parameter values; the
>>>>>>> following error message is returned:
>>>>>>>
>>>>>>> postgres=# alter resource queue pg_default with
>>>>>>> (CORE_LIMIT_CLUSTER=50);
>>>>>>> WARNING: FD 45 having errors raised. errno 104
>>>>>>> ERROR: failed to register in resource manager, failed to receive
>>>>>>> content (resqueuecommand.c:364)
>>>>>>>
>>>>>>> Since I was using HAWQ as the resource manager (hawq_global_rm_type =
>>>>>>> none), I switched it to YARN to try to solve the issue, but without
>>>>>>> success.
>>>>>>>
>>>>>>> Below is the pg_log output from the master when resources are managed
>>>>>>> by YARN:
>>>>>>>
>>>>>>>
>>>>>>> 2017-04-12 21:29:06.915122 BRT,,,p573383,th745449920,,,,0,con44443,,seg-10000,,,,,"LOG","00000","YARN mode resource broker accepted YARN connection arguments : YARN Server big03-hadoop-master:8050 Scheduler server big03-hadoop-master:8030 Queue default Application name hawq, by user:postgres",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",506,
>>>>>>> 2017-04-12 21:29:06.915192 BRT,,,p495836,th745449920,,,,0,,,seg-10000,,,,,"LOG","00000","Wait for HAWQ RM -1",,,,,,,0,,"resourcemanager.c",421,
>>>>>>> 2017-04-12 21:29:06.915213 BRT,,,p495836,th745449920,,,,0,,,seg-10000,,,,,"LOG","00000","HAWQ :: Received signal notification that HAWQ RM works now.",,,,,,,0,,"resourcemanager.c",429,
>>>>>>> 2017-04-12 21:29:06.915735 BRT,,,p495838,th745449920,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>>>>>>> 2017-04-12 21:29:06.915669, p572922, th139994550884096, INFO LibYarnClient::heartbeatFunc, goes into exit phase.",,,,,,,,"SysLoggerMain","syslogger.c",518,
>>>>>>> 2017-04-12 21:29:06.916126 BRT,,,p495838,th745449920,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>>>>>>> 2017-04-12 21:29:06.915883, p572922, th139995204463040, INFO LibYarnClient::finishJob, join heart-beat thread successfully.",,,,,,,,"SysLoggerMain","syslogger.c",518,
>>>>>>> 2017-04-12 21:29:06.916395 BRT,,,p495838,th745449920,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>>>>>>> 2017-04-12 21:29:06.916142, p573383, th139995204463040, INFO ApplicationClient session auth method : simple",,,,,,,,"SysLoggerMain","syslogger.c",518,
>>>>>>> 2017-04-12 21:29:06.916561 BRT,,,p495838,th745449920,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>>>>>>> 2017-04-12 21:29:06.916489, p572922, th139995204463040, INFO LibYarnClient::finishJob, finish AM for jobId:job_1491961152436_24865, finalStatus:1",,,,,,,,"SysLoggerMain","syslogger.c",518,
>>>>>>> 2017-04-12 21:29:06.916808 BRT,,,p572922,th745449920,,,,0,con44167,,seg-10000,,,,,"LOG","00000","YARN mode resource broker finished job in YARN.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1872,
>>>>>>> 2017-04-12 21:29:06.916834 BRT,,,p572922,th745449920,,,,0,con44167,,seg-10000,,,,,"LOG","00000","YARN mode resource broker get result of finish yarn application through libYARN 0",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",321,
>>>>>>> 2017-04-12 21:29:06.916855 BRT,,,p572922,th745449920,,,,0,con44167,,seg-10000,,,,,"LOG","00000","YARN mode resource broker get result of disconnecting YARN through libYARN 0",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",331,
>>>>>>> 2017-04-12 21:29:06.916868 BRT,,,p572922,th745449920,,,,0,con44167,,seg-10000,,,,,"LOG","00000","YARN mode resource broker goes into exit phase.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",163,
>>>>>>> 2017-04-12 21:29:06.923056 BRT,,,p495838,th745449920,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>>>>>>> 2017-04-12 21:29:06.922993, p573383, th139995204463040, INFO ApplicationClient Resource Manager HA is disable.",,,,,,,,"SysLoggerMain","syslogger.c",518,
>>>>>>> 2017-04-12 21:29:06.957206 BRT,,,p495838,th745449920,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>>>>>>> 2017-04-12 21:29:06.957002, p572539, th139994550884096, INFO LibYarnClient::heartbeatFunc, goes into exit phase.",,,,,,,,"SysLoggerMain","syslogger.c",518,
>>>>>>> 2017-04-12 21:29:06.957419 BRT,,,p495838,th745449920,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>>>>>>> 2017-04-12 21:29:06.957284, p572539, th139995204463040, INFO LibYarnClient::finishJob, join heart-beat thread successfully.",,,,,,,,"SysLoggerMain","syslogger.c",518,
>>>>>>> 2017-04-12 21:29:06.958595 BRT,,,p495838,th745449920,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>>>>>>> 2017-04-12 21:29:06.958384, p572539, th139995204463040, INFO LibYarnClient::finishJob, finish AM for jobId:job_1491961152436_24827, finalStatus:1",,,,,,,,"SysLoggerMain","syslogger.c",518,
>>>>>>> 2017-04-12 21:29:06.958768 BRT,,,p572539,th745449920,,,,0,con43939,,seg-10000,,,,,"LOG","00000","YARN mode resource broker finished job in YARN.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1872,
>>>>>>> 2017-04-12 21:29:06.958809 BRT,,,p572539,th745449920,,,,0,con43939,,seg-10000,,,,,"LOG","00000","YARN mode resource broker get result of finish yarn application through libYARN 0",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",321,
>>>>>>> 2017-04-12 21:29:06.958892 BRT,,,p572539,th745449920,,,,0,con43939,,seg-10000,,,,,"LOG","00000","YARN mode resource broker get result of disconnecting YARN through libYARN 0",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",331,
>>>>>>> 2017-04-12 21:29:06.958930 BRT,,,p572539,th745449920,,,,0,con43939,,seg-10000,,,,,"LOG","00000","YARN mode resource broker goes into exit phase.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",163,
>>>>>>> 2017-04-12 21:29:06.959059 BRT,,,p572539,th745449920,,,,0,con43939,,seg-10000,,,,,"LOG","00000","failed to find proc 0x7f531b01b600 in ProcArray",,,,,,,0,,"procarray.c",184,
>>>>>>> 2017-04-12 21:29:06.999484 BRT,,,p573381,th745449920,,,,0,con44443,,seg-10000,,,,,"LOG","00000","Cleanup segment configuration catalog table successfully!",,,,,,,0,,"resourcepool.c",460,
>>>>>>> 2017-04-12 21:29:07.008804 BRT,,,p573381,th745449920,,,,0,con44443,,seg-10000,,,,,"LOG","00000","Cleanup segment configuration history catalog table successfully, keep period: recent 365 days.",,,,,,,0,,"resourcepool.c",530,
>>>>>>> 2017-04-12 21:29:07.015893 BRT,,,p573381,th745449920,,,,0,con44443,,seg-10000,,,,,"LOG","00000","Add a new row into segment configuration catalog table,registration order:0, role:m, status:u, port:5432, hostname:big08-hadoop-master.stn.intra, address:big08-hadoop-master.stn.intra, description:",,,,,,,0,,"resourcepool.c",879,
>>>>>>> 2017-04-12 21:29:07.022082 BRT,,,p573381,th745449920,,,,0,con44443,,seg-10000,,,,,"LOG","00000","Resource manager successfully loaded role specifications.",,,,,,,0,,"resourcemanager.c",1275,
>>>>>>> 2017-04-12 21:29:07.027941 BRT,,,p573381,th745449920,,,,0,con44443,,seg-10000,,,,,"LOG","00000","Resource manger successfully loaded resource queue specifications",,,,,,,0,,"resourcemanager.c",1585,
>>>>>>> 2017-04-12 21:29:07.028062 BRT,,,p573381,th745449920,,,,0,con44443,,seg-10000,,,,,"WARNING","01000","memorylimit and corelimit must use the same formats to express resource limit",,,,,,,0,,"resqueuemanager.c",708,
>>>>>>> 2017-04-12 21:29:07.028082 BRT,,,p573381,th745449920,,,,0,con44443,,seg-10000,,,,,"WARNING","01000","Resource manager cannot create resource queue with its attributes because memorylimit and corelimit must use the same formats to express resource limit",,,,,,,0,,"resourcemanager.c",1828,
>>>>>>> 2017-04-12 21:29:07.028147 BRT,,,p573381,th745449920,,,,0,con44443,,seg-10000,,,,,"LOG","00000","Resource manager created resource queue instance : RESQUEUE:ID=9800,Name=pg_root,PARENT=0,LIMIT(MEM=100.000000%,CORE=100.000000%),RATIO=0 MBPCORE,INUSE(0 MB, 0.000000 CORE),CONN=0,INQUEUE=0.",,,,,,,0,,"resourcemanager.c",1974,
>>>>>>> 2017-04-12 21:29:07.028171 BRT,,,p573381,th745449920,,,,0,con44443,,seg-10000,,,,,"WARNING","01000","resource queue cannot parse role attribute, cannot find target resource queue '6055'",,,,,,,0,,"resqueuemanager.c",2668,
>>>>>>> 2017-04-12 21:29:07.028185 BRT,,,p573381,th745449920,,,,0,con44443,,seg-10000,,,,,"WARNING","01000","cannot create user with its attributes because cannot find target resource queue '6055'",,,,,,,0,,"resourcemanager.c",1995,
>>>>>>> 2017-04-12 21:29:07.028199 BRT,,,p573381,th745449920,,,,0,con44443,,seg-10000,,,,,"LOG","00000","failed to load queue and user definition.",,,,,,,0,,"resourcemanager.c",1130,
>>>>>>> 2017-04-12 21:29:07.042672 BRT,,,p495838,th745449920,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>>>>>>> 2017-04-12 21:29:07.042514, p573313, th139995204463040, INFO ApplicationClient RM Scheduler HA is disable.",,,,,,,,"SysLoggerMain","syslogger.c",518,
>>>>>>> 2017-04-12 21:29:07.043183 BRT,,,p573381,th745449920,,,,0,con44443,,seg-10000,,,,,"FATAL","XX000","failed to load queue and user definition. (resourcemanager.c:496)",,,,,,,0,,"resourcemanager.c",496,"Stack trace:
>>>>>>> 1 0x8b7038 postgres errstart + 0x288
>>>>>>> 2 0x8b8dbb postgres elog_finish + 0xab
>>>>>>> 3 0x951799 postgres ResManagerMainServer2ndPhase + 0x1d9
>>>>>>> 4 0x951e04 postgres ResManagerMain + 0x534
>>>>>>> 5 0x952151 postgres ResManagerProcessStartup + 0x171
>>>>>>> 6 0x78fecb postgres <symbol not found> + 0x78fecb
>>>>>>> 7 0x792939 postgres PostmasterMain + 0x759
>>>>>>> 8 0x4a15af postgres main + 0x50f
>>>>>>> 9 0x7f53296a1b15 libc.so.6 __libc_start_main + 0xf5
>>>>>>> 10 0x4a162d postgres <symbol not found> + 0x4a162d
>>>>>>>
>>>>>>> Some properties of hawq-site.xml that I use.
>>>>>>>
>>>>>>> hawq_rm_memory_limit_perseg=8GB
>>>>>>> hawq_rm_nvcore_limit_perseg=4
>>>>>>> default_hash_table_bucket_number=6
>>>>>>>
>>>>>>> And lastly, the gp_segment_configuration table does not contain the
>>>>>>> segments, only the master node.
>>>>>>>
>>>>>>> Any thoughts are really appreciated.
>>>>>>>
>>>>>>>
>>>>>>> []'s,
>>>>>>> Sebastião M. P. Gonella
>>>>>>>
>>>>>>> Mobile: 61-984021512
>>>>>>> Email: sebastiao.gonella@gmail.com
>>>>>>> Skype: segonella
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Xiang Sheng
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
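
The condition named in the log warning quoted above ("memorylimit and corelimit must use the same formats to express resource limit") can be checked before restarting the cluster. The following is only a minimal sketch in Python, not part of HAWQ: the helper names `is_percent` and `find_mismatched_queues` are hypothetical, and the rows are assumed to have been read from `pg_resqueue` as plain strings (`rsqname`, `memorylimit`, `corelimit`). It only tests whether both values agree on carrying a percent sign, which is the mismatch reproduced in this thread; it does not validate any other rule the resource manager may apply.

```python
import re

# Hypothetical helper: a value counts as "percent format" if it is a number
# followed by a percent sign, e.g. "50%" or "100%".
_PERCENT = re.compile(r"^\d+(\.\d+)?%$")


def is_percent(value: str) -> bool:
    """Return True if the limit is written as a percentage."""
    return bool(_PERCENT.match(value.strip()))


def find_mismatched_queues(rows):
    """Return names of queues whose memorylimit and corelimit differ in format.

    rows is an iterable of (rsqname, memorylimit, corelimit) string tuples,
    assumed to come from a query against pg_resqueue.
    """
    return [
        name
        for name, memorylimit, corelimit in rows
        if is_percent(memorylimit) != is_percent(corelimit)
    ]


# Values taken from the pg_resqueue output shown in this thread after the
# faulty ALTER statement (corelimit stored as "90" instead of "90%").
rows = [
    ("pg_root", "100%", "100%"),
    ("pg_default", "50%", "90"),
]

print(find_mismatched_queues(rows))
```

With the values from the thread, the sketch flags only pg_default, the queue whose corelimit lost its percent sign.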
