hawq-user mailing list archives

From Yi Jin <y...@pivotal.io>
Subject Re: Error - failed to load queue and user definition
Date Thu, 13 Apr 2017 04:07:12 GMT
Hi *Sebastião*

I am curious what the result of this statement was:

alter resource queue pg_default with (MEMORY_LIMIT_CLUSTER=90);

I would expect this to report an error, because currently
MEMORY_LIMIT_CLUSTER cannot differ from CORE_LIMIT_CLUSTER.
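
The rule behind the "memorylimit and corelimit must use the same formats"
warning can be sketched as follows. This is only an illustration, assuming
that "same formats" means both limits are percentages or both are absolute
values. The function names and sample values are hypothetical and are not
part of HAWQ.

```python
def limit_format(value: str) -> str:
    """Classify a resource-limit string as 'percentage' or 'absolute'.

    Assumption: a trailing '%' marks a percentage; anything else is
    treated as an absolute value.
    """
    return "percentage" if value.strip().endswith("%") else "absolute"


def same_format(memory_limit: str, core_limit: str) -> bool:
    """Return True when both limits are expressed in the same format."""
    return limit_format(memory_limit) == limit_format(core_limit)


# Hypothetical values: a bare number for memory next to a percentage for cores.
print(same_format("90", "50%"))
# Both limits written as percentages.
print(same_format("50%", "50%"))
```

Under this reading, a queue whose memory limit is stored as a bare number
while its core limit is stored as a percentage would fail the check.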

From your log, I think there may be something wrong in pg_resqueue.
Can you add the following setting to hawq-site.xml, restart HAWQ, and mail
me the complete log for investigation?

<property>
    <name>log_min_messages</name>
    <value>DEBUG3</value>
</property>
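
To skim a master log before mailing it, a small filter such as the one below
may help. It is a sketch under assumptions: the log is read as CSV, and the
column positions (severity in column 16, message in column 18, counting from
0) are inferred from the log lines quoted in this thread, not from HAWQ
documentation. The function name is hypothetical.

```python
import csv
from typing import Iterable, Iterator, Tuple

# Column positions inferred from the sample log lines in this thread (assumption).
SEVERITY_COL = 16
MESSAGE_COL = 18
INTERESTING = {"WARNING", "ERROR", "FATAL", "PANIC"}


def problem_entries(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, severity, message) for entries above LOG level.

    `lines` can be an open file object or any iterable of log lines.
    Open files with newline="" so that quoted multi-line fields, such as
    stack traces, are kept together by the csv module.
    """
    for row in csv.reader(lines):
        if len(row) <= MESSAGE_COL:
            continue  # skip short or malformed rows
        if row[SEVERITY_COL] in INTERESTING:
            yield row[0], row[SEVERITY_COL], row[MESSAGE_COL]


# Usage (the path is a placeholder):
# with open("/path/to/pg_log/file.csv", newline="") as f:
#     for ts, severity, message in problem_entries(f):
#         print(ts, severity, message)
```

The complete log is still what is needed for investigation; the filter only
helps to locate the first WARNING or FATAL entry quickly.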


Best,
Yi

On Thu, Apr 13, 2017 at 1:39 PM, Xiang Sheng <xsheng@pivotal.io> wrote:

> Use "select * from gp_segment_configuration;" to check why your 3
> segments are down.
>
> Also, your SQL to alter the resource queue is not correct.
> The log says "memorylimit and corelimit must use the same formats to
> express resource limit".
> Please refer to the docs: ALTER-RESOURCE-QUEUE
> <http://hawq.incubator.apache.org/docs/userguide/2.1.0.0-incubating/reference/sql/ALTER-RESOURCE-QUEUE.html>
> and Checking Existing Resource Queues
> <http://hawq.incubator.apache.org/docs/userguide/2.1.0.0-incubating/resourcemgmt/ResourceQueues.html#topic_lqy_gls_zt>
>
> On Thu, Apr 13, 2017 at 11:14 AM, Sebastião Gonella <
> sebastiao.gonella@gmail.com> wrote:
>
>> Hi all,
>>
>> Thanks in advance for your support. We run HAWQ version 2.0, which we
>> use to manage a few billion records. To improve query performance we
>> tuned some HAWQ parameters, including the pg_default resource queue. Now
>> HAWQ is no longer working as expected and the data in the segments is no
>> longer available.
>>
>> I probably need to create a new user and another queue, but first I have
>> to fix this, and I have no idea how to do it. Please, I need help.
>>
>> The modifications made to the queue were:
>>
>> stn_bi=# alter resource queue pg_default with
>> (vseg_resource_quota='mem:8gb');
>> stn_bi=# alter resource queue pg_default with (MEMORY_LIMIT_CLUSTER=90);
>>
>> Now when I start the HAWQ cluster, both the master and the segments start
>> successfully, but the hawq state command returns the following:
>>
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--HAWQ
>> instance status summary
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master
>> :gpadmin-[INFO]:------------------------------------------------------
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
>> Master instance =
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
>> No Standby master defined
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
>> Total segment instance count from config file = 3
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master
>> :gpadmin-[INFO]:------------------------------------------------------
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
>> Segment Status
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master
>> :gpadmin-[INFO]:------------------------------------------------------
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
>> Total segments count from catalog = 0
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
>> Total segment valid (at master) = 0
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
>> Total segment failures (at master) = 3
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
>> Total number of postmaster.pid files missing = 0
>> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
>> Total number of postmaster.pid files found = 3
>>
>> In the master's pg_log files I see the following errors:
>>
>> 2017-04-12 17:41:51.096671 BRT,,,p17360,th-1671837248,,,,
>> 0,con6702,,seg-10000,,,,,"LOG","00000","Clean up handler in message
>> server is called.",,,,,,,0,,"rmcomm_MessageServer.c",105,
>> 2017-04-12 17:41:51.099905 BRT,,,p8534,th-1671837248,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","resourcemanager process (PID 17360)
>> exited with exit code 1",,,,,,,0,,"postmaster.c",4726,
>> 2017-04-12 17:41:51.105210 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"LOG","00000","Resource manager starts
>> accepting resource request. Listening normal socket port 5437. Total
>> listened 1 FDs.",,,,,,,0,,"resourcemanager.c",2495,
>> 2017-04-12 17:41:51.105454 BRT,,,p8534,th-1671837248,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","Wait for HAWQ RM
>> -1",,,,,,,0,,"resourcemanager.c",421,
>> 2017-04-12 17:41:51.105583 BRT,,,p8534,th-1671837248,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","HAWQ :: Received signal notification
>> that HAWQ RM works now.",,,,,,,0,,"resourcemanager.c",429,
>> 2017-04-12 17:41:51.175205 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"LOG","00000","Cleanup segment configuration
>> catalog table successfully!",,,,,,,0,,"resourcepool.c",460,
>> 2017-04-12 17:41:51.183070 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"LOG","00000","Cleanup segment configuration
>> history catalog table successfully, keep period: recent 365
>> days.",,,,,,,0,,"resourcepool.c",530,
>> 2017-04-12 17:41:51.189397 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"LOG","00000","Add a new row into segment
>> configuration catalog table,registration order:0, role:m, status:u,
>> port:5432, hostname:big08-hadoop-master.stn.intra, address:
>> big08-hadoop-master.stn.intra, description:",,,,,,,0,,"resour
>> cepool.c",879,
>> 2017-04-12 17:41:51.195123 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"LOG","00000","Resource manager successfully
>> loaded role specifications.",,,,,,,0,,"resourcemanager.c",1275,
>> 2017-04-12 17:41:51.200825 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"LOG","00000","Resource manger successfully
>> loaded resource queue specifications",,,,,,,0,,"resourcemanager.c",1585,
>> 2017-04-12 17:41:51.200858 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"WARNING","01000","memorylimit and corelimit
>> must use the same formats to express resource limit",,,,,,,0,,"resqueuemanag
>> er.c",708,
>> 2017-04-12 17:41:51.200874 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"WARNING","01000","Resource manager cannot
>> create resource queue with its attributes because memorylimit and corelimit
>> must use the same formats to express resource limit",,,,,,,0,,"resourcemanag
>> er.c",1828,
>> 2017-04-12 17:41:51.200890 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"LOG","00000","Resource manager created
>> resource queue instance :
>> RESQUEUE:ID=9800,Name=pg_root,PARENT=0,LIMIT(MEM=100.000000%,CORE=100.000000%),RATIO=0
>> MBPCORE,INUSE(0 MB, 0.000000 CORE),CONN=0,INQUEUE=0.",,,,,,
>> ,0,,"resourcemanager.c",1974,
>> 2017-04-12 17:41:51.200910 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"WARNING","01000","resource queue cannot parse
>> role attribute, cannot find target resource queue
>> '6055'",,,,,,,0,,"resqueuemanager.c",2668,
>> 2017-04-12 17:41:51.200924 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"WARNING","01000","cannot create user with its
>> attributes because cannot find target resource queue
>> '6055'",,,,,,,0,,"resourcemanager.c",1995,
>> 2017-04-12 17:41:51.200938 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"LOG","00000","failed to load queue and user
>> definition.",,,,,,,0,,"resourcemanager.c",1130,
>> 2017-04-12 17:41:51.215416 BRT,,,p17367,th-1671837248,,,,
>> 0,con6708,,seg-10000,,,,,"FATAL","XX000","failed to load queue and user
>> definition. (resourcemanager.c:496)",,,,,,,0,,"resourcemanager.c",496,"Stack
>> trace:
>> 1 0x8b7038 postgres errstart + 0x288
>> 2 0x8b8dbb postgres elog_finish + 0xab
>> 3 0x951799 postgres ResManagerMainServer2ndPhase + 0x1d9
>> 4 0x951e04 postgres ResManagerMain + 0x534
>> 5 0x952151 postgres ResManagerProcessStartup + 0x171
>> 6 0x78fecb postgres <symbol not found> + 0x78fecb
>> 7 0x792939 postgres PostmasterMain + 0x759
>> 8 0x4a15af postgres main + 0x50f
>> 9 0x7f3b99553b15 libc.so.6 __libc_start_main + 0xf5
>> 10 0x4a162d postgres <symbol not found> + 0x4a162d
>>
>> When I try to access data in the database, I get the following error
>> message:
>>
>> stn_bi=# select * from wd_documento_emissao limit 1;
>> WARNING: FD 28 having errors raised. errno 104
>> ERROR: failed to register in resource manager, failed to receive content
>> (pquery.c:787)
>>
>> I am not able to restore the previous parameter values; the attempt
>> returns the following error message:
>>
>> postgres=# alter resource queue pg_default with (CORE_LIMIT_CLUSTER=50);
>> WARNING: FD 45 having errors raised. errno 104
>> ERROR: failed to register in resource manager, failed to receive content
>> (resqueuecommand.c:364)
>>
>> Since I was using HAWQ's own resource manager (hawq_global_rm_type = none),
>> I switched it to YARN to try to resolve the issue, but that did not help
>> either.
>>
>> Below is the master's pg_log output when resources are managed by YARN:
>>
>>
>> 2017-04-12 21:29:06.915122 BRT,,,p573383,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"LOG","00000","YARN mode resource broker
>> accepted YARN connection arguments : YARN Server big03-hadoop-master:8050
>> Scheduler server big03-hadoop-master:8030 Queue default Application name
>> hawq, by user:postgres",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",506,
>> 2017-04-12 21:29:06.915192 BRT,,,p495836,th745449920,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","Wait for HAWQ RM
>> -1",,,,,,,0,,"resourcemanager.c",421,
>> 2017-04-12 21:29:06.915213 BRT,,,p495836,th745449920,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","HAWQ :: Received signal notification
>> that HAWQ RM works now.",,,,,,,0,,"resourcemanager.c",429,
>> 2017-04-12 21:29:06.915735 BRT,,,p495838,th745449920,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>> 2017-04-12 21:29:06.915669, p572922, th139994550884096, INFO
>> LibYarnClient::heartbeatFunc, goes into exit phase.",,,,,,,,"SysLoggerMain"
>> ,"syslogger.c",518,
>> 2017-04-12 21:29:06.916126 BRT,,,p495838,th745449920,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>> 2017-04-12 21:29:06.915883, p572922, th139995204463040, INFO
>> LibYarnClient::finishJob, join heart-beat thread
>> successfully.",,,,,,,,"SysLoggerMain","syslogger.c",518,
>> 2017-04-12 21:29:06.916395 BRT,,,p495838,th745449920,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>> 2017-04-12 21:29:06.916142, p573383, th139995204463040, INFO
>> ApplicationClient session auth method : simple",,,,,,,,"SysLoggerMain"
>> ,"syslogger.c",518,
>> 2017-04-12 21:29:06.916561 BRT,,,p495838,th745449920,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>> 2017-04-12 21:29:06.916489, p572922, th139995204463040, INFO
>> LibYarnClient::finishJob, finish AM for jobId:job_1491961152436_24865,
>> finalStatus:1",,,,,,,,"SysLoggerMain","syslogger.c",518,
>> 2017-04-12 21:29:06.916808 BRT,,,p572922,th745449920,,,,0
>> ,con44167,,seg-10000,,,,,"LOG","00000","YARN mode resource broker
>> finished job in YARN.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1872,
>> 2017-04-12 21:29:06.916834 BRT,,,p572922,th745449920,,,,0
>> ,con44167,,seg-10000,,,,,"LOG","00000","YARN mode resource broker get
>> result of finish yarn application through libYARN
>> 0",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",321,
>> 2017-04-12 21:29:06.916855 BRT,,,p572922,th745449920,,,,0
>> ,con44167,,seg-10000,,,,,"LOG","00000","YARN mode resource broker get
>> result of disconnecting YARN through libYARN 0",,,,,,,0,,"resourcebroker_LI
>> BYARN_proc.c",331,
>> 2017-04-12 21:29:06.916868 BRT,,,p572922,th745449920,,,,0
>> ,con44167,,seg-10000,,,,,"LOG","00000","YARN mode resource broker goes
>> into exit phase.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",163,
>> 2017-04-12 21:29:06.923056 BRT,,,p495838,th745449920,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>> 2017-04-12 21:29:06.922993, p573383, th139995204463040, INFO
>> ApplicationClient Resource Manager HA is disable.",,,,,,,,"SysLoggerMai
>> n","syslogger.c",518,
>> 2017-04-12 21:29:06.957206 BRT,,,p495838,th745449920,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>> 2017-04-12 21:29:06.957002, p572539, th139994550884096, INFO
>> LibYarnClient::heartbeatFunc, goes into exit phase.",,,,,,,,"SysLoggerMain"
>> ,"syslogger.c",518,
>> 2017-04-12 21:29:06.957419 BRT,,,p495838,th745449920,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>> 2017-04-12 21:29:06.957284, p572539, th139995204463040, INFO
>> LibYarnClient::finishJob, join heart-beat thread
>> successfully.",,,,,,,,"SysLoggerMain","syslogger.c",518,
>> 2017-04-12 21:29:06.958595 BRT,,,p495838,th745449920,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>> 2017-04-12 21:29:06.958384, p572539, th139995204463040, INFO
>> LibYarnClient::finishJob, finish AM for jobId:job_1491961152436_24827,
>> finalStatus:1",,,,,,,,"SysLoggerMain","syslogger.c",518,
>> 2017-04-12 21:29:06.958768 BRT,,,p572539,th745449920,,,,0
>> ,con43939,,seg-10000,,,,,"LOG","00000","YARN mode resource broker
>> finished job in YARN.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1872,
>> 2017-04-12 21:29:06.958809 BRT,,,p572539,th745449920,,,,0
>> ,con43939,,seg-10000,,,,,"LOG","00000","YARN mode resource broker get
>> result of finish yarn application through libYARN
>> 0",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",321,
>> 2017-04-12 21:29:06.958892 BRT,,,p572539,th745449920,,,,0
>> ,con43939,,seg-10000,,,,,"LOG","00000","YARN mode resource broker get
>> result of disconnecting YARN through libYARN 0",,,,,,,0,,"resourcebroker_LI
>> BYARN_proc.c",331,
>> 2017-04-12 21:29:06.958930 BRT,,,p572539,th745449920,,,,0
>> ,con43939,,seg-10000,,,,,"LOG","00000","YARN mode resource broker goes
>> into exit phase.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",163,
>> 2017-04-12 21:29:06.959059 BRT,,,p572539,th745449920,,,,0
>> ,con43939,,seg-10000,,,,,"LOG","00000","failed to find proc
>> 0x7f531b01b600 in ProcArray",,,,,,,0,,"procarray.c",184,
>> 2017-04-12 21:29:06.999484 BRT,,,p573381,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"LOG","00000","Cleanup segment configuration
>> catalog table successfully!",,,,,,,0,,"resourcepool.c",460,
>> 2017-04-12 21:29:07.008804 BRT,,,p573381,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"LOG","00000","Cleanup segment configuration
>> history catalog table successfully, keep period: recent 365
>> days.",,,,,,,0,,"resourcepool.c",530,
>> 2017-04-12 21:29:07.015893 BRT,,,p573381,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"LOG","00000","Add a new row into segment
>> configuration catalog table,registration order:0, role:m, status:u,
>> port:5432, hostname:big08-hadoop-master.stn.intra, address:
>> big08-hadoop-master.stn.intra, description:",,,,,,,0,,"resour
>> cepool.c",879,
>> 2017-04-12 21:29:07.022082 BRT,,,p573381,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"LOG","00000","Resource manager successfully
>> loaded role specifications.",,,,,,,0,,"resourcemanager.c",1275,
>> 2017-04-12 21:29:07.027941 BRT,,,p573381,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"LOG","00000","Resource manger successfully
>> loaded resource queue specifications",,,,,,,0,,"resourcemanager.c",1585,
>> 2017-04-12 21:29:07.028062 BRT,,,p573381,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"WARNING","01000","memorylimit and corelimit
>> must use the same formats to express resource limit",,,,,,,0,,"resqueuemanag
>> er.c",708,
>> 2017-04-12 21:29:07.028082 BRT,,,p573381,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"WARNING","01000","Resource manager cannot
>> create resource queue with its attributes because memorylimit and corelimit
>> must use the same formats to express resource limit",,,,,,,0,,"resourcemanag
>> er.c",1828,
>> 2017-04-12 21:29:07.028147 BRT,,,p573381,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"LOG","00000","Resource manager created
>> resource queue instance :
>> RESQUEUE:ID=9800,Name=pg_root,PARENT=0,LIMIT(MEM=100.000000%,CORE=100.000000%),RATIO=0
>> MBPCORE,INUSE(0 MB, 0.000000 CORE),CONN=0,INQUEUE=0.",,,,,,
>> ,0,,"resourcemanager.c",1974,
>> 2017-04-12 21:29:07.028171 BRT,,,p573381,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"WARNING","01000","resource queue cannot parse
>> role attribute, cannot find target resource queue
>> '6055'",,,,,,,0,,"resqueuemanager.c",2668,
>> 2017-04-12 21:29:07.028185 BRT,,,p573381,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"WARNING","01000","cannot create user with its
>> attributes because cannot find target resource queue
>> '6055'",,,,,,,0,,"resourcemanager.c",1995,
>> 2017-04-12 21:29:07.028199 BRT,,,p573381,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"LOG","00000","failed to load queue and user
>> definition.",,,,,,,0,,"resourcemanager.c",1130,
>> 2017-04-12 21:29:07.042672 BRT,,,p495838,th745449920,,,,0
>> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
>> 2017-04-12 21:29:07.042514, p573313, th139995204463040, INFO
>> ApplicationClient RM Scheduler HA is disable.",,,,,,,,"SysLoggerMai
>> n","syslogger.c",518,
>> 2017-04-12 21:29:07.043183 BRT,,,p573381,th745449920,,,,0
>> ,con44443,,seg-10000,,,,,"FATAL","XX000","failed to load queue and user
>> definition. (resourcemanager.c:496)",,,,,,,0,,"resourcemanager.c",496,"Stack
>> trace:
>> 1 0x8b7038 postgres errstart + 0x288
>> 2 0x8b8dbb postgres elog_finish + 0xab
>> 3 0x951799 postgres ResManagerMainServer2ndPhase + 0x1d9
>> 4 0x951e04 postgres ResManagerMain + 0x534
>> 5 0x952151 postgres ResManagerProcessStartup + 0x171
>> 6 0x78fecb postgres <symbol not found> + 0x78fecb
>> 7 0x792939 postgres PostmasterMain + 0x759
>> 8 0x4a15af postgres main + 0x50f
>> 9 0x7f53296a1b15 libc.so.6 __libc_start_main + 0xf5
>> 10 0x4a162d postgres <symbol not found> + 0x4a162d
>>
>> Some of the hawq-site.xml properties that I use:
>>
>> hawq_rm_memory_limit_perseg=8GB
>> hawq_rm_nvcore_limit_perseg=4
>> default_hash_table_bucket_number=6
>>
>> Lastly, the gp_segment_configuration table does not contain the
>> segments, only the master node.
>>
>> Any thoughts are really appreciated.
>>
>>
>> []'s,
>> *Sebastião M. P. Gonella*
>>
>> Mobile: 61-984021512
>> Email: sebastiao.gonella@gmail.com
>> Skype: segonella
>>
>>
>
>
> --
> Best Regards,
> Xiang Sheng
>
