hawq-user mailing list archives

From Xiang Sheng <xsh...@pivotal.io>
Subject Re: Error - failed to load queue and user definition
Date Thu, 13 Apr 2017 03:39:09 GMT
Use "select * from gp_segment_configuration;" to check why your 3
segments are down.
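
A narrower form of that check might look like the sketch below. The column names are an assumption, inferred from the fields printed in the "Add a new row into segment configuration catalog table" log line quoted further down (registration order, role, status, port, hostname, description). Verify them against your catalog before relying on them.

```sql
-- Sketch only: column names are assumed from the quoted log line,
-- not confirmed against this cluster's catalog.
-- List every registered node with its role and status so that
-- missing or down segments stand out.
SELECT registration_order, role, status, hostname, port, description
FROM gp_segment_configuration
ORDER BY registration_order;
```

In the quoted log the master row is written with role:m and status:u, so any segment rows should appear alongside it once the segments register.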

Also, your SQL for altering the resource queue is not correct.
The log says "memorylimit and corelimit must use the same formats to
express resource limit".
Please refer to the docs for ALTER RESOURCE QUEUE
<http://hawq.incubator.apache.org/docs/userguide/2.1.0.0-incubating/reference/sql/ALTER-RESOURCE-QUEUE.html>
and Checking Existing Resource Queues
<http://hawq.incubator.apache.org/docs/userguide/2.1.0.0-incubating/resourcemgmt/ResourceQueues.html#topic_lqy_gls_zt>.
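
As a minimal sketch of what "the same formats" means here: the quoted statement set MEMORY_LIMIT_CLUSTER=90 with no percent sign, so the memory limit was presumably stored in a different format than the core limit. The statements below assume the resource manager is accepting requests again, which this thread does not show how to achieve. The 90% values are illustrative, not a recommendation.

```sql
-- Sketch only: inspect the stored definition of the queue first.
-- rsqname, memorylimit and corelimit are the attribute names the
-- resource manager log refers to.
SELECT * FROM pg_resqueue WHERE rsqname = 'pg_default';

-- Express both cluster limits as percentages so the two formats match.
-- The 90% figures are placeholders chosen for illustration.
ALTER RESOURCE QUEUE pg_default
  WITH (MEMORY_LIMIT_CLUSTER=90%, CORE_LIMIT_CLUSTER=90%);
```

Setting both limits in one statement avoids an intermediate state in which only one of them has been changed.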

On Thu, Apr 13, 2017 at 11:14 AM, Sebastião Gonella <
sebastiao.gonella@gmail.com> wrote:

> Hi all,
>
> Thanks in advance for your support. We run HAWQ version 2.0, which we use
> to manage a few billion records. To improve query performance we tuned
> some HAWQ parameters, including the pg_default queue. Now HAWQ no longer
> works as expected and the data in the segments is no longer available.
>
> I will probably need to create a new user and another queue, but first I
> have to fix this, and I have no idea how to do it. Please, I need help.
>
> The modifications made in the queue were:
>
> stn_bi=# alter resource queue pg_default with
> (vseg_resource_quota='mem:8gb');
> stn_bi=# alter resource queue pg_default with (MEMORY_LIMIT_CLUSTER=90);
>
> Now when I start the HAWQ cluster, both the master and the segments start
> successfully, but the hawq state command returns the following:
>
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--HAWQ
> instance status summary
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master
> :gpadmin-[INFO]:------------------------------------------------------
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
> Master instance =
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
> No Standby master defined
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
> Total segment instance count from config file = 3
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master
> :gpadmin-[INFO]:------------------------------------------------------
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
> Segment Status
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master
> :gpadmin-[INFO]:------------------------------------------------------
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
> Total segments count from catalog = 0
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
> Total segment valid (at master) = 0
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
> Total segment failures (at master) = 3
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
> Total number of postmaster.pid files missing = 0
> 20170412:13:25:59:535686 hawq_state:big08-hadoop-master:gpadmin-[INFO]:--
> Total number of postmaster.pid files found = 3
>
> The master's pg_log files contain the following errors:
>
> 2017-04-12 17:41:51.096671 BRT,,,p17360,th-1671837248,,,,
> 0,con6702,,seg-10000,,,,,"LOG","00000","Clean up handler in message
> server is called.",,,,,,,0,,"rmcomm_MessageServer.c",105,
> 2017-04-12 17:41:51.099905 BRT,,,p8534,th-1671837248,,,,0
> ,,,seg-10000,,,,,"LOG","00000","resourcemanager process (PID 17360)
> exited with exit code 1",,,,,,,0,,"postmaster.c",4726,
> 2017-04-12 17:41:51.105210 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"LOG","00000","Resource manager starts accepting
> resource request. Listening normal socket port 5437. Total listened 1
> FDs.",,,,,,,0,,"resourcemanager.c",2495,
> 2017-04-12 17:41:51.105454 BRT,,,p8534,th-1671837248,,,,0
> ,,,seg-10000,,,,,"LOG","00000","Wait for HAWQ RM
> -1",,,,,,,0,,"resourcemanager.c",421,
> 2017-04-12 17:41:51.105583 BRT,,,p8534,th-1671837248,,,,0
> ,,,seg-10000,,,,,"LOG","00000","HAWQ :: Received signal notification that
> HAWQ RM works now.",,,,,,,0,,"resourcemanager.c",429,
> 2017-04-12 17:41:51.175205 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"LOG","00000","Cleanup segment configuration
> catalog table successfully!",,,,,,,0,,"resourcepool.c",460,
> 2017-04-12 17:41:51.183070 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"LOG","00000","Cleanup segment configuration
> history catalog table successfully, keep period: recent 365
> days.",,,,,,,0,,"resourcepool.c",530,
> 2017-04-12 17:41:51.189397 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"LOG","00000","Add a new row into segment
> configuration catalog table,registration order:0, role:m, status:u,
> port:5432, hostname:big08-hadoop-master.stn.intra, address:
> big08-hadoop-master.stn.intra, description:",,,,,,,0,,"resourcepool.c",879,
> 2017-04-12 17:41:51.195123 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"LOG","00000","Resource manager successfully
> loaded role specifications.",,,,,,,0,,"resourcemanager.c",1275,
> 2017-04-12 17:41:51.200825 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"LOG","00000","Resource manger successfully
> loaded resource queue specifications",,,,,,,0,,"resourcemanager.c",1585,
> 2017-04-12 17:41:51.200858 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"WARNING","01000","memorylimit and corelimit
> must use the same formats to express resource limit",,,,,,,0,,"resqueuemanager.c",708,
> 2017-04-12 17:41:51.200874 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"WARNING","01000","Resource manager cannot
> create resource queue with its attributes because memorylimit and corelimit
> must use the same formats to express resource limit",,,,,,,0,,"resourcemanager.c",1828,
> 2017-04-12 17:41:51.200890 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"LOG","00000","Resource manager created resource
> queue instance :
> RESQUEUE:ID=9800,Name=pg_root,PARENT=0,LIMIT(MEM=100.000000%,CORE=100.000000%),RATIO=0
> MBPCORE,INUSE(0 MB, 0.000000 CORE),CONN=0,INQUEUE=0.",,,,,,
> ,0,,"resourcemanager.c",1974,
> 2017-04-12 17:41:51.200910 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"WARNING","01000","resource queue cannot parse
> role attribute, cannot find target resource queue
> '6055'",,,,,,,0,,"resqueuemanager.c",2668,
> 2017-04-12 17:41:51.200924 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"WARNING","01000","cannot create user with its
> attributes because cannot find target resource queue
> '6055'",,,,,,,0,,"resourcemanager.c",1995,
> 2017-04-12 17:41:51.200938 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"LOG","00000","failed to load queue and user
> definition.",,,,,,,0,,"resourcemanager.c",1130,
> 2017-04-12 17:41:51.215416 BRT,,,p17367,th-1671837248,,,,
> 0,con6708,,seg-10000,,,,,"FATAL","XX000","failed to load queue and user
> definition. (resourcemanager.c:496)",,,,,,,0,,"resourcemanager.c",496,"Stack
> trace:
> 1 0x8b7038 postgres errstart + 0x288
> 2 0x8b8dbb postgres elog_finish + 0xab
> 3 0x951799 postgres ResManagerMainServer2ndPhase + 0x1d9
> 4 0x951e04 postgres ResManagerMain + 0x534
> 5 0x952151 postgres ResManagerProcessStartup + 0x171
> 6 0x78fecb postgres <symbol not found> + 0x78fecb
> 7 0x792939 postgres PostmasterMain + 0x759
> 8 0x4a15af postgres main + 0x50f
> 9 0x7f3b99553b15 libc.so.6 __libc_start_main + 0xf5
> 10 0x4a162d postgres <symbol not found> + 0x4a162d
>
> When I try to access data in the database, I get the following error
> message:
>
> stn_bi=# select * from wd_documento_emissao limit 1;
> WARNING: FD 28 having errors raised. errno 104
> ERROR: failed to register in resource manager, failed to receive content
> (pquery.c:787)
>
> I am not able to get back the previous parameter values; the following
> error message is returned:
>
> postgres=# alter resource queue pg_default with (CORE_LIMIT_CLUSTER=50);
> WARNING: FD 45 having errors raised. errno 104
> ERROR: failed to register in resource manager, failed to receive content
> (resqueuecommand.c:364)
>
> Since I was using HAWQ as the resource manager (hawq_global_rm_type = none),
> I switched it to YARN to try to solve the issue, but this did not work
> either.
>
> Below is the master's pg_log output when resources are managed by YARN:
>
>
> 2017-04-12 21:29:06.915122 BRT,,,p573383,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"LOG","00000","YARN mode resource broker
> accepted YARN connection arguments : YARN Server big03-hadoop-master:8050
> Scheduler server big03-hadoop-master:8030 Queue default Application name
> hawq, by user:postgres",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",506,
> 2017-04-12 21:29:06.915192 BRT,,,p495836,th745449920,,,,0
> ,,,seg-10000,,,,,"LOG","00000","Wait for HAWQ RM
> -1",,,,,,,0,,"resourcemanager.c",421,
> 2017-04-12 21:29:06.915213 BRT,,,p495836,th745449920,,,,0
> ,,,seg-10000,,,,,"LOG","00000","HAWQ :: Received signal notification that
> HAWQ RM works now.",,,,,,,0,,"resourcemanager.c",429,
> 2017-04-12 21:29:06.915735 BRT,,,p495838,th745449920,,,,0
> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
> 2017-04-12 21:29:06.915669, p572922, th139994550884096, INFO
> LibYarnClient::heartbeatFunc, goes into exit phase.",,,,,,,,"SysLoggerMain"
> ,"syslogger.c",518,
> 2017-04-12 21:29:06.916126 BRT,,,p495838,th745449920,,,,0
> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
> 2017-04-12 21:29:06.915883, p572922, th139995204463040, INFO
> LibYarnClient::finishJob, join heart-beat thread
> successfully.",,,,,,,,"SysLoggerMain","syslogger.c",518,
> 2017-04-12 21:29:06.916395 BRT,,,p495838,th745449920,,,,0
> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
> 2017-04-12 21:29:06.916142, p573383, th139995204463040, INFO
> ApplicationClient session auth method : simple",,,,,,,,"SysLoggerMain"
> ,"syslogger.c",518,
> 2017-04-12 21:29:06.916561 BRT,,,p495838,th745449920,,,,0
> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
> 2017-04-12 21:29:06.916489, p572922, th139995204463040, INFO
> LibYarnClient::finishJob, finish AM for jobId:job_1491961152436_24865,
> finalStatus:1",,,,,,,,"SysLoggerMain","syslogger.c",518,
> 2017-04-12 21:29:06.916808 BRT,,,p572922,th745449920,,,,0
> ,con44167,,seg-10000,,,,,"LOG","00000","YARN mode resource broker
> finished job in YARN.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1872,
> 2017-04-12 21:29:06.916834 BRT,,,p572922,th745449920,,,,0
> ,con44167,,seg-10000,,,,,"LOG","00000","YARN mode resource broker get
> result of finish yarn application through libYARN
> 0",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",321,
> 2017-04-12 21:29:06.916855 BRT,,,p572922,th745449920,,,,0
> ,con44167,,seg-10000,,,,,"LOG","00000","YARN mode resource broker get
> result of disconnecting YARN through libYARN 0",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",331,
> 2017-04-12 21:29:06.916868 BRT,,,p572922,th745449920,,,,0
> ,con44167,,seg-10000,,,,,"LOG","00000","YARN mode resource broker goes
> into exit phase.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",163,
> 2017-04-12 21:29:06.923056 BRT,,,p495838,th745449920,,,,0
> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
> 2017-04-12 21:29:06.922993, p573383, th139995204463040, INFO
> ApplicationClient Resource Manager HA is disable.",,,,,,,,"SysLoggerMain","syslogger.c",518,
> 2017-04-12 21:29:06.957206 BRT,,,p495838,th745449920,,,,0
> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
> 2017-04-12 21:29:06.957002, p572539, th139994550884096, INFO
> LibYarnClient::heartbeatFunc, goes into exit phase.",,,,,,,,"SysLoggerMain"
> ,"syslogger.c",518,
> 2017-04-12 21:29:06.957419 BRT,,,p495838,th745449920,,,,0
> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
> 2017-04-12 21:29:06.957284, p572539, th139995204463040, INFO
> LibYarnClient::finishJob, join heart-beat thread
> successfully.",,,,,,,,"SysLoggerMain","syslogger.c",518,
> 2017-04-12 21:29:06.958595 BRT,,,p495838,th745449920,,,,0
> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
> 2017-04-12 21:29:06.958384, p572539, th139995204463040, INFO
> LibYarnClient::finishJob, finish AM for jobId:job_1491961152436_24827,
> finalStatus:1",,,,,,,,"SysLoggerMain","syslogger.c",518,
> 2017-04-12 21:29:06.958768 BRT,,,p572539,th745449920,,,,0
> ,con43939,,seg-10000,,,,,"LOG","00000","YARN mode resource broker
> finished job in YARN.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",1872,
> 2017-04-12 21:29:06.958809 BRT,,,p572539,th745449920,,,,0
> ,con43939,,seg-10000,,,,,"LOG","00000","YARN mode resource broker get
> result of finish yarn application through libYARN
> 0",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",321,
> 2017-04-12 21:29:06.958892 BRT,,,p572539,th745449920,,,,0
> ,con43939,,seg-10000,,,,,"LOG","00000","YARN mode resource broker get
> result of disconnecting YARN through libYARN 0",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",331,
> 2017-04-12 21:29:06.958930 BRT,,,p572539,th745449920,,,,0
> ,con43939,,seg-10000,,,,,"LOG","00000","YARN mode resource broker goes
> into exit phase.",,,,,,,0,,"resourcebroker_LIBYARN_proc.c",163,
> 2017-04-12 21:29:06.959059 BRT,,,p572539,th745449920,,,,0
> ,con43939,,seg-10000,,,,,"LOG","00000","failed to find proc
> 0x7f531b01b600 in ProcArray",,,,,,,0,,"procarray.c",184,
> 2017-04-12 21:29:06.999484 BRT,,,p573381,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"LOG","00000","Cleanup segment configuration
> catalog table successfully!",,,,,,,0,,"resourcepool.c",460,
> 2017-04-12 21:29:07.008804 BRT,,,p573381,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"LOG","00000","Cleanup segment configuration
> history catalog table successfully, keep period: recent 365
> days.",,,,,,,0,,"resourcepool.c",530,
> 2017-04-12 21:29:07.015893 BRT,,,p573381,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"LOG","00000","Add a new row into segment
> configuration catalog table,registration order:0, role:m, status:u,
> port:5432, hostname:big08-hadoop-master.stn.intra, address:
> big08-hadoop-master.stn.intra, description:",,,,,,,0,,"resourcepool.c",879,
> 2017-04-12 21:29:07.022082 BRT,,,p573381,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"LOG","00000","Resource manager successfully
> loaded role specifications.",,,,,,,0,,"resourcemanager.c",1275,
> 2017-04-12 21:29:07.027941 BRT,,,p573381,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"LOG","00000","Resource manger successfully
> loaded resource queue specifications",,,,,,,0,,"resourcemanager.c",1585,
> 2017-04-12 21:29:07.028062 BRT,,,p573381,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"WARNING","01000","memorylimit and corelimit
> must use the same formats to express resource limit",,,,,,,0,,"resqueuemanager.c",708,
> 2017-04-12 21:29:07.028082 BRT,,,p573381,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"WARNING","01000","Resource manager cannot
> create resource queue with its attributes because memorylimit and corelimit
> must use the same formats to express resource limit",,,,,,,0,,"resourcemanager.c",1828,
> 2017-04-12 21:29:07.028147 BRT,,,p573381,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"LOG","00000","Resource manager created resource
> queue instance :
> RESQUEUE:ID=9800,Name=pg_root,PARENT=0,LIMIT(MEM=100.000000%,CORE=100.000000%),RATIO=0
> MBPCORE,INUSE(0 MB, 0.000000 CORE),CONN=0,INQUEUE=0.",,,,,,
> ,0,,"resourcemanager.c",1974,
> 2017-04-12 21:29:07.028171 BRT,,,p573381,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"WARNING","01000","resource queue cannot parse
> role attribute, cannot find target resource queue
> '6055'",,,,,,,0,,"resqueuemanager.c",2668,
> 2017-04-12 21:29:07.028185 BRT,,,p573381,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"WARNING","01000","cannot create user with its
> attributes because cannot find target resource queue
> '6055'",,,,,,,0,,"resourcemanager.c",1995,
> 2017-04-12 21:29:07.028199 BRT,,,p573381,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"LOG","00000","failed to load queue and user
> definition.",,,,,,,0,,"resourcemanager.c",1130,
> 2017-04-12 21:29:07.042672 BRT,,,p495838,th745449920,,,,0
> ,,,seg-10000,,,,,"LOG","00000","3rd party error log:
> 2017-04-12 21:29:07.042514, p573313, th139995204463040, INFO
> ApplicationClient RM Scheduler HA is disable.",,,,,,,,"SysLoggerMain","syslogger.c",518,
> 2017-04-12 21:29:07.043183 BRT,,,p573381,th745449920,,,,0
> ,con44443,,seg-10000,,,,,"FATAL","XX000","failed to load queue and user
> definition. (resourcemanager.c:496)",,,,,,,0,,"resourcemanager.c",496,"Stack
> trace:
> 1 0x8b7038 postgres errstart + 0x288
> 2 0x8b8dbb postgres elog_finish + 0xab
> 3 0x951799 postgres ResManagerMainServer2ndPhase + 0x1d9
> 4 0x951e04 postgres ResManagerMain + 0x534
> 5 0x952151 postgres ResManagerProcessStartup + 0x171
> 6 0x78fecb postgres <symbol not found> + 0x78fecb
> 7 0x792939 postgres PostmasterMain + 0x759
> 8 0x4a15af postgres main + 0x50f
> 9 0x7f53296a1b15 libc.so.6 __libc_start_main + 0xf5
> 10 0x4a162d postgres <symbol not found> + 0x4a162d
>
> These are some of the hawq-site.xml properties that I use:
>
> hawq_rm_memory_limit_perseg=8GB
> hawq_rm_nvcore_limit_perseg=4
> default_hash_table_bucket_number=6
>
> And lastly, the gp_segment_configuration table does not contain the
> segments, only the master node.
>
> Any thoughts are really appreciated.
>
>
> []'s,
> *Sebastião M. P. Gonella*
>
> Mobile: 61-984021512
> Email: sebastiao.gonella@gmail.com
> Skype: segonella
>
>


-- 
Best Regards,
Xiang Sheng
