hawq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From 来熊 <yin....@163.com>
Subject 回复:Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)
Date Wed, 21 Sep 2016 07:25:27 GMT


My environment is >>>> : 1 master 3 segments
SQL >>>>>>: 
CREATE TABLE call_center (
    cc_call_center_sk integer,
    cc_call_center_id character varying(16),
    cc_rec_start_date date,
    cc_rec_end_date date,
    cc_closed_date_sk integer,
    cc_open_date_sk integer,
    cc_name character varying(50),
    cc_class character varying(50),
    cc_employees integer,
    cc_sq_ft integer,
    cc_hours character varying(20),
    cc_manager character varying(40),
    cc_mkt_id integer,
    cc_mkt_class character varying(50),
    cc_mkt_desc character varying(100),
    cc_market_manager character varying(40),
    cc_division text,
    cc_division_name character varying(50),
    cc_company text,
    cc_company_name character varying(50),
    cc_street_number character varying(10),
    cc_street_name character varying(60),
    cc_street_type character varying(15),
    cc_suite_number character varying(10),
    cc_city character varying(60),
    cc_county character varying(30),
    cc_state text,
    cc_zip character varying(10),
    cc_country character varying(20),
    cc_gmt_offset numeric(5,2),
    cc_tax_percentage numeric(5,2)
)
WITH (appendonly=true, orientation=parquet)
DISTRIBUTED BY (cc_call_center_sk);


CREATE EXTERNAL TABLE ext_call_center (like call_center)
LOCATION (
'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');


insert into call_center select * from ext_call_center;


ERROR:  External scan error: There are more external files (URLs) than primary segments that
can read them (COptTasks.cpp:1756)


hawq config >>>>:


GUC: add_missing_fromValue: off
GUC: application_nameValue: 
GUC: array_nullsValue: on
GUC: authentication_timeoutValue: 60
GUC: backslash_quoteValue: safe_encoding
GUC: block_sizeValue: 32768
GUC: bonjour_nameValue: 
GUC: check_function_bodiesValue: on
GUC: client_encodingValue: UTF8
GUC: client_min_messagesValue: ERROR
GUC: cpu_index_tuple_costValue: 0.005
GUC: cpu_operator_costValue: 0.0025
GUC: cpu_tuple_costValue: 0.01
GUC: cursor_tuple_fractionValue: 1
GUC: custom_variable_classesValue: 
GUC: DateStyleValue: ISO, MDY
GUC: db_user_namespaceValue: off
GUC: deadlock_timeoutValue: 1000
GUC: debug_assertionsValue: off
GUC: debug_pretty_printValue: off
GUC: debug_print_parseValue: off
GUC: debug_print_planValue: off
GUC: debug_print_prelim_planValue: off
GUC: debug_print_rewrittenValue: off
GUC: debug_print_slice_tableValue: off
GUC: default_hash_table_bucket_numberValue: 6
GUC: default_statement_memValue: 128000
GUC: default_statistics_targetValue: 25
GUC: default_tablespaceValue: 
GUC: default_transaction_isolationValue: read committed
GUC: default_transaction_read_onlyValue: off
GUC: dfs_urlValue: localhost:8020/hawq
GUC: dynamic_library_pathValue: $libdir
GUC: effective_cache_sizeValue: 16384
GUC: enable_bitmapscanValue: on
GUC: enable_groupaggValue: on
GUC: enable_hashaggValue: on
GUC: enable_hashjoinValue: on
GUC: enable_indexscanValue: on
GUC: enable_mergejoinValue: off
GUC: enable_nestloopValue: off
GUC: enable_secure_filesystemValue: off
GUC: enable_seqscanValue: on
GUC: enable_sortValue: on
GUC: enable_tidscanValue: on
GUC: escape_string_warningValue: on
GUC: explain_memory_verbosityValue: suppress
GUC: explain_pretty_printValue: on
GUC: extra_float_digitsValue: 0
GUC: filesystem_support_truncateValue: on
GUC: from_collapse_limitValue: 20
GUC: gp_adjust_selectivity_for_outerjoinsValue: on
GUC: gp_analyze_relative_errorValue: 0.25
GUC: gp_autostats_modeValue: NONE
GUC: gp_autostats_on_change_thresholdValue: 2147483647
GUC: gp_backup_directIOValue: off
GUC: gp_backup_directIO_read_chunk_mbValue: 20
GUC: gp_cached_segworkers_thresholdValue: 5
GUC: gp_command_countValue: 6
GUC: gp_connections_per_threadValue: 64
GUC: gp_debug_lingerValue: 0
GUC: gp_dynamic_partition_pruningValue: on
GUC: gp_email_connect_avoid_durationValue: 7200
GUC: gp_email_connect_failuresValue: 5
GUC: gp_email_connect_timeoutValue: 15
GUC: gp_email_fromValue: 
GUC: gp_email_smtp_passwordValue: 
GUC: gp_email_smtp_serverValue: localhost:25
GUC: gp_email_smtp_useridValue: 
GUC: gp_email_toValue: 
GUC: gp_enable_agg_distinctValue: on
GUC: gp_enable_agg_distinct_pruningValue: on
GUC: gp_enable_direct_dispatchValue: on
GUC: gp_enable_fallback_planValue: on
GUC: gp_enable_fast_sriValue: on
GUC: gp_enable_gpperfmonValue: off
GUC: gp_enable_groupext_distinct_gatherValue: on
GUC: gp_enable_groupext_distinct_pruningValue: on
GUC: gp_enable_multiphase_aggValue: on
GUC: gp_enable_predicate_propagationValue: on
GUC: gp_enable_preuniqueValue: on
GUC: gp_enable_sequential_window_plansValue: on
GUC: gp_enable_sort_distinctValue: on
GUC: gp_enable_sort_limitValue: on
GUC: gp_external_enable_execValue: on
GUC: gp_external_grant_privilegesValue: off
GUC: gp_external_max_segsValue: 64
GUC: gp_filerep_tcp_keepalives_countValue: 2
GUC: gp_filerep_tcp_keepalives_idleValue: 60
GUC: gp_filerep_tcp_keepalives_intervalValue: 30
GUC: gp_force_use_default_temporary_directoryValue: off
GUC: gp_gpperfmon_send_intervalValue: 1
GUC: gp_hashjoin_tuples_per_bucketValue: 5
GUC: gp_idf_deduplicateValue: auto
GUC: gp_interconnect_cache_future_packetsValue: on
GUC: gp_interconnect_default_rttValue: 20
GUC: gp_interconnect_fc_methodValue: LOSS
GUC: gp_interconnect_hash_multiplierValue: 2
GUC: gp_interconnect_min_retries_before_timeoutValue: 100
GUC: gp_interconnect_min_rtoValue: 20
GUC: gp_interconnect_queue_depthValue: 4
GUC: gp_interconnect_setup_timeoutValue: 7200
GUC: gp_interconnect_snd_queue_depthValue: 2
GUC: gp_interconnect_timer_checking_periodValue: 20
GUC: gp_interconnect_timer_periodValue: 5
GUC: gp_interconnect_transmit_timeoutValue: 3600
GUC: gp_interconnect_typeValue: UDP
GUC: gp_log_formatValue: csv
GUC: gp_max_csv_line_lengthValue: 1048576
GUC: gp_max_databasesValue: 16
GUC: gp_max_filespacesValue: 8
GUC: gp_max_packet_sizeValue: 8192
GUC: gp_max_partition_levelValue: 0
GUC: gp_max_plan_sizeValue: 0
GUC: gp_max_relationsValue: 65536
GUC: gp_max_tablespacesValue: 16
GUC: gp_motion_cost_per_rowValue: 0
GUC: gp_num_contents_in_clusterValue: -10000
GUC: gp_query_context_mem_limitValue: 102400
GUC: gp_reject_percent_thresholdValue: 300
GUC: gp_reraise_signalValue: on
GUC: gp_roleValue: utility
GUC: gp_safefswritesizeValue: 0
GUC: gp_segment_connect_timeoutValue: 600
GUC: gp_segments_for_plannerValue: 0
GUC: gp_session_idValue: -1
GUC: gp_set_proc_affinityValue: off
GUC: gp_snmp_communityValue: public
GUC: gp_snmp_monitor_addressValue: 
GUC: gp_snmp_use_inform_or_trapValue: trap
GUC: gp_statistics_pullup_from_child_partitionValue: on
GUC: gp_statistics_use_fkeysValue: on
GUC: gp_subtrans_warn_limitValue: 16777216
GUC: gp_temporary_directory_mark_errorValue: 0
GUC: gp_udp_bufsize_kValue: 0
GUC: gp_vmem_protect_segworker_cache_limitValue: 500
GUC: gp_workfile_checksummingValue: on
GUC: gp_workfile_compress_algorithmValue: none
GUC: gp_workfile_limit_per_queryValue: 0
GUC: gp_workfile_limit_per_segmentValue: 0
GUC: gpperfmon_portValue: 8888
GUC: hawq_global_rm_typeValue: yarn
GUC: hawq_master_address_hostValue: master.bigdata
GUC: hawq_master_address_portValue: 6432
GUC: hawq_master_temp_directoryValue: /tmp
GUC: hawq_metadata_cache_block_capacityValue: 2097152
GUC: hawq_metadata_cache_check_intervalValue: 30
GUC: hawq_metadata_cache_flush_ratioValue: 0.85
GUC: hawq_metadata_cache_free_block_max_ratioValue: 0.05
GUC: hawq_metadata_cache_free_block_normal_ratioValue: 0.2
GUC: hawq_metadata_cache_max_hdfs_file_numValue: 524288
GUC: hawq_metadata_cache_reduce_ratioValue: 0.7
GUC: hawq_metadata_cache_refresh_intervalValue: 3600
GUC: hawq_metadata_cache_refresh_max_numValue: 1000
GUC: hawq_metadata_cache_refresh_timeoutValue: 3600
GUC: hawq_re_cgroup_hierarchy_nameValue: hadoop-yarn
GUC: hawq_re_cgroup_mount_pointValue: /sys/fs/cgroup
GUC: hawq_re_cpu_enableValue: off
GUC: hawq_re_cpu_weightValue: 1024
GUC: hawq_re_memory_overcommit_maxValue: 8192
GUC: hawq_re_vcore_pcore_ratioValue: 1
GUC: hawq_rm_cluster_report_periodValue: 60
GUC: hawq_rm_clusterratio_core_to_memorygb_factorValue: 5
GUC: hawq_rm_connpool_sameaddr_buffersizeValue: 2
GUC: hawq_rm_container_batch_limitValue: 1000
GUC: hawq_rm_enable_connpoolValue: on
GUC: hawq_rm_force_alterqueue_cancel_queued_requestValue: on
GUC: hawq_rm_force_fifo_queuingValue: on
GUC: hawq_rm_master_domain_portValue: 5436
GUC: hawq_rm_master_portValue: 5437
GUC: hawq_rm_memory_limit_persegValue: 480GB
GUC: hawq_rm_min_resource_persegValue: 2
GUC: hawq_rm_nocluster_timeoutValue: 60
GUC: hawq_rm_nresqueue_limitValue: 128
GUC: hawq_rm_nslice_perseg_limitValue: 5000
GUC: hawq_rm_nvcore_limit_persegValue: 16
GUC: hawq_rm_nvseg_for_analyze_nopart_perquery_limitValue: 512
GUC: hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limitValue: 8
GUC: hawq_rm_nvseg_for_analyze_part_perquery_limitValue: 256
GUC: hawq_rm_nvseg_for_analyze_part_perquery_perseg_limitValue: 4
GUC: hawq_rm_nvseg_for_copy_from_perqueryValue: 6
GUC: hawq_rm_nvseg_perquery_limitValue: 512
GUC: hawq_rm_nvseg_perquery_perseg_limitValue: 6
GUC: hawq_rm_nvseg_variance_amon_seg_limitValue: 1
GUC: hawq_rm_nvseg_variance_amon_seg_respool_limitValue: 2
GUC: hawq_rm_regularize_io_factorValue: 1
GUC: hawq_rm_regularize_io_maxValue: 1.37439e+11
GUC: hawq_rm_regularize_nvseg_factorValue: 1
GUC: hawq_rm_regularize_nvseg_maxValue: 300
GUC: hawq_rm_regularize_usage_factorValue: 1
GUC: hawq_rm_rejectrequest_nseg_limitValue: 0.25
GUC: hawq_rm_request_timeoutcheck_intervalValue: 1
GUC: hawq_rm_resource_allocation_timeoutValue: 600
GUC: hawq_rm_resource_idle_timeoutValue: 300
GUC: hawq_rm_respool_test_fileValue: 
GUC: hawq_rm_return_percent_on_overcommitValue: 10
GUC: hawq_rm_segment_config_refresh_intervalValue: 30
GUC: hawq_rm_segment_heartbeat_intervalValue: 30
GUC: hawq_rm_segment_heartbeat_timeoutValue: 300
GUC: hawq_rm_segment_portValue: 5438
GUC: hawq_rm_segment_tmpdir_detect_intervalValue: 300
GUC: hawq_rm_session_lease_heartbeat_enableValue: on
GUC: hawq_rm_session_lease_heartbeat_intervalValue: 10
GUC: hawq_rm_session_lease_timeoutValue: 180
GUC: hawq_rm_stmt_nvsegValue: 0
GUC: hawq_rm_stmt_vseg_memoryValue: 128mb
GUC: hawq_rm_tolerate_nseg_limitValue: 0.25
GUC: hawq_rm_yarn_addressValue: worker1.bigdata:8050
GUC: hawq_rm_yarn_app_nameValue: hawq
GUC: hawq_rm_yarn_queue_nameValue: default
GUC: hawq_rm_yarn_scheduler_addressValue: worker1.bigdata:8030
GUC: hawq_segment_address_portValue: 40000
GUC: hawq_segment_history_keep_periodValue: 365
GUC: hawq_segment_temp_directoryValue: /tmp
GUC: integer_datetimesValue: on
GUC: IntervalStyleValue: postgres
GUC: join_collapse_limitValue: 20
GUC: krb5_ccnameValue: /tmp/postgres.ccname
GUC: krb_caseins_usersValue: off
GUC: krb_server_keyfileValue: FILE:/data/pulse2-agent/agents/agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-dist/etc/krb5.keytab
GUC: krb_srvnameValue: postgres
GUC: lc_collateValue: C
GUC: lc_ctypeValue: C
GUC: lc_messagesValue: en_US.utf8
GUC: lc_monetaryValue: en_US.utf8
GUC: lc_numericValue: en_US.utf8
GUC: lc_timeValue: en_US.utf8
GUC: listen_addressesValue: *
GUC: local_preload_librariesValue: 
GUC: log_autostatsValue: off
GUC: log_connectionsValue: off
GUC: log_disconnectionsValue: off
GUC: log_dispatch_statsValue: off
GUC: log_durationValue: off
GUC: log_error_verbosityValue: default
GUC: log_executor_statsValue: off
GUC: log_hostnameValue: off
GUC: log_min_duration_statementValue: -1
GUC: log_min_error_statementValue: error
GUC: log_min_messagesValue: warning
GUC: log_parser_statsValue: off
GUC: log_planner_statsValue: off
GUC: log_rotation_ageValue: 1440
GUC: log_rotation_sizeValue: 0
GUC: log_statementValue: none
GUC: log_statement_statsValue: off
GUC: log_timezoneValue: PRC
GUC: log_truncate_on_rotationValue: off
GUC: maintenance_work_memValue: 65536
GUC: master_directoryValue: 
GUC: max_appendonly_segfilesValue: 262144
GUC: max_appendonly_tablesValue: 10000
GUC: max_connectionsValue: 1280
GUC: max_files_per_processValue: 150
GUC: max_fsm_pagesValue: 200000
GUC: max_fsm_relationsValue: 1000
GUC: max_function_argsValue: 100
GUC: max_identifier_lengthValue: 63
GUC: max_index_keysValue: 32
GUC: max_locks_per_transactionValue: 128
GUC: max_prepared_transactionsValue: 250
GUC: max_stack_depthValue: 2048
GUC: max_work_memValue: 1024000
GUC: metadata_cache_testfileValue: 
GUC: optimizerValue: on
GUC: optimizer_analyze_root_partitionValue: on
GUC: optimizer_minidumpValue: onerror
GUC: optimizer_parts_to_force_sort_on_insertValue: 160
GUC: password_encryptionValue: on
GUC: password_hash_algorithmValue: MD5
GUC: pljava_classpathValue: 
GUC: pljava_release_lingering_savepointsValue: off
GUC: pljava_statement_cache_sizeValue: 0
GUC: pljava_vmoptionsValue: 
GUC: portValue: 6432
GUC: pxf_enable_filter_pushdownValue: on
GUC: pxf_enable_locality_optimizationsValue: on
GUC: pxf_enable_stat_collectionValue: on
GUC: pxf_remote_service_loginValue: 
GUC: pxf_remote_service_secretValue: 
GUC: pxf_service_addressValue: localhost:51200
GUC: pxf_stat_max_fragmentsValue: 100
GUC: random_page_costValue: 100
GUC: regex_flavorValue: advanced
GUC: runaway_detector_activation_percentValue: 95
GUC: search_pathValue: "$user",public
GUC: seg_max_connectionsValue: 3000
GUC: segment_directoryValue: 
GUC: seq_page_costValue: 1
GUC: server_encodingValue: UTF8
GUC: server_ticket_renew_intervalValue: 43200000
GUC: server_versionValue: 8.2.15
GUC: server_version_numValue: 80215
GUC: shared_buffersValue: 4000
GUC: shared_preload_librariesValue: 
GUC: sslValue: off
GUC: ssl_ciphersValue: ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
GUC: standard_conforming_stringsValue: off
GUC: standby_address_hostValue: localhost
GUC: statement_timeoutValue: 0
GUC: superuser_reserved_connectionsValue: 3
GUC: tcp_keepalives_countValue: 9
GUC: tcp_keepalives_idleValue: 7200
GUC: tcp_keepalives_intervalValue: 75
GUC: temp_buffersValue: 1024
GUC: TimeZoneValue: PRC
GUC: timezone_abbreviationsValue: Default
GUC: track_activitiesValue: on
GUC: track_countsValue: off
GUC: transaction_isolationValue: read committed
GUC: transaction_read_onlyValue: off
GUC: transform_null_equalsValue: off
GUC: unix_socket_directoryValue: 
GUC: unix_socket_groupValue: 
GUC: unix_socket_permissionsValue: 511
GUC: update_process_titleValue: on
GUC: vacuum_cost_delayValue: 0
GUC: vacuum_cost_limitValue: 200
GUC: vacuum_cost_page_dirtyValue: 20
GUC: vacuum_cost_page_missValue: 10
GUC: vacuum_freeze_min_ageValue: 100000000
GUC: work_memValue: 51200






At 2016-09-21 13:00:41, "Vineet Goel" <vvineet@apache.org> wrote:

Could you please post your SQL DDL statement? How many URLs do you have in your external table?
Also, your HASH dist table - how many buckets are defined, if any? Are the # of URLs more
than the # of buckets or default_hash_table_bucket_number value? Perhaps you can attach your
hawq-site.xml file as well.


Also see: 
http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html



Thanks
Vineet



On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yin.zhb@163.com> wrote:

Hi,all:
    I am testing hawq 2.0.0 , and I find a problem like this:
 I load data from an external table (created using "like target_table" statement) ,
if the target table was distributed by some column(s), it raise this error:
 External scan error: There are more external files (URLs) than primary segments that can
read them (COptTasks.cpp:1756)
if the target table was distributed randomly, it works well,
I don't set any parameter special,does anybody know how to resolve this problem?
thanks a lot.
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message