Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id B6CD9200B9E for ; Sat, 24 Sep 2016 07:18:43 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id B54F5160AD6; Sat, 24 Sep 2016 05:18:43 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 3EB48160AD0 for ; Sat, 24 Sep 2016 07:18:41 +0200 (CEST) Received: (qmail 34326 invoked by uid 500); 24 Sep 2016 05:18:40 -0000 Mailing-List: contact user-help@hawq.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hawq.incubator.apache.org Delivered-To: mailing list user@hawq.incubator.apache.org Received: (qmail 34307 invoked by uid 99); 24 Sep 2016 05:18:39 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 24 Sep 2016 05:18:39 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 5E7A41805B0; Sat, 24 Sep 2016 05:18:39 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.879 X-Spam-Level: * X-Spam-Status: No, score=1.879 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx2-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id QY17tBFptPxK; Sat, 24 Sep 2016 05:18:31 +0000 (UTC) Received: from mail-it0-f41.google.com (mail-it0-f41.google.com [209.85.214.41]) by mx2-lw-eu.apache.org (ASF Mail Server at mx2-lw-eu.apache.org) with ESMTPS id E62835F4EB; Sat, 24 Sep 2016 05:18:29 +0000 (UTC) Received: by mail-it0-f41.google.com with SMTP id 15so6398374ita.1; Fri, 23 Sep 2016 22:18:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=MRuz49omuM5ayCyHgC6ooYLlf+Im1+w63HHeqOR2yMY=; b=R0qKrVg47oZAlxl3VJd65JQOmnMDSqq4oM16UbU/59pEPvbprw/L6IBc98K5dDRyRC mZKOfphK7lhFanBH1oW/bmsSlVG+z3yyL4qXBYMTJC1B7aIn4dOV1HGGoS77HRpba0kj WBFbOro0Fzm40PjZe8/Aro40dSpESiAy8goiY15kIcdcEU788e2M5/pEqzJKQnF9swgE m1W6s9YlO1q9YXhVmVVtFJ9iU0xA6MBAoUtV487JPeyYVNej3PyTXnKmPgtDTsiTODOY X6gpCdyVzR+v/pNZP5lXVOcOr4PCE9fWmM2hIDr8DKwy68mJ4Bu23f7f0x0PKzNCQhH1 Bxcg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=MRuz49omuM5ayCyHgC6ooYLlf+Im1+w63HHeqOR2yMY=; b=dcwy0iLOtEBWDEre0YBjyIVkP12q/Jg9UerMWZJMtRyFXApEoydQ2kOPAW1Smt5j2C GPAU3rGsgHflPB3EyRk4RPf/6RPptHD1P6/FzM2QsrNHgpA/yGqbGdQqmkgsOhydH3jt NuSMoS8hEKYYkBopVf78hNokwGF18TUxkTWTXRAw+ppPUSUsU8f+Ak9p4wrofmp0v1aj W/tA37+DTEi1O17a5RJI6Gdx65GRXLvRGiXxop8KmLoNyL8XZEIk9tRKFT1aTdRVHsPF Ew8BvCFwlCSM+BX3UgNLQAWoNBpkZMMPUOaayg23xdjdw0Aiky8gmF3iuvIp0/xlXfnP d9pA== X-Gm-Message-State: AA6/9Rmo3VaBLSkcZPv0Af8eLKW/XMAPofhetpz62zZGBAtkNNPTDSsiV6e9rdzCqA70yJiMt/Tviy/85l/ULQ== X-Received: by 10.36.3.201 with SMTP id e192mr7185218ite.34.1474694308625; Fri, 23 Sep 2016 22:18:28 -0700 (PDT) MIME-Version: 1.0 Received: by 10.36.217.3 with HTTP; Fri, 23 Sep 2016 22:18:22 -0700 (PDT) In-Reply-To: References: <12dbff39.6c5b.1574a7ef17a.Coremail.yin.zhb@163.com> <5242a19e.1174a.1574ba290dd.Coremail.yin.zhb@163.com> From: Lirong Jian Date: Sat, 24 Sep 2016 13:18:22 +0800 Message-ID: Subject: Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756) To: user@hawq.incubator.apache.org Cc: "yin.zhb@163.com" , dev Content-Type: multipart/alternative; boundary=001a1144d6f66b2d28053d3a06a8 archived-at: Sat, 24 Sep 2016 05:18:43 -0000 --001a1144d6f66b2d28053d3a06a8 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable As it is known, the number of segments should be equal to or greater than the number of URLs in gpfdist external tables. This is a legacy issue inherited from GPDB. We encountered this issue too when working on a cloud data warehouse based on GPDB. We finally needed to find a workaround to address this issue. This restriction maybe be fine with GPDB, since the number of segments is fixed once the system is initialized. However, IMO, this restriction looks an overkill with HAWQ, since one of the significant features of HAWQ is its elastic execution runtime. The number of segments (virtual segments) to run a query is totally dynamic, while the definition of an external table is usually static. I spent some time checking the code related to this restriction, and found that logic of assigning URLs to segments requires one segment taking at most one URL. In HAWQ, Can we eliminate this restriction by relaxing the URL-to-Segment assignment logic? Looking forwards to your guys' comment and advice. Lirong Lirong Jian HashData Inc. 2016-09-23 8:01 GMT+08:00 Hubert Zhang : > +1 with Vineet's comment. > in your statement "insert into call_center select * from ext_call_center;= " > There are two strict restriction of how many virtual segments need to be > started: > 1 the bucket number of hash distributed result relation call_center (must > equal to #vseg) > 2 the number of locations in gpfdist ext table ext_call_center (<=3D #vs= eg) > > You must set the the bucket number of call_center bigger than the the > number of locations in ext_call_center > > Thanks > Hubert > > > On Thu, Sep 22, 2016 at 1:19 AM, Luis Macedo wrote: > >> Also on your location clause you should not reference the same file more >> than one time. >> >> If you want to scale gpfdist process you need to use different range on >> each port for a same server. (Not sure if I explain myself :)) >> >> If you use one gpfdist per server performance should be fine. One gpfdis= t >> process usually can do 250MB/s if underlying infra allows. >> >> Rgds >> >> --- Sent from my Nexus 5x >> >> Em 21 de set de 2016 12:14 PM, "Vineet Goel" >> escreveu: >> >>> Your default_hash_table_bucket_number value is set to 6. Typically, >>> this should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments= , >>> you should set this value to 18. Any time you change this parameter, yo= u >>> should redistribute your HASH distributed tables, if you have any (unle= ss >>> the table DDL has # of buckets defined, I think). >>> >>> Increase default_hash_table_bucket_number to 18 and retry the insert. >>> Since you have 18 ext table URLs, it should work with the change. >>> >>> Thanks >>> Vineet >>> >>> >>> On Wed, Sep 21, 2016 at 12:26 AM =E6=9D=A5=E7=86=8A w= rote: >>> >>>> >>>> My environment is >>>> : 1 master 3 segments >>>> SQL >>>>>>: >>>> CREATE TABLE call_center ( >>>> cc_call_center_sk integer, >>>> cc_call_center_id character varying(16), >>>> cc_rec_start_date date, >>>> cc_rec_end_date date, >>>> cc_closed_date_sk integer, >>>> cc_open_date_sk integer, >>>> cc_name character varying(50), >>>> cc_class character varying(50), >>>> cc_employees integer, >>>> cc_sq_ft integer, >>>> cc_hours character varying(20), >>>> cc_manager character varying(40), >>>> cc_mkt_id integer, >>>> cc_mkt_class character varying(50), >>>> cc_mkt_desc character varying(100), >>>> cc_market_manager character varying(40), >>>> cc_division text, >>>> cc_division_name character varying(50), >>>> cc_company text, >>>> cc_company_name character varying(50), >>>> cc_street_number character varying(10), >>>> cc_street_name character varying(60), >>>> cc_street_type character varying(15), >>>> cc_suite_number character varying(10), >>>> cc_city character varying(60), >>>> cc_county character varying(30), >>>> cc_state text, >>>> cc_zip character varying(10), >>>> cc_country character varying(20), >>>> cc_gmt_offset numeric(5,2), >>>> cc_tax_percentage numeric(5,2) >>>> ) >>>> WITH (appendonly=3Dtrue, orientation=3Dparquet) >>>> DISTRIBUTED BY (cc_call_center_sk); >>>> >>>> CREATE EXTERNAL TABLE ext_call_center (like call_center) >>>> LOCATION ( >>>> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat', >>>> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat') >>>> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\'); >>>> >>>> insert into call_center select * from ext_call_center; >>>> >>>> ERROR: External scan error: There are more external files (URLs) than >>>> primary segments that can read them (COptTasks.cpp:1756) >>>> >>>> hawq config >>>>: >>>> >>>> GUC : add_missing_from Value : off >>>> GUC : application_name Value : >>>> GUC : array_nulls Value : on >>>> GUC : authentication_timeout Value : 60 >>>> GUC : backslash_quote Value : safe_encoding >>>> GUC : block_size Value : 32768 >>>> GUC : bonjour_name Value : >>>> GUC : check_function_bodies Value : on >>>> GUC : client_encoding Value : UTF8 >>>> GUC : client_min_messages Value : ERROR >>>> GUC : cpu_index_tuple_cost Value : 0.005 >>>> GUC : cpu_operator_cost Value : 0.0025 >>>> GUC : cpu_tuple_cost Value : 0.01 >>>> GUC : cursor_tuple_fraction Value : 1 >>>> GUC : custom_variable_classes Value : >>>> GUC : DateStyle Value : ISO, MDY >>>> GUC : db_user_namespace Value : off >>>> GUC : deadlock_timeout Value : 1000 >>>> GUC : debug_assertions Value : off >>>> GUC : debug_pretty_print Value : off >>>> GUC : debug_print_parse Value : off >>>> GUC : debug_print_plan Value : off >>>> GUC : debug_print_prelim_plan Value : off >>>> GUC : debug_print_rewritten Value : off >>>> GUC : debug_print_slice_table Value : off >>>> GUC : default_hash_table_bucket_number Value : 6 >>>> GUC : default_statement_mem Value : 128000 >>>> GUC : default_statistics_target Value : 25 >>>> GUC : default_tablespace Value : >>>> GUC : default_transaction_isolation Value : read committed >>>> GUC : default_transaction_read_only Value : off >>>> GUC : dfs_url Value : localhost:8020/hawq >>>> GUC : dynamic_library_path Value : $libdir >>>> GUC : effective_cache_size Value : 16384 >>>> GUC : enable_bitmapscan Value : on >>>> GUC : enable_groupagg Value : on >>>> GUC : enable_hashagg Value : on >>>> GUC : enable_hashjoin Value : on >>>> GUC : enable_indexscan Value : on >>>> GUC : enable_mergejoin Value : off >>>> GUC : enable_nestloop Value : off >>>> GUC : enable_secure_filesystem Value : off >>>> GUC : enable_seqscan Value : on >>>> GUC : enable_sort Value : on >>>> GUC : enable_tidscan Value : on >>>> GUC : escape_string_warning Value : on >>>> GUC : explain_memory_verbosity Value : suppress >>>> GUC : explain_pretty_print Value : on >>>> GUC : extra_float_digits Value : 0 >>>> GUC : filesystem_support_truncate Value : on >>>> GUC : from_collapse_limit Value : 20 >>>> GUC : gp_adjust_selectivity_for_outerjoins Value : on >>>> GUC : gp_analyze_relative_error Value : 0.25 >>>> GUC : gp_autostats_mode Value : NONE >>>> GUC : gp_autostats_on_change_threshold Value : 2147483647 >>>> GUC : gp_backup_directIO Value : off >>>> GUC : gp_backup_directIO_read_chunk_mb Value : 20 >>>> GUC : gp_cached_segworkers_threshold Value : 5 >>>> GUC : gp_command_count Value : 6 >>>> GUC : gp_connections_per_thread Value : 64 >>>> GUC : gp_debug_linger Value : 0 >>>> GUC : gp_dynamic_partition_pruning Value : on >>>> GUC : gp_email_connect_avoid_duration Value : 7200 >>>> GUC : gp_email_connect_failures Value : 5 >>>> GUC : gp_email_connect_timeout Value : 15 >>>> GUC : gp_email_from Value : >>>> GUC : gp_email_smtp_password Value : >>>> GUC : gp_email_smtp_server Value : localhost:25 >>>> GUC : gp_email_smtp_userid Value : >>>> GUC : gp_email_to Value : >>>> GUC : gp_enable_agg_distinct Value : on >>>> GUC : gp_enable_agg_distinct_pruning Value : on >>>> GUC : gp_enable_direct_dispatch Value : on >>>> GUC : gp_enable_fallback_plan Value : on >>>> GUC : gp_enable_fast_sri Value : on >>>> GUC : gp_enable_gpperfmon Value : off >>>> GUC : gp_enable_groupext_distinct_gather Value : on >>>> GUC : gp_enable_groupext_distinct_pruning Value : on >>>> GUC : gp_enable_multiphase_agg Value : on >>>> GUC : gp_enable_predicate_propagation Value : on >>>> GUC : gp_enable_preunique Value : on >>>> GUC : gp_enable_sequential_window_plans Value : on >>>> GUC : gp_enable_sort_distinct Value : on >>>> GUC : gp_enable_sort_limit Value : on >>>> GUC : gp_external_enable_exec Value : on >>>> GUC : gp_external_grant_privileges Value : off >>>> GUC : gp_external_max_segs Value : 64 >>>> GUC : gp_filerep_tcp_keepalives_count Value : 2 >>>> GUC : gp_filerep_tcp_keepalives_idle Value : 60 >>>> GUC : gp_filerep_tcp_keepalives_interval Value : 30 >>>> GUC : gp_force_use_default_temporary_directory Value : off >>>> GUC : gp_gpperfmon_send_interval Value : 1 >>>> GUC : gp_hashjoin_tuples_per_bucket Value : 5 >>>> GUC : gp_idf_deduplicate Value : auto >>>> GUC : gp_interconnect_cache_future_packets Value : on >>>> GUC : gp_interconnect_default_rtt Value : 20 >>>> GUC : gp_interconnect_fc_method Value : LOSS >>>> GUC : gp_interconnect_hash_multiplier Value : 2 >>>> GUC : gp_interconnect_min_retries_before_timeout Value : 100 >>>> GUC : gp_interconnect_min_rto Value : 20 >>>> GUC : gp_interconnect_queue_depth Value : 4 >>>> GUC : gp_interconnect_setup_timeout Value : 7200 >>>> GUC : gp_interconnect_snd_queue_depth Value : 2 >>>> GUC : gp_interconnect_timer_checking_period Value : 20 >>>> GUC : gp_interconnect_timer_period Value : 5 >>>> GUC : gp_interconnect_transmit_timeout Value : 3600 >>>> GUC : gp_interconnect_type Value : UDP >>>> GUC : gp_log_format Value : csv >>>> GUC : gp_max_csv_line_length Value : 1048576 >>>> GUC : gp_max_databases Value : 16 >>>> GUC : gp_max_filespaces Value : 8 >>>> GUC : gp_max_packet_size Value : 8192 >>>> GUC : gp_max_partition_level Value : 0 >>>> GUC : gp_max_plan_size Value : 0 >>>> GUC : gp_max_relations Value : 65536 >>>> GUC : gp_max_tablespaces Value : 16 >>>> GUC : gp_motion_cost_per_row Value : 0 >>>> GUC : gp_num_contents_in_cluster Value : -10000 >>>> GUC : gp_query_context_mem_limit Value : 102400 >>>> GUC : gp_reject_percent_threshold Value : 300 >>>> GUC : gp_reraise_signal Value : on >>>> GUC : gp_role Value : utility >>>> GUC : gp_safefswritesize Value : 0 >>>> GUC : gp_segment_connect_timeout Value : 600 >>>> GUC : gp_segments_for_planner Value : 0 >>>> GUC : gp_session_id Value : -1 >>>> GUC : gp_set_proc_affinity Value : off >>>> GUC : gp_snmp_community Value : public >>>> GUC : gp_snmp_monitor_address Value : >>>> GUC : gp_snmp_use_inform_or_trap Value : trap >>>> GUC : gp_statistics_pullup_from_child_partition Value : on >>>> GUC : gp_statistics_use_fkeys Value : on >>>> GUC : gp_subtrans_warn_limit Value : 16777216 >>>> GUC : gp_temporary_directory_mark_error Value : 0 >>>> GUC : gp_udp_bufsize_k Value : 0 >>>> GUC : gp_vmem_protect_segworker_cache_limit Value : 500 >>>> GUC : gp_workfile_checksumming Value : on >>>> GUC : gp_workfile_compress_algorithm Value : none >>>> GUC : gp_workfile_limit_per_query Value : 0 >>>> GUC : gp_workfile_limit_per_segment Value : 0 >>>> GUC : gpperfmon_port Value : 8888 >>>> GUC : hawq_global_rm_type Value : yarn >>>> GUC : hawq_master_address_host Value : master.bigdata >>>> GUC : hawq_master_address_port Value : 6432 >>>> GUC : hawq_master_temp_directory Value : /tmp >>>> GUC : hawq_metadata_cache_block_capacity Value : 2097152 >>>> GUC : hawq_metadata_cache_check_interval Value : 30 >>>> GUC : hawq_metadata_cache_flush_ratio Value : 0.85 >>>> GUC : hawq_metadata_cache_free_block_max_ratio Value : 0.05 >>>> GUC : hawq_metadata_cache_free_block_normal_ratio Value : 0.2 >>>> GUC : hawq_metadata_cache_max_hdfs_file_num Value : 524288 >>>> GUC : hawq_metadata_cache_reduce_ratio Value : 0.7 >>>> GUC : hawq_metadata_cache_refresh_interval Value : 3600 >>>> GUC : hawq_metadata_cache_refresh_max_num Value : 1000 >>>> GUC : hawq_metadata_cache_refresh_timeout Value : 3600 >>>> GUC : hawq_re_cgroup_hierarchy_name Value : hadoop-yarn >>>> GUC : hawq_re_cgroup_mount_point Value : /sys/fs/cgroup >>>> GUC : hawq_re_cpu_enable Value : off >>>> GUC : hawq_re_cpu_weight Value : 1024 >>>> GUC : hawq_re_memory_overcommit_max Value : 8192 >>>> GUC : hawq_re_vcore_pcore_ratio Value : 1 >>>> GUC : hawq_rm_cluster_report_period Value : 60 >>>> GUC : hawq_rm_clusterratio_core_to_memorygb_factor Value : 5 >>>> GUC : hawq_rm_connpool_sameaddr_buffersize Value : 2 >>>> GUC : hawq_rm_container_batch_limit Value : 1000 >>>> GUC : hawq_rm_enable_connpool Value : on >>>> GUC : hawq_rm_force_alterqueue_cancel_queued_request Value : on >>>> GUC : hawq_rm_force_fifo_queuing Value : on >>>> GUC : hawq_rm_master_domain_port Value : 5436 >>>> GUC : hawq_rm_master_port Value : 5437 >>>> GUC : hawq_rm_memory_limit_perseg Value : 480GB >>>> GUC : hawq_rm_min_resource_perseg Value : 2 >>>> GUC : hawq_rm_nocluster_timeout Value : 60 >>>> GUC : hawq_rm_nresqueue_limit Value : 128 >>>> GUC : hawq_rm_nslice_perseg_limit Value : 5000 >>>> GUC : hawq_rm_nvcore_limit_perseg Value : 16 >>>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512 >>>> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8 >>>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256 >>>> GUC : hawq_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4 >>>> GUC : hawq_rm_nvseg_for_copy_from_perquery Value : 6 >>>> GUC : hawq_rm_nvseg_perquery_limit Value : 512 >>>> GUC : hawq_rm_nvseg_perquery_perseg_limit Value : 6 >>>> GUC : hawq_rm_nvseg_variance_amon_seg_limit Value : 1 >>>> GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2 >>>> GUC : hawq_rm_regularize_io_factor Value : 1 >>>> GUC : hawq_rm_regularize_io_max Value : 1.37439e+11 >>>> GUC : hawq_rm_regularize_nvseg_factor Value : 1 >>>> GUC : hawq_rm_regularize_nvseg_max Value : 300 >>>> GUC : hawq_rm_regularize_usage_factor Value : 1 >>>> GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25 >>>> GUC : hawq_rm_request_timeoutcheck_interval Value : 1 >>>> GUC : hawq_rm_resource_allocation_timeout Value : 600 >>>> GUC : hawq_rm_resource_idle_timeout Value : 300 >>>> GUC : hawq_rm_respool_test_file Value : >>>> GUC : hawq_rm_return_percent_on_overcommit Value : 10 >>>> GUC : hawq_rm_segment_config_refresh_interval Value : 30 >>>> GUC : hawq_rm_segment_heartbeat_interval Value : 30 >>>> GUC : hawq_rm_segment_heartbeat_timeout Value : 300 >>>> GUC : hawq_rm_segment_port Value : 5438 >>>> GUC : hawq_rm_segment_tmpdir_detect_interval Value : 300 >>>> GUC : hawq_rm_session_lease_heartbeat_enable Value : on >>>> GUC : hawq_rm_session_lease_heartbeat_interval Value : 10 >>>> GUC : hawq_rm_session_lease_timeout Value : 180 >>>> GUC : hawq_rm_stmt_nvseg Value : 0 >>>> GUC : hawq_rm_stmt_vseg_memory Value : 128mb >>>> GUC : hawq_rm_tolerate_nseg_limit Value : 0.25 >>>> GUC : hawq_rm_yarn_address Value : worker1.bigdata:8050 >>>> GUC : hawq_rm_yarn_app_name Value : hawq >>>> GUC : hawq_rm_yarn_queue_name Value : default >>>> GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:8030 >>>> GUC : hawq_segment_address_port Value : 40000 >>>> GUC : hawq_segment_history_keep_period Value : 365 >>>> GUC : hawq_segment_temp_directory Value : /tmp >>>> GUC : integer_datetimes Value : on >>>> GUC : IntervalStyle Value : postgres >>>> GUC : join_collapse_limit Value : 20 >>>> GUC : krb5_ccname Value : /tmp/postgres.ccname >>>> GUC : krb_caseins_users Value : off >>>> GUC : krb_server_keyfile Value : FILE:/data/pulse2-agent/agents >>>> /agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-dist/etc >>>> /krb5.keytab >>>> GUC : krb_srvname Value : postgres >>>> GUC : lc_collate Value : C >>>> GUC : lc_ctype Value : C >>>> GUC : lc_messages Value : en_US.utf8 >>>> GUC : lc_monetary Value : en_US.utf8 >>>> GUC : lc_numeric Value : en_US.utf8 >>>> GUC : lc_time Value : en_US.utf8 >>>> GUC : listen_addresses Value : * >>>> GUC : local_preload_libraries Value : >>>> GUC : log_autostats Value : off >>>> GUC : log_connections Value : off >>>> GUC : log_disconnections Value : off >>>> GUC : log_dispatch_stats Value : off >>>> GUC : log_duration Value : off >>>> GUC : log_error_verbosity Value : default >>>> GUC : log_executor_stats Value : off >>>> GUC : log_hostname Value : off >>>> GUC : log_min_duration_statement Value : -1 >>>> GUC : log_min_error_statement Value : error >>>> GUC : log_min_messages Value : warning >>>> GUC : log_parser_stats Value : off >>>> GUC : log_planner_stats Value : off >>>> GUC : log_rotation_age Value : 1440 >>>> GUC : log_rotation_size Value : 0 >>>> GUC : log_statement Value : none >>>> GUC : log_statement_stats Value : off >>>> GUC : log_timezone Value : PRC >>>> GUC : log_truncate_on_rotation Value : off >>>> GUC : maintenance_work_mem Value : 65536 >>>> GUC : master_directory Value : >>>> GUC : max_appendonly_segfiles Value : 262144 >>>> GUC : max_appendonly_tables Value : 10000 >>>> GUC : max_connections Value : 1280 >>>> GUC : max_files_per_process Value : 150 >>>> GUC : max_fsm_pages Value : 200000 >>>> GUC : max_fsm_relations Value : 1000 >>>> GUC : max_function_args Value : 100 >>>> GUC : max_identifier_length Value : 63 >>>> GUC : max_index_keys Value : 32 >>>> GUC : max_locks_per_transaction Value : 128 >>>> GUC : max_prepared_transactions Value : 250 >>>> GUC : max_stack_depth Value : 2048 >>>> GUC : max_work_mem Value : 1024000 >>>> GUC : metadata_cache_testfile Value : >>>> GUC : optimizer Value : on >>>> GUC : optimizer_analyze_root_partition Value : on >>>> GUC : optimizer_minidump Value : onerror >>>> GUC : optimizer_parts_to_force_sort_on_insert Value : 160 >>>> GUC : password_encryption Value : on >>>> GUC : password_hash_algorithm Value : MD5 >>>> GUC : pljava_classpath Value : >>>> GUC : pljava_release_lingering_savepoints Value : off >>>> GUC : pljava_statement_cache_size Value : 0 >>>> GUC : pljava_vmoptions Value : >>>> GUC : port Value : 6432 >>>> GUC : pxf_enable_filter_pushdown Value : on >>>> GUC : pxf_enable_locality_optimizations Value : on >>>> GUC : pxf_enable_stat_collection Value : on >>>> GUC : pxf_remote_service_login Value : >>>> GUC : pxf_remote_service_secret Value : >>>> GUC : pxf_service_address Value : localhost:51200 >>>> GUC : pxf_stat_max_fragments Value : 100 >>>> GUC : random_page_cost Value : 100 >>>> GUC : regex_flavor Value : advanced >>>> GUC : runaway_detector_activation_percent Value : 95 >>>> GUC : search_path Value : "$user",public >>>> GUC : seg_max_connections Value : 3000 >>>> GUC : segment_directory Value : >>>> GUC : seq_page_cost Value : 1 >>>> GUC : server_encoding Value : UTF8 >>>> GUC : server_ticket_renew_interval Value : 43200000 >>>> GUC : server_version Value : 8.2.15 >>>> GUC : server_version_num Value : 80215 >>>> GUC : shared_buffers Value : 4000 >>>> GUC : shared_preload_libraries Value : >>>> GUC : ssl Value : off >>>> GUC : ssl_ciphers Value : ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH >>>> GUC : standard_conforming_strings Value : off >>>> GUC : standby_address_host Value : localhost >>>> GUC : statement_timeout Value : 0 >>>> GUC : superuser_reserved_connections Value : 3 >>>> GUC : tcp_keepalives_count Value : 9 >>>> GUC : tcp_keepalives_idle Value : 7200 >>>> GUC : tcp_keepalives_interval Value : 75 >>>> GUC : temp_buffers Value : 1024 >>>> GUC : TimeZone Value : PRC >>>> GUC : timezone_abbreviations Value : Default >>>> GUC : track_activities Value : on >>>> GUC : track_counts Value : off >>>> GUC : transaction_isolation Value : read committed >>>> GUC : transaction_read_only Value : off >>>> GUC : transform_null_equals Value : off >>>> GUC : unix_socket_directory Value : >>>> GUC : unix_socket_group Value : >>>> GUC : unix_socket_permissions Value : 511 >>>> GUC : update_process_title Value : on >>>> GUC : vacuum_cost_delay Value : 0 >>>> GUC : vacuum_cost_limit Value : 200 >>>> GUC : vacuum_cost_page_dirty Value : 20 >>>> GUC : vacuum_cost_page_miss Value : 10 >>>> GUC : vacuum_freeze_min_age Value : 100000000 >>>> GUC : work_mem Value : 51200 >>>> >>>> >>>> >>>> >>>> >>>> At 2016-09-21 13:00:41, "Vineet Goel" wrote: >>>> >>>> Could you please post your SQL DDL statement? How many URLs do you hav= e >>>> in your external table? Also, your HASH dist table - how many buckets = are >>>> defined, if any? Are the # of URLs more than the # of buckets or >>>> default_hash_table_bucket_number value? Perhaps you can attach your >>>> hawq-site.xml file as well. >>>> >>>> Also see: >>>> http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html >>>> >>>> Thanks >>>> Vineet >>>> >>>> >>>> On Tue, Sep 20, 2016 at 7:07 PM =E6=9D=A5=E7=86=8A w= rote: >>>> >>>>> Hi,all: >>>>> I am testing hawq 2.0.0 , and I find a problem like this: >>>>> I load data from an external table (created using "like target_table= " >>>>> statement) , >>>>> if the target table was distributed by some column(s), it raise this >>>>> error: >>>>> External scan error: There are more external files (URLs) than >>>>> primary segments that can read them (COptTasks.cpp:1756) >>>>> if the target table was distributed randomly, it works well, >>>>> I don't set any parameter special,does anybody know how to resolve >>>>> this problem? >>>>> thanks a lot. >>>>> >>>> > > > -- > Thanks > > Hubert Zhang > --001a1144d6f66b2d28053d3a06a8 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
As it is known, the number of segments should be equal to = or greater than the number of URLs in gpfdist external tables. This is a le= gacy issue inherited from GPDB. We encountered this issue too when working = on a cloud data warehouse based on GPDB. We finally needed to find a workar= ound to address this issue.

This restriction maybe be fi= ne with GPDB, since the number of segments is fixed once the system is init= ialized. However, IMO, this restriction looks an overkill with HAWQ, since = one of the significant features of HAWQ is its elastic execution runtime. T= he number of segments (virtual segments) to run a query is totally dynamic,= while the definition of an external table is usually static.
I spent some time checking the code related to this restriction= , and found that logic of assigning URLs to segments requires one segment t= aking at most one URL.=C2=A0

In HAWQ, Can we elimi= nate this restriction by relaxing the URL-to-Segment assignment logic? Look= ing forwards to your guys' comment and advice.

Lirong

Lirong Jian
HashData Inc.

2016-09-23 8:01 GMT+08:00 Hubert Zhang <hzh= ang@pivotal.io>:
+1 with Vineet's comment.
in your statement "insert into c= all_center select * from ext_call_center;"
There are = two strict restriction of how many virtual segments need to be started:
1 the bucket number of hash distributed result relation=C2=A0call_center = (must equal to #vseg)
2 the number of locations in =C2=A0gpfdis= t ext table=C2=A0ext_call_center (<=3D #vseg)

<= /div>
You must set the=C2=A0the bucket number of=C2=A0call_center=C2=A0bigger than the the number of locations in ext_call_center

Thanks
Hubert


On= Thu, Sep 22, 2016 at 1:19 AM, Luis Macedo <lmacedo@pivotal.io> wrote:

Also on your l= ocation clause you should not reference the same file more than one time.

If you want to scale gpfdist process you need to use differe= nt range on each port for a same server. (Not sure if I explain myself :)) =

If you use one gpfdist per server performance should be fine= . One gpfdist process usually can do 250MB/s if underlying infra allows.

Rgds

--- Sent from my Nexus 5x


Em 21 de set de 2= 016 12:14 PM, "Vineet Goel" <vvineet@apache.org> escreveu:
Your default_hash_= table_bucket_number value is set to 6. Typically, this should be adjus= ted as 6 x #_of_your_segment_hosts. With 3 segments, you should set this va= lue to 18. Any time you change this parameter, you should redistribute your= HASH distributed tables, if you have any (unless the table DDL has # of bu= ckets defined, I think).

Increase default_hash_table_bucket_nu= mber to 18 and retry the insert. Since you have 18 ext table URLs, it = should work with the change.

Thanks
Vine= et


On Wed, Sep 21, 2016 at 12:26 AM =E6=9D=A5=E7=86=8A <yin.zhb@163.com> wrote:
=

My environment is >= ;>>> : 1 master 3 segments
SQL >>>>>>:= =C2=A0
CREATE TABLE call_center (
=C2=A0 =C2=A0 cc_call= _center_sk integer,
=C2=A0 =C2=A0 cc_call_center_id character var= ying(16),
=C2=A0 =C2=A0 cc_rec_start_date date,
=C2=A0 = =C2=A0 cc_rec_end_date date,
=C2=A0 =C2=A0 cc_closed_date_sk inte= ger,
=C2=A0 =C2=A0 cc_open_date_sk integer,
=C2=A0 =C2= =A0 cc_name character varying(50),
=C2=A0 =C2=A0 cc_class charact= er varying(50),
=C2=A0 =C2=A0 cc_employees integer,
=C2= =A0 =C2=A0 cc_sq_ft integer,
=C2=A0 =C2=A0 cc_hours character var= ying(20),
=C2=A0 =C2=A0 cc_manager character varying(40),
=C2=A0 =C2=A0 cc_mkt_id integer,
=C2=A0 =C2=A0 cc_mkt_class ch= aracter varying(50),
=C2=A0 =C2=A0 cc_mkt_desc character varying(= 100),
=C2=A0 =C2=A0 cc_market_manager character varying(40),
=C2=A0 =C2=A0 cc_division text,
=C2=A0 =C2=A0 cc_division_n= ame character varying(50),
=C2=A0 =C2=A0 cc_company text,
=C2=A0 =C2=A0 cc_company_name character varying(50),
=C2=A0 = =C2=A0 cc_street_number character varying(10),
=C2=A0 =C2=A0 cc_s= treet_name character varying(60),
=C2=A0 =C2=A0 cc_street_type ch= aracter varying(15),
=C2=A0 =C2=A0 cc_suite_number character vary= ing(10),
=C2=A0 =C2=A0 cc_city character varying(60),
= =C2=A0 =C2=A0 cc_county character varying(30),
=C2=A0 =C2=A0 cc_s= tate text,
=C2=A0 =C2=A0 cc_zip character varying(10),
= =C2=A0 =C2=A0 cc_country character varying(20),
=C2=A0 =C2=A0 cc_= gmt_offset numeric(5,2),
=C2=A0 =C2=A0 cc_tax_percentage numeric(= 5,2)
)
WITH (appendonly=3Dtrue, orientation=3Dparquet)<= /div>
DISTRIBUTED BY (cc_call_center_sk);

CREA= TE EXTERNAL TABLE ext_call_center (like call_center)
LOCATION (
'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat= 9;, 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat', &= #39;gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat', 'g= pfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat', 'gpfdis= t://segment3:9005/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://s= egment3:9006/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segmen= t2:9001/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:90= 02/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9003/ca= ll_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9005/call_ce= nter_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9006/call_center_= [0-9]*_[0-9]*.dat', 'gpfdist://segment1:9001/call_center_[0-9]= *_[0-9]*.dat', 'gpfdist://segment1:9002/call_center_[0-9]*_[0-= 9]*.dat', 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.d= at', 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat= 9;, 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat', &= #39;gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
<= div>FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE = AS E'\\');

insert into call_center select = * from ext_call_center;

ERROR: =C2=A0External scan= error: There are more external files (URLs) than primary segments that can= read them (COptTasks.cpp:1756)

hawq config >&g= t;>>:

GUC : add_missing_from = Value : off
GUC : application_name Value <= /span>:=C2=A0
GUC : = array_nulls Value : on
GUC : authentication_timeout Value : 60<= /div>
GUC : backslash_quot= e Value : safe_encoding
GUC : block_size <= /span>Value : 32768
= GUC : bonjour_name Value :=C2=A0
GUC : check_function_bodies Val= ue : on
GUC : client_encoding Value : UTF8
GUC : client= _min_messages Value : ERROR
GUC : cpu_index_tuple_cost Value : 0.= 005
GUC : cpu_operat= or_cost Value : 0.0025
GUC : cpu_tuple_cost= Value : 0.01
GUC : cursor_tuple_fraction<= span style=3D"white-space:pre-wrap"> Value : 1
GUC= : custom_variable_classes Value :=C2=A0
G= UC : DateStyle Value <= /span>: ISO, MDY
GUC : db_user_namespace Value : off
GUC : deadlock_timeout Value : 1= 000
GUC : debug_asse= rtions Value : off
GUC : debug_pretty_print= Value : off
GUC : debug_print_parse Value : off
GUC <= /span>: debug_print_plan Value= : off
GUC : debug_print_prelim_plan Value = : off
GUC : = debug_print_rewritten Value : off
GUC : debug_print_slice_table Value = : off
GUC : d= efault_hash_table_bucket_number = Value : 6
GUC= : default_statement_mem Value : 128000
GUC= : default_statistics_target = Value : 25
GU= C : default_tablespace Value :=C2=A0
GUC <= /span>: default_transaction_isolation = Value : read committed=
GUC : default_trans= action_read_only Value : off
GUC : dfs_url = Value : localhost:8020= /hawq
GUC : dynamic_= library_path Value : $libdir
GUC : effective_cache_size Value : = 16384
GUC : enable_b= itmapscan Value : on
GUC : enable_groupagg = Value : on
G= UC : enable_hashagg Value : on
GUC = : enable_hashjoin Value : on
GUC : enable_indexscan Value : on
GUC : enable_mergejoin= Value : off
GUC : enable_nestloop = Value : off
GUC : enable_secure_filesystem Value : off
GUC : enable_seqscan Value : on
GUC : enable_sort Value : on
GUC : enable_tidscan Value : on
GUC = : escape_string_warning Value : on
GUC : explain_memory_verbosity Value : suppress
GUC= : explain_pretty_print Value : on
GUC : extra_float_digits Value : 0
GUC := filesystem_support_truncate V= alue : on
GUC : from_collapse_limit Value = : 20
GUC : gp= _adjust_selectivity_for_outerjoins Value : on
GUC : gp_analyze_relative_err= or Value : 0.25
GUC : gp_autostats_mode Value : NONE
GU= C : gp_autostats_on_change_thr= eshold Value : 2147483647
GUC : gp_backup_directIO Value : o= ff
GUC : gp_backup_d= irectIO_read_chunk_mb Val= ue : 20
GUC : gp_cached_segworkers_threshold Value : 5
GUC : gp_command_count Value : 6
GUC : gp_connections_per_thread Value <= /span>: 64
GUC : gp_= debug_linger Value : 0
GUC : gp_dynamic_partition_pruning Value = : on
GUC : gp_email_= connect_avoid_duration Va= lue : 7200
GUC : gp_email_connect_failures Value : 5
GUC : gp_email_connect_timeout V= alue : 15
GUC : gp_email_from Value :=C2=A0
GUC : gp_em= ail_smtp_password Value :=C2=A0
GUC : gp_email_smtp_server Value = : localhost:25
GUC := gp_email_smtp_userid Value :=C2=A0
GUC : gp_email_to Value :=C2= =A0
GUC : gp_enable_= agg_distinct Value : on
GUC : gp_enable_agg_distinct_pruning Value : on
GUC : gp_ena= ble_direct_dispatch Value : on
GUC : gp_enable_fallback_plan Value : on
GUC : gp_enab= le_fast_sri Value : on
GUC : gp_enable_gpperfmon Value : off
GUC : gp_enable_groupe= xt_distinct_gather Value<= span style=3D"white-space:pre-wrap"> : on
GUC : gp_enable_groupext_distinct_pruni= ng Value : on
GUC : gp_enable_multiphase_agg Value : on
GUC : gp_enable_predicate_pro= pagation Value : on
GUC : gp_enable_preunique Value : on
GUC : gp_enable_sequent= ial_window_plans Value : on
GUC : gp_enable_sort_distinct Value : on
GUC : gp_en= able_sort_limit Value : on
GUC : gp_external_enable_exec Value : = on
GUC : gp_external= _grant_privileges Value : off
GUC : gp_external_max_segs Value : = 64
GUC : gp_filerep_= tcp_keepalives_count Valu= e : 2
GUC : gp_filerep_tcp_keepalives_idle Value : 60
GUC : gp_filerep_tcp_keepalives_interval Value : 30
GUC : gp_force_use_defau= lt_temporary_directory Va= lue : off
GUC : gp_gpperfmon_send_interval Value : 1
GUC : gp_hashjoin_tuples_per_bucket Value : 5
GUC : gp_idf_deduplicate Value : auto
GUC : gp_interconnect_cache_future_packets Value : on
GUC : gp_interconnect_d= efault_rtt Value : 20
GUC : gp_interconnect_fc_method Value : LOS= S
GUC : gp_interconn= ect_hash_multiplier Value= : 2
GUC : gp_interconnect_min_retries_befor= e_timeout Value : 100
GUC : gp_interconnect_min_rto Value : 20
GUC : gp_interconnect_= queue_depth Value : 4
GUC : gp_interconnect_setup_timeout Value : 7200
GUC : gp_int= erconnect_snd_queue_depth Value : 2
GUC : gp_interconnect_timer_checking_period Value : 20
GUC : gp_interconnect_timer_period Value : 5=
GUC : gp_interconne= ct_transmit_timeout Value= : 3600
GUC : gp_interconnect_type Value <= /span>: UDP
GUC : gp= _log_format Value : csv
GUC : gp_max_csv_line_length Value : 10= 48576
GUC : gp_max_d= atabases Value : 16
GUC : gp_max_filespaces= Value : 8
G= UC : gp_max_packet_size Value : 8192
GUC <= /span>: gp_max_partition_level Value : 0
GUC : gp_max_plan_size Value : 0
GUC : gp_ma= x_relations Value : 65536
GUC : gp_max_tablespaces Value : 16
GUC : gp_motion_cost_= per_row Value : 0
GUC : gp_num_contents_in_cluster Value : -10000=
GUC : gp_query_cont= ext_mem_limit Value : 102400
GUC : gp_reject_percent_threshold Value : 300
GUC : gp_r= eraise_signal Value : on
GUC : gp_role Value : utility
= GUC : gp_safefswritesize Value : 0
GUC : gp_segment_connect_timeout Value : 600
GUC : gp_segments_for_planner Value : 0
GUC : gp_session_id Value : -1
GUC : gp_set_proc_affinity Value := off
GUC : gp_snmp_c= ommunity Value : public
GUC : gp_snmp_monitor_address Value :=C2= =A0
GUC : gp_snmp_us= e_inform_or_trap Value : trap
GUC : gp_statistics_pullup_from_child_partiti= on Value : on
GUC : gp_statistics_use_fkeys= Value : on
= GUC : gp_subtrans_warn_limit Value : 16777216
GUC : gp_temporary_directory_mark_error Value : 0
GUC : gp_udp_b= ufsize_k Value : 0
GUC : gp_vmem_protect_segworker_cache_limit Value : 500
GUC : gp_workfile_checksumming Va= lue : on
GUC : gp_workfile_compress_algorithm Value : none
GUC = : gp_workfile_limit_per_query = Value : 0
GU= C : gp_workfile_limit_per_segm= ent Value : 0
GUC : gpperfmon_port = Value : 8888
GUC : hawq_global_rm_type Value : yarn
GUC : hawq_master_address_host V= alue : master.bigdata
GUC : hawq_master_address_po= rt Value : 6432
GUC : hawq_master_temp_directory Value : /tmp
GUC : hawq_metadata_cac= he_block_capacity Value : 2097152
GUC : hawq_metadata_cache_check_inte= rval Value : 30
GUC : hawq_metadata_cache_flush_ratio Value = : 0.85
GUC : hawq_me= tadata_cache_free_block_max_ratio= Value : 0.05
GUC : hawq_metadata_cache_fr= ee_block_normal_ratio Val= ue : 0.2
GUC : hawq_metadata_cache_max_hdfs_f= ile_num Value : 524288
GUC : hawq_metadata_cache_reduce_ratio Value = : 0.7
GUC : = hawq_metadata_cache_refresh_interval Value : 3600
=
GUC : hawq_metadata_cache= _refresh_max_num Value : 1000
GUC : hawq_metadata_cache_refresh_timeo= ut Value : 3600
GUC : hawq_re_cgroup_hierarchy_name Value : hadoo= p-yarn
GUC : hawq_re= _cgroup_mount_point Value : /sys/fs/cgroup
GUC : hawq_re_cpu_enable Value : off
GUC : hawq_re_cpu_weight Value : 1024
GUC : hawq_re_memory_overcommit_max Value : 8192
GUC : hawq_re_vcore_pcore_ratio Value : 1
GUC : hawq_rm_cluster_report_period Value : 60
GUC = : hawq_rm_clusterratio_core_to_memorygb_factor Value <= /span>: 5
GUC : hawq= _rm_connpool_sameaddr_buffersize = Value : 2
GU= C : hawq_rm_container_batch_li= mit Value : 1000
GUC : hawq_rm_enable_connpool Value : on
GUC : hawq_rm_force_alterqu= eue_cancel_queued_request Value : on
GUC : hawq_rm_force_fifo_queuing Value : on
GUC : hawq_rm_master_domain_port Value : 5436
GUC= : hawq_rm_master_port Value : 5437
GUC : hawq_rm_memory_limit_perseg Value : 480GB
G= UC : hawq_rm_min_resource_pers= eg Value : 2
GUC : hawq_rm_nocluster_timeout Value : 60
GUC : hawq_rm_nresqueue_limit= Value : 128
GUC : hawq_rm_nslice_perseg_limit Value : 5000
=
GUC : hawq_rm_nvcore_limi= t_perseg Value : 16
GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512
GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8
GUC <= /span>: hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256
GUC : haw= q_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4
GUC : hawq_rm_n= vseg_for_copy_from_perquery Value : 6
GUC : hawq_rm_nvseg_perquery_limit Value : 512
GUC= : hawq_rm_nvseg_perquery_perseg_limit Value : = 6
GUC : hawq_rm_nvse= g_variance_amon_seg_limit Value : 1
GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2
GUC : hawq_rm_regularize_io_factor Value : 1
GUC : hawq_r= m_regularize_io_max Value : 1.37439e+11
GUC : hawq_rm_regularize_nvseg_factor Value : 1
GUC : hawq_rm_regularize_nvseg_max Value : 300
=
GUC : hawq_rm_regularize_= usage_factor Value : 1
GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25
GUC : hawq_rm_request_timeoutcheck_interval Value : 1
GUC : hawq_rm_resourc= e_allocation_timeout Valu= e : 600
GUC : hawq_rm_resource_idle_timeout Value : 300
GUC : hawq_rm_respool_test_file Value :=C2=A0
GUC= : hawq_rm_return_percent_on_o= vercommit Value : 10
GUC : hawq_rm_segment_config_refresh_interval Value : 30
GUC = : hawq_rm_segment_heartbeat_interval Value : 30<= /div>
GUC : hawq_rm_segmen= t_heartbeat_timeout Value= : 300
GUC : hawq_rm_segment_port Value : 5438
GUC : ha= wq_rm_segment_tmpdir_detect_interval Value : 300
<= div>GUC : hawq_rm_session_leas= e_heartbeat_enable Value<= span style=3D"white-space:pre-wrap"> : on
GUC : hawq_rm_session_lease_heartbeat_i= nterval Value : 10
GUC : hawq_rm_session_lease_timeout Value : 18= 0
GUC : hawq_rm_stmt= _nvseg Value : 0
GUC : hawq_rm_stmt_vseg_memory Value : 128mb
GUC : hawq_rm_tolerate_= nseg_limit Value : 0.25
GUC : hawq_rm_yarn_address Value : worker= 1.bigdata:8050
GUC := hawq_rm_yarn_app_name Value : hawq
GUC : hawq_rm_yarn_queue_name Value = : default
GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:80= 30
GUC : hawq_segmen= t_address_port Value : 40000
GUC : hawq_segment_history_keep_period Value : 365
GUC : hawq_segment_temp_directory Value : /tmp
GUC= : integer_datetimes Value : on
GUC : IntervalStyle Value : postgres
GUC : join_collapse_limit Value : 20
GUC : krb5_c= cname Value : /tmp/postgres.ccname
GUC : krb_caseins_users Value : off
GUC : krb_ser= ver_keyfile Value : FILE:/data/pulse2-agent/agents/ag= ent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-dist/etc/krb5.ke= ytab
GUC : krb_srvna= me Value : postgres
GUC : lc_collate Value : C
GUC : lc_ctype Value : = C
GUC : lc_messages<= span style=3D"white-space:pre-wrap"> Value : en_US.utf8
GUC : lc_monetary Value : en_US.utf8
GUC : lc_numeric Value : en_US.utf8
GUC = : lc_time Value : en_US.utf8
GUC : listen_addresses Value : *
GUC : local_pre= load_libraries Value :=C2=A0
GUC : log_autostats Value : off
GUC : log_connections Value : off
GUC= : log_disconnections = Value : off
GUC : log_dispatch_stats Value = : off
GUC : = log_duration Value : off
GUC : log_error_verbosity Value : defau= lt
GUC : log_executo= r_stats Value : off
GUC : log_hostname Value : off
GUC<= span style=3D"white-space:pre-wrap"> : log_min_duration_statement Value : -1
GUC= : log_min_error_statement Value : error
G= UC : log_min_messages Value : warning
GUC = : log_parser_stats Valu= e : off
GUC : log_planner_stats Value : off
GUC : log_r= otation_age Value : 1440
GUC : log_rotation_size Value : 0
GUC : log_statement Value : none
GUC = : log_statement_stats = Value : off
GUC : log_timezone Value : PRC
GUC : log_tr= uncate_on_rotation Value : off
GUC : maintenance_work_mem Value := 65536
GUC : master_= directory Value :=C2=A0
GUC : max_appendonly_segfiles Value : 262= 144
GUC : max_append= only_tables Value : 10000
GUC : max_connections Value : 1280
GUC : max_files_per_pr= ocess Value : 150
GUC : max_fsm_pages Value : 200000
GU= C : max_fsm_relations Value : 1000
GUC : max_function_args Value<= span style=3D"white-space:pre-wrap"> : 100
GUC : max_identifier_length Value : 63
GUC : max_= index_keys Value : 32
GUC : max_locks_per_transaction Value : 128=
GUC : max_prepared_= transactions Value : 250
GUC : max_stack_depth Value : 2048
GUC : max_work_mem Value : 1024000
GUC : metadata_cache_testfile <= /span>Value :=C2=A0
= GUC : optimizer Value = : on
GUC : op= timizer_analyze_root_partition <= /span>Value : on
GUC= : optimizer_minidump Value : onerror
GUC = : optimizer_parts_to_force_sort_on_insert Value = : 160
GUC : password= _encryption Value : on
GUC : password_hash_algorithm Value : MD= 5
GUC : pljava_class= path Value :=C2=A0
GUC : pljava_release_lingering_savepoints Value = : off
GUC : p= ljava_statement_cache_size Val= ue : 0
GUC : pljava_vmoptions Value :=C2=A0
GUC : port<= span style=3D"white-space:pre-wrap"> Value : 6432
GUC : pxf_enable_filter_pushdown Value : on
GUC : pxf_enable_locality_op= timizations Value : on
GUC : pxf_enable_stat_collection Value : on
GUC : pxf_rem= ote_service_login Value :=C2=A0
GUC : pxf_remote_service_secret Value :=C2=A0
GUC : p= xf_service_address Value : localhost:51200
GUC : pxf_stat_max_fragments Value : 100
GUC : random_page_cost Value : 100
GUC : regex_flavor Value : ad= vanced
GUC : runaway= _detector_activation_percent Value : 95
GUC : search_path Value : "$user",public
GUC : seg_max_connections = Value : 3000
= GUC : segment_directory Value :=C2=A0
GUC = : seq_page_cost Value : 1
GUC : server_encoding Value : UT= F8
GUC : server_tick= et_renew_interval Value : 43200000
GUC : server_version Value : = 8.2.15
GUC : server_= version_num Value : 80215
GUC : shared_buffers Value : 4000
GUC : shared_preload_li= braries Value :=C2=A0
GUC : ssl Val= ue : off
GUC : ssl_ciphers Value : = ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
GUC : standard_conforming_strings Value := off
GUC : standby_a= ddress_host Value : localhost
GUC : statement_timeout Value : 0=
GUC : superuser_res= erved_connections Value : 3
GUC : tcp_keepalives_count Value : 9<= /div>
GUC : tcp_keepalives= _idle Value : 7200
GUC : tcp_keepalives_interval Value : 75
=
GUC : temp_buffers Value : 1024
GUC <= /span>: TimeZone Value : PRC
GUC : timezone_abbreviations Value := Default
GUC : track= _activities Value : on
GUC : track_counts= Value : off
GUC : transaction_isolation Value : read committed
GUC : transaction_read_only Value : off
GUC : transform_null_eq= uals Value : off
GUC : unix_socket_directory Value :=C2=A0
<= div>GUC : unix_socket_group Value :=C2=A0
GUC : unix_socket_permissions = Value : 511
G= UC : update_process_title Value : on
GUC <= /span>: vacuum_cost_delay Valu= e : 0
GUC : vacuum_cost_limit Value : 200
GUC : vacuum_= cost_page_dirty Value : 20
GUC : vacuum_cost_page_miss Value : 10=
GUC : vacuum_freeze= _min_age Value : 100000000
GUC : work_mem = Value : 51200





At 2016-09-= 21 13:00:41, "Vineet Goel" <vvineet@apache.org> wrote:
Could you please post your SQL DDL statement? How many UR= Ls do you have in your external table? Also, your HASH dist table - how man= y buckets are defined, if any? Are the # of URLs more than the # of buckets= or default_hash_table_bucket_number=C2=A0value? Perhaps you can attac= h your hawq-site.xml file as well.

Also see:=C2=A0
=
http://hdb.docs.pivotal.io/20/datamgmt/loa= d/g-gpfdist-protocol.html

Thanks
= Vineet


On Tue= , Sep 20, 2016 at 7:07 PM =E6=9D=A5=E7=86=8A <yin.zhb@163.com> wrote:
Hi,all:
=C2=A0 =C2=A0 I am testing= hawq 2.0.0 , and I find a problem like this:
=C2=A0I load data f= rom an external table (created using "like target_table" statemen= t) ,
if the target table was distributed by some column(s), it ra= ise this error:
=C2=A0External scan error: There are more externa= l files (URLs) than primary segments that can read them (COptTasks.cpp:1756= )
if the target table was distributed randomly, it works well,
I don't set any parameter special,does anybody know how to reso= lve this problem?
thanks a lot.



<= /div>--
Thanks

Hubert Z= hang

--001a1144d6f66b2d28053d3a06a8--