hawq-user mailing list archives

From Lei Chang <lei_ch...@apache.org>
Subject Re: Re: May I mention some gpload issues?
Date Sat, 09 Apr 2016 09:47:15 GMT
It looks like you are using gpdb. This mailing list is for HAWQ.

You can contact Pivotal support or ask gpdb questions on the mailing list
listed here: http://greenplum.org/

Cheers
Lei



On Sat, Apr 9, 2016 at 10:36 AM, yin.zhb@163.com <yin.zhb@163.com> wrote:

>
> Our DB team has been using Greenplum for one year. We have 680 GB (more than
> one billion lines) of data to load into Greenplum every day.
> We wrote a program that calls gpload every 10 minutes; each run
> loads between 100,000 and 10,000,000+ lines.
> We can accept about 1 in 100,000 error lines.
>
> our environment:
> os version: rhel 6.3
> greenplum : 4.3.5.2
> gpload    : 4.3.5.2
>
> We have run into the following problems when using gpload:
>
> 1. "line too long"
> This error makes gpload fail, even if only one line in all the
> files to be loaded is too long.
> We set "error_limit" and "segment reject limit", but they had no effect.
> Finding the offending line is very hard, so we set max_line_length
> to "1048576".
>
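
One way to locate the offending lines before calling gpload is to scan the input files for lines longer than the configured limit. The following is a minimal Python sketch and is not part of gpload; the name find_long_lines is hypothetical, and measuring length in bytes without the line terminator is an assumption, so gpfdist may count slightly differently.

```python
def find_long_lines(path, max_len=1048576):
    """Yield (line_number, byte_length) for each line longer than max_len bytes.

    Assumption: length is measured in bytes, excluding the trailing
    line terminator. The file is read in binary mode so that an
    unexpected encoding cannot interrupt the scan.
    """
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            length = len(line.rstrip(b"\r\n"))
            if length > max_len:
                yield lineno, length


if __name__ == "__main__":
    import sys

    # Usage: python find_long_lines.py FILE [FILE ...]
    for name in sys.argv[1:]:
        for lineno, length in find_long_lines(name):
            print("%s:%d: %d bytes" % (name, lineno, length))
```

Each line is read fully into memory, so a single extremely long line costs as much memory as its length.
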
> 2. "no partition key"
> This one is also a headache for us.
> Perhaps only one line is incorrect (an unexpected delimiter inside a
> column), or its encoding is not recognized,
> but this makes gpload fail just as in problem 1.
>
>
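
Rows with a stray delimiter or an undecodable byte sequence can be found ahead of the load by checking every line's field count and encoding. This is a rough Python sketch under stated assumptions: the name find_bad_rows, the "|" delimiter, and the UTF-8 encoding are illustrative choices, and the plain split assumes the data contains no quoted or escaped delimiters.

```python
def find_bad_rows(path, expected_fields, delimiter="|", encoding="utf-8"):
    """Return a list of (line_number, reason) for suspicious rows.

    A row is reported if it cannot be decoded with the given encoding
    or if splitting on the delimiter does not give expected_fields
    fields. Assumption: delimiters are never quoted or escaped.
    """
    bad = []
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                text = raw.decode(encoding)
            except UnicodeDecodeError as exc:
                bad.append((lineno, "encoding: %s" % exc.reason))
                continue
            count = len(text.rstrip("\r\n").split(delimiter))
            if count != expected_fields:
                bad.append((lineno, "fields: %d" % count))
    return bad
```

Reported rows could then be moved to a side file so that the remaining data loads cleanly.
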
> 3. Column too long
> This makes gpload fail, too.
> We changed all data types to text to work around this problem.
>
> 4. In our production environment, when the Greenplum cluster hit an error,
> it logged something like this:
> fatal","57m01","the database system is in mirror or uninitialized
> mode",,,,,,,0,,"postmaster.c",2994,
>
> However, the gpload and gpfdist processes went to sleep instead of exiting.
>
> We looked through the gpload.py script and found a case it does not handle.
>
> gpload.py loads data in these steps:
> step1: read_config()
> step2: setup_connection() --connect db the first time
> step3: read_table_metadata()
> step4: read_columns()
> step5: read_mapping()
> step6: start_gpfdists()
> step7: do_method()
>
> Finally, it will run:
> step8: removing temporary data --connect db the second time
> step9: killing gpfdist
>
> We found that when step8 fails (the db cannot be connected), the process
> goes to sleep and does not exit.
>
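
The ordering problem described in steps 8 and 9 can be illustrated with a generic cleanup pattern in which the helper process is always stopped, whether or not the database cleanup succeeds. This is only a sketch of that pattern, not the code in gpload.py; run_load, do_load, and drop_temp_objects are hypothetical names standing in for steps 6 to 9.

```python
import subprocess
import sys


def run_load(helper_cmd, do_load, drop_temp_objects):
    """Start a helper process, run the load, and always stop the helper.

    helper_cmd stands in for the gpfdist command line; do_load and
    drop_temp_objects are callables standing in for step7 and step8.
    """
    helper = subprocess.Popen(helper_cmd)
    try:
        do_load()
    finally:
        try:
            # Step8 equivalent: may fail if the database is unreachable.
            drop_temp_objects()
        except Exception as exc:
            print("cleanup failed: %s" % exc, file=sys.stderr)
        finally:
            # Step9 equivalent: runs even when step8 raised.
            helper.terminate()
            try:
                helper.wait(timeout=10)
            except subprocess.TimeoutExpired:
                helper.kill()
                helper.wait()
    return helper.returncode
```

This covers only the case where the second connection raises an error. If the connection attempt blocks instead of failing, a connection timeout would be needed as well.
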
> Thanks for reading.
>
>
>
>
>
>
>
>
> yin.zhb@163.com
>
> From: Lei Chang
> Date: 2016-04-08 07:52
> To: user
> CC: dev
> Subject: Re: May I mention some gpload issues?
>
> Please do. Thanks!
>
> Cheers
> Lei
>
>
> On Thu, Apr 7, 2016 at 11:56 PM, yin.zhb@163.com <yin.zhb@163.com> wrote:
>
> While using gpload in my work, I ran into some problems with it.
> May I mention some gpload issues here?
>
>
>
> yin.zhb@163.com
>
>
