hawq-dev mailing list archives

From "yin.zhb@163.com" <yin....@163.com>
Subject Re: Re: May I mention some gpload issues?
Date Sat, 09 Apr 2016 02:36:40 GMT

Our DB team has been using Greenplum for one year. We have 680 GB (more than one billion lines) of data
to load into Greenplum every day.
We wrote a program that calls gpload every 10 minutes. Each run loads between 100,000 and more than
10,000,000 lines.
We can accept about 1 bad line per 100,000.

our environment:
os version: rhel 6.3
greenplum :
gpload    :

These are the problems we ran into when using gpload:

1. "line too long"
This error makes gpload fail even if only one line in all the files to be loaded is too long.
We set "error_limit" and "segment reject limit", but they had no effect.
Finding the offending line is very hard, so we set max_line_length to "1048576".
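
To locate the offending lines before a load, a small pre-scan can help. The following is a minimal sketch, not part of gpload: find_long_lines is a hypothetical helper, and the 32768-byte default is only an assumption about the limit in effect, so pass whatever max_line_length you actually configure.

```python
def find_long_lines(path, max_bytes=32768):
    """Yield (line_number, length_in_bytes) for every line longer than max_bytes.

    max_bytes is an assumed limit; set it to the max_line_length you use.
    """
    # Read as bytes so the length is measured in bytes, not decoded characters.
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            length = len(line.rstrip(b"\r\n"))  # do not count the line terminator
            if length > max_bytes:
                yield lineno, length
```

Calling list(find_long_lines("data.txt", max_bytes=1048576)) on each input file (data.txt is a placeholder name) lists the line numbers to inspect or drop before gpload runs.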

2. "no partition key"
This one also gives us a headache.
Maybe only one line is incorrect (an unexpected delimiter inside a column), or its encoding
is not recognized;
either way gpload fails, as in problem 1.
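
A similar pre-check could catch rows with an unexpected delimiter or an undecodable byte sequence before gpload sees them. This is only a sketch under assumptions: find_bad_rows is a hypothetical helper, the "|" delimiter and UTF-8 encoding are placeholders, and the plain split ignores any quoting or escaping rules the real load format may have.

```python
def find_bad_rows(path, expected_columns, delimiter="|", encoding="utf-8"):
    """Return (line_number, reason) for rows that fail a decode or column-count check.

    delimiter and encoding are assumed defaults; use the values from your load.
    """
    problems = []
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                text = raw.rstrip(b"\r\n").decode(encoding)
            except UnicodeDecodeError as exc:
                problems.append((lineno, f"encoding error: {exc.reason}"))
                continue
            # Naive split: a delimiter inside a quoted or escaped field is counted too.
            count = len(text.split(delimiter))
            if count != expected_columns:
                problems.append(
                    (lineno, f"expected {expected_columns} columns, found {count}")
                )
    return problems
```

Rows reported here can be moved aside and inspected by hand, so that one bad line does not fail the whole batch.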

3. Column too long
This makes gpload fail, too.
We changed all column data types to text to work around this problem.

4. In our production environment, when the Greenplum cluster hit an error, it logged this:
fatal","57m01","the database system is in mirror or uninitialized mode",,,,,,,0,,"postmaster.c",2994,

but the gpload and gpfdist processes hung (sleeping) instead of exiting.

We looked through the gpload.py script and found a case that it does not handle.

gpload.py loads data in these steps:
step1: read_config()
step2: setup_connection() -- connects to the db for the first time
step3: read_table_metadata()
step4: read_columns()
step5: read_mapping()
step6: start_gpfdists()
step7: do_method()

Finally, it will:
step8: remove temporary data -- connects to the db a second time
step9: kill gpfdist

We found that when step8 hits an error (the db cannot be connected), the process hangs (sleeps) instead of exiting.
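
Until this is handled inside gpload.py, a caller can guard against the hang from outside. The sketch below assumes a POSIX system and that the calling program starts gpload as a child process; run_with_timeout is a hypothetical helper, and it only reaches gpfdist if gpfdist stays in gpload's process group, which is an assumption.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd in its own process group and kill the group if it runs too long.

    Returns the exit code, or None if the command had to be killed.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its process group id equals its pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Kill the whole group so children left behind by the command die too.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the killed process
        return None
```

For example, run_with_timeout(["gpload", "-f", "load.yml"], 1800) would give one run 30 minutes; load.yml and the limit are placeholders. A return value of None tells the caller that the run hung and was killed, so it can log the event and retry later.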

Thanks for reading.

From: Lei Chang
Date: 2016-04-08 07:52
To: user
CC: dev
Subject: Re: May I mention some gpload issues?

please. thanks!


On Thu, Apr 7, 2016 at 11:56 PM, yin.zhb@163.com <yin.zhb@163.com> wrote:

While using gpload in my work, I ran into some problems with it.
May I mention some gpload issues here?

