hawq-dev mailing list archives

From Wales Wang <wormw...@yahoo.com.INVALID>
Subject Re: May I mention some gpload issues?
Date Sat, 09 Apr 2016 09:50:05 GMT
To Yin:
I can help you. Please contact me.

To Lei:
Greenplum dev is not very active.

Wales Wang

On 2016-4-9, at 5:47 PM, Lei Chang <lei_chang@apache.org> wrote:

> It looks like you are using GPDB. This mailing list is for HAWQ.
> 
> You can reach Pivotal support or ask GPDB questions on the mailing list
> shown here: http://greenplum.org/
> 
> Cheers
> Lei
> 
> 
> 
> On Sat, Apr 9, 2016 at 10:36 AM, yin.zhb@163.com <yin.zhb@163.com> wrote:
> 
>> 
>> Our DB team has been using Greenplum for one year. We have 680 GB (one
>> billion+ lines) of data to load into Greenplum every day.
>> We wrote a program that calls gpload every 10 minutes. Each run
>> loads 100,000 to 10,000,000+ lines.
>> We can accept about 1 error line in 100,000.
>> 
>> Our environment:
>> OS version: RHEL 6.3
>> Greenplum : 4.3.5.2
>> gpload    : 4.3.5.2
>> 
>> These are the problems we ran into when using gpload:
>> 
>> 1. "line too long"
>> This error makes gpload fail, even if only one line in all the
>> files to be loaded is too long.
>> We set "error_limit" and "segment reject limit", but they had no effect.
>> Finding the offending line is very hard, so we set max_line_length
>> to "1048576".
>> 
>> 2. "no partition key"
>> This one is also a headache.
>> There may be only one bad line (an unexpected delimiter inside a
>> column) or an encoding that is not recognized,
>> and gpload fails just as in problem 1.
>> 
>> 
>> 3. Column too long
>> This makes gpload fail, too.
>> We changed all data types to text to work around this problem.
>> 
>> 4. In our production environment, when the Greenplum cluster hit an
>> error, it logged something like this:
>> fatal","57m01","the database system is in mirror or uninitialized
>> mode",,,,,,,0,,"postmaster.c",2994,
>> 
>> but the gpload and gpfdist processes went to sleep instead of exiting.
>> 
>> We looked through the gpload.py script and found a case it does not handle.
>> 
>> gpload.py loads data in these steps:
>> step1: read_config()
>> step2: setup_connection() -- connects to the DB for the first time
>> step3: read_table_metadata()
>> step4: read_columns()
>> step5: read_mapping()
>> step6: start_gpfdists()
>> step7: do_method()
>> 
>> Finally, it will:
>> step8: remove temporary data -- connects to the DB a second time
>> step9: kill gpfdist
>> 
>> We found that when step8 fails (the DB cannot be connected), the process
>> sleeps instead of exiting.
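
The hang described in step8/step9 could in principle be avoided by making sure the helper process is stopped even when the cleanup step cannot reach the database. The following is only a minimal sketch of that pattern, not gpload.py's actual code: `run_load`, `start_dummy_server`, and `failing_cleanup` are hypothetical names, and a sleeping Python child process stands in for gpfdist.

```python
import subprocess
import sys


def run_load(start_server, do_load, cleanup, kill_timeout=5.0):
    """Run a load and always stop the helper process, even if cleanup fails.

    Returns the list of exceptions raised by the load or cleanup steps.
    """
    server = start_server()            # stands in for step6, start_gpfdists()
    errors = []
    try:
        do_load()                      # stands in for step7, do_method()
    except Exception as exc:
        errors.append(exc)
    finally:
        try:
            cleanup()                  # stands in for step8, removing temporary data
        except Exception as exc:       # e.g. the database is no longer reachable
            errors.append(exc)
        finally:
            # Stands in for step9: runs regardless of what happened in step8.
            server.terminate()
            try:
                server.wait(timeout=kill_timeout)
            except subprocess.TimeoutExpired:
                server.kill()          # escalate if the process ignores terminate()
                server.wait()
    return errors


def start_dummy_server():
    # Hypothetical stand-in for gpfdist: a child process that just sleeps.
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


def failing_cleanup():
    # Simulates the second database connection failing during cleanup.
    raise ConnectionError("the database system is in mirror or uninitialized mode")


if __name__ == "__main__":
    problems = run_load(start_dummy_server, lambda: None, failing_cleanup)
    for problem in problems:
        print("load finished with error:", problem)
```

With this structure the caller gets the collected errors back and can exit with a non-zero status, rather than leaving the loader and its file server waiting.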
>> 
>> Thanks for reading.
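
For problems 1 to 3, where a single bad line is hard to locate, one option is to scan the input files before handing them to gpload. The sketch below is an illustration under stated assumptions rather than anything gpload provides: `find_suspect_lines` is a hypothetical helper, the `|` delimiter and field count are made up, the input is assumed to be UTF-8, and the delimiter count is naive (it ignores quoting and escaping). The default length limit simply reuses the 1048576 value mentioned above.

```python
import io


def find_suspect_lines(lines, delimiter=b"|", expected_fields=3,
                       max_line_bytes=1048576):
    """Yield (line_number, reason) for lines a loader would likely reject.

    `lines` is any iterable of bytes objects, e.g. a file opened in "rb" mode.
    """
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip(b"\r\n")
        if len(line) > max_line_bytes:
            yield number, "line too long (%d bytes)" % len(line)
            continue
        # Naive count: does not understand quoted or escaped delimiters.
        fields = line.count(delimiter) + 1
        if fields != expected_fields:
            yield number, "expected %d fields, found %d" % (expected_fields, fields)
            continue
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield number, "not valid UTF-8"


if __name__ == "__main__":
    sample = io.BytesIO(
        b"a|b|c\n"
        b"only|two\n"
        + b"x" * 20 + b"|y|z\n"
        + b"\xff|b|c\n"
    )
    for number, reason in find_suspect_lines(sample, max_line_bytes=16):
        print(number, reason)
```

Running a check like this on each batch gives line numbers to inspect or filter out before the load starts.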
>> 
>> yin.zhb@163.com
>> 
>> From: Lei Chang
>> Date: 2016-04-08 07:52
>> To: user
>> CC: dev
>> Subject: Re: May I mention some gpload issues?
>> 
>> Please do. Thanks!
>> 
>> Cheers
>> Lei
>> 
>> 
>> On Thu, Apr 7, 2016 at 11:56 PM, yin.zhb@163.com <yin.zhb@163.com> wrote:
>> 
>> While using gpload in my work, I ran into some problems with it.
>> May I mention some gpload issues here?
>> 
>> 
>> 
>> yin.zhb@163.com
>> 
>> 
