beam-commits mailing list archives

From "Stephen Sisk (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-2659) JdbcIOIT flaky when run using io-it-suite-local
Date Sat, 22 Jul 2017 20:46:02 GMT
Stephen Sisk created BEAM-2659:
----------------------------------

             Summary: JdbcIOIT flaky when run using io-it-suite-local
                 Key: BEAM-2659
                 URL: https://issues.apache.org/jira/browse/BEAM-2659
             Project: Beam
          Issue Type: Bug
          Components: testing
            Reporter: Stephen Sisk
            Assignee: Chamikara Jayalath


Note: the problem below *should not* affect io-it-suite and thus the jdbc jenkins job that's
currently in PR. I haven't tested that exact configuration, so I'm not 100% certain, but I
don't have any indications that there'll be a problem.
---

 I've been running the postgres kubernetes scripts locally for a while and haven't seen any
problems.

However, now that I'm running it via io-it-suite-local, I'm starting to see flakiness:
clients attempting to connect to the postgres server get a "connection attempt failed"
error. The difference between what was working before and now is that the load balancer and
the pod are now getting set up at the same time. Before, I was using a pre-existing load
balancer - that is, I wasn't tearing down/starting up the load balancer+pod at run time.

So I think the problem is in the interaction between the two, or potentially just in the
LoadBalancer service (it may take a little longer to get fully hooked up even after it
reports an IP).

Possible causes:
* the load balancer is reporting it's ready before it can actually serve traffic to the
postgres instance
* the load balancer has another status field that I'm not looking at - today we only check
the IP address, but perhaps it exposes a readiness status as well? kubectl get/describe
might be able to help (see the sketch after this list). A cursory examination didn't show
anything helpful.
* the postgres instance isn't actually ready when it says it is. I don't think that's the
issue, since I was working with postgres pods before and they seemed fine then
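
For the kubectl idea above, here's one way to poke at what the service actually reports
(the service name "postgres" is a placeholder - substitute whatever the postgres kubernetes
scripts create):

  # Show everything kubectl knows about the service, including events,
  # which can reveal provisioning that hasn't finished yet.
  kubectl describe service postgres

  # Pull just the load balancer ingress IP; empty output means the
  # external IP hasn't been assigned yet.
  kubectl get service postgres \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'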

Potential solutions: 
* if the cause is slow postgres pod startup (unlikely): determine postgres pod health by
reading from sql? (pg_ctl?), and then have pkb wait for that by adding a
dynamic_pipeline_option that waits for the kubernetes status to be okay and sends its value
to a non-existent pipeline option (see the sketch after this list)
* file a bug about the load balancer not being ready when it says it is? (investigate that
more :)
* have some way for pkb to actually connect to postgres and validate the connection (that
seems complicated.)
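
As a rough sketch of the read-from-sql idea - trusting an actual query over the pod's
reported status (LB_IP and the credentials are placeholders, and this assumes psql is
installed wherever the check runs):

  # Keep retrying a trivial query through the load balancer until it
  # succeeds; a SELECT that returns means postgres is really reachable.
  until PGPASSWORD="$POSTGRES_PASSWORD" \
        psql "host=$LB_IP port=5432 user=postgres dbname=postgres" \
        -c 'SELECT 1;' >/dev/null 2>&1; do
    echo "postgres not reachable yet, retrying..."
    sleep 5
  done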

If the problem is that the load balancer is not ready when it says it is, then while we are
waiting for kubernetes to fix the issue, one workaround would be to:
1) modify io-it-suite-local to not load any kubernetes scripts (set --beam_kubernetes_scripts
to an empty value or skip it altogether - https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/pom.xml#L137)
2) have the user run the kubernetes scripts manually beforehand, wait for them to be healthy,
and then run io-it-suite-local (a sketch of this follows below)
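
Concretely, step 2 could look something like this (the yaml filename and service name are
illustrative, not necessarily the exact ones in the repo):

  # Bring up postgres manually, then poll until the LoadBalancer has an
  # external IP before kicking off io-it-suite-local.
  kubectl create -f postgres.yml
  until kubectl get service postgres \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}' | grep -q .; do
    echo "waiting for load balancer IP..."
    sleep 10
  done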

To repro the problem:

  mvn verify -Dio-it-suite-local -pl sdks/java/io/jdbc \
    -DpkbLocation="your-copy-of-PerfKitBenchmarker/pkb.py" \
    -DintegrationTestPipelineOptions='["--tempRoot=gs://sisk-test/staging"]' \
    -DforceDirectRunner=true

This should fail when run repeatedly.
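
For example, with a small driver loop (just the command above repeated, stopping at the
first failure):

  # Run the suite ten times and stop as soon as one run fails.
  for i in $(seq 1 10); do
    echo "=== run $i ==="
    mvn verify -Dio-it-suite-local -pl sdks/java/io/jdbc \
      -DpkbLocation="your-copy-of-PerfKitBenchmarker/pkb.py" \
      -DintegrationTestPipelineOptions='["--tempRoot=gs://sisk-test/staging"]' \
      -DforceDirectRunner=true || { echo "failed on run $i"; break; }
  done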



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
