qpid-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alan Conway" <acon...@redhat.com>
Subject Re: Review Request 23096: QPID-5855 JAVA Client Can not recieve message with qpid ha cluster "Session exception occured while trying to commit"
Date Thu, 26 Jun 2014 22:27:19 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23096/
-----------------------------------------------------------

(Updated June 26, 2014, 10:27 p.m.)


Review request for qpid, Gordon Sim, rajith attapattu, and Robbie Gemmell.


Bugs: QPID-5855
    https://issues.apache.org/jira/browse/QPID-5855


Repository: qpid


Description
-------

Reproducer and tentative fix.

To reproduce the bug, apply the changes to Spout and Drain but not the changes
to Session.  Run the script below. It should fail after several hundered
messages.

The problem appears to be that the java client sends a tx-commit and then times
out in Session.sync() waiting for completion. When HA is enabled the tx-commit
command is completed asynchronously, it seems like there is a race condition where
the broker is not sending the completion.

The fix on Session (part of this patch) is to force the client to always send an
execution.sync if the command is a tx-commit. With that fix, the reproducer runs
to completion after 100,000 messages.

I'm not completely convinced that the fix is correct.

- is it reasonable to special-case tx-commit like this? It is reasonable to say
  that you can't start a new transaction till you've finished the old one but
  there may be better places to enforce that. It does seem like the client
  is *trying* to sync. The stack trace has a lot of functions called sync(), but
  it seems none of them is actually sending a sync to the broker.

- the tx-commit command *is* marked synchronous. Why isn't the client sending a
  sync (assuming it really isn't)? To me needSync = !m.isSync() suggests that
  the reason we don't send a sync here is because the method is marked
  synchronous and therefore we believe we've sent it somewhere else in the code.

- If we can, it would be better to fix this on the broker rather than depend on
  the client getting it right.  Why isn't the broker eventually sending a
  completion anyway, even without an explicit sync?

================ Reproducer shell script
#!/bin/sh
# Reproducer for Bug 1095849 - JAVA Client Can not recieve message with qpid ha cluster "Session
exception occured while trying to commit"

# Modify these variables to suit yourself:
CHECKOUT=$HOME/qpid		# qpid checkout root (contains qpid/cpp etc.)
BUILD=$HOME/qpid/debug		# C++ build
QPIDD_DIR=/tmp
QPIDD_CONF=$QPIDD_DIR/qpidd.conf

START_QPIDD=1
BUILD_JAVA=1
while getopts "qj" opt; do
    case $opt in
	q) unset START_QPIDD ;;
	j) unset BUILD_JAVA ;;
	*) echo "Bad option: $opt. By default do everything. -q don't start qpidd, -j  don't build
java."; exit 1 ;;
    esac
done

source $BUILD/src/tests/test_env.sh

wait_status() {
    until qpid-ha --config $QPIDD_CONF -b $1 status --expect $2 &> /dev/null; do
	echo wait_status $*
	sleep .5
    done
}

start_cluster() {
    while pkill qpidd; do
	echo "Killing old qpidd"
    done

    cat <<EOF  > $QPIDD_CONF
load-module=$BUILD/src/ha.so
auth=no
#log-enable=trace+:Protocol
log-enable=debug+:HA
ha-cluster=yes
ha-replicate=all
ha-brokers-url=localhost:6000,localhost:6001,localhost:6002
EOF
    for p in 6000 6001 6002; do
	DIR=$QPIDD_DIR/qpidd-$p
	rm -rf $DIR
	mkdir -p $DIR
	qpidd -d --config=$QPIDD_CONF -p $p --log-to-file $DIR/log --data-dir $DIR
    done
    QPID_HA="qpid-ha --config $QPIDD_CONF -b localhost:6000"
    $QPID_HA promote
    wait_status localhost:6000 active
    wait_status localhost:6001 ready
    wait_status localhost:6002 ready
    $QPID_HA status --all
}

test "$START_QPIDD" && start_cluster

pushd $CHECKOUT/qpid/java

test "$BUILD_JAVA" && mvn package dependency:copy-dependencies -DskipTests

EXAMPLE_TARGET=$CHECKOUT/qpid/java/client/example/target
export CLASSPATH="$EXAMPLE_TARGET/classes/:$EXAMPLE_TARGET/dependency/*"
java org.apache.qpid.example.Spout  -b x:y@localhost:6000 -c 100000 'q;{create:always}'

popd


Diffs
-----

  trunk/qpid/java/client/example/src/main/java/org/apache/qpid/example/Drain.java 1605863

  trunk/qpid/java/client/example/src/main/java/org/apache/qpid/example/Spout.java 1605863

  trunk/qpid/java/common/src/main/java/org/apache/qpid/transport/Session.java 1605863 

Diff: https://reviews.apache.org/r/23096/diff/


Testing
-------


Thanks,

Alan Conway


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message