zookeeper-user mailing list archives

From 杨克特 <kete.yan...@aliyun-inc.com>
Subject The C Client cause core dump when create node
Date Fri, 24 Jun 2011 08:56:30 GMT
As mentioned in ZOOKEEPER-624 (https://issues.apache.org/jira/browse/ZOOKEEPER-624), the C client dumps core when it receives erroneous data from the ZooKeeper server, and the bug does not seem to have been completely fixed. The gdb backtraces are:

do_io thread:
#0 0x00000039fb030265 in raise () from /lib64/libc.so.6
#1 0x00000039fb031d10 in abort () from /lib64/libc.so.6
#2 0x00000039fb06a84b in __libc_message () from /lib64/libc.so.6
#3 0x00000039fb0722ef in _int_free () from /lib64/libc.so.6
#4 0x00000039fb07273b in free () from /lib64/libc.so.6
#5 0x00002b0afd755dd1 in deallocate_String (s=0x5a490f40) at src/recordio.c:29
#6 0x00002b0afd754ade in zookeeper_process (zh=0x131e3870, events=<value optimized out>) at src/zookeeper.c:2071
#7 0x00002b0afd75b2ef in do_io (v=<value optimized out>) at src/mt_adaptor.c:310
#8 0x00000039fb8064a7 in start_thread () from /lib64/libpthread.so.0
#9 0x00000039fb0d3c2d in clone () from /lib64/libc.so.6

create_node thread:
#0 0x00000039fb80ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002b0afd75af5c in wait_sync_completion (sc=0x131e4c90) at src/mt_adaptor.c:82
#2 0x00002b0afd751750 in zoo_create (zh=0x131e3870, path=0x13206fa8 "/jsq/zr2/hb/10.250.8.139:8102",
    value=0x131e86a8 "\n\021\061\060.250.8.139:8102\022\035/home/shaoqiang/workdir2/qrs/\030\001 \001*%\n\020\n",
    valuelen=102, acl=0x2b0afd961700, flags=1, path_buffer=0x0, path_buffer_len=0) at src/zookeeper.c:3028


The source of zookeeper.c:
case COMPLETION_STRING:
     LOG_DEBUG(("Calling COMPLETION_STRING for xid=%#x rc=%d",
                cptr->xid, rc));
     if (rc == 0) {
         struct CreateResponse res;
         int len;
         deserialize_CreateResponse(ia, "reply", &res);
         len = strlen(res.path) + 1;
         if (len > sc->u.str.str_len) {
             len = sc->u.str.str_len;
         }
         if (len > 0) {
             memcpy(sc->u.str.str, res.path, len - 1);
             sc->u.str.str[len - 1] = '\0';
         }
         deallocate_CreateResponse(&res);   /* this causes the core dump */
     }
     break;
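For illustration, the truncating copy in this branch can be isolated into a standalone function. This is a sketch, not the client's code: copy_path_truncated is a hypothetical name, and the NULL check on path is an added guard that the original branch does not have.

```c
#include <string.h>

/* Mirrors the COMPLETION_STRING copy: truncate the server-supplied path so it
 * fits in buf (buf_len bytes, including the terminating NUL).
 * The NULL check is an added guard (not in the original branch) so that a
 * path that was never deserialized is rejected instead of passed to strlen(). */
int copy_path_truncated(const char *path, char *buf, int buf_len)
{
    int len;
    if (path == NULL)
        return -1;                   /* nothing valid was deserialized */
    len = (int)strlen(path) + 1;     /* bytes needed, including the NUL */
    if (len > buf_len)
        len = buf_len;               /* caller's buffer is smaller: truncate */
    if (len > 0) {
        memcpy(buf, path, len - 1);  /* copy at most len - 1 characters */
        buf[len - 1] = '\0';         /* always NUL-terminate */
    }
    return 0;
}
```

For example, copying "/jsq/zr2" into a 5-byte buffer yields "/jsq", and a buf_len of 0 leaves the buffer untouched.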

The source of recordio.c:
int ia_deserialize_string(struct iarchive *ia, const char *name, char **s)
{
     struct buff_struct *priv = ia->priv;
     int32_t len;
     int rc = ia_deserialize_int(ia, "len", &len);
     if (rc < 0)
         return rc;
     if ((priv->len - priv->off) < len) {
         return -E2BIG;
     }
     if (len < 0) {
         return -EINVAL;
     }
     *s = malloc(len+1);
     if (!*s) {
         return -ENOMEM;
     }
     memcpy(*s, priv->buffer+priv->off, len);
     (*s)[len] = '\0';
     priv->off += len;
     return 0;
}
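To make the failure path concrete, here is a self-contained re-creation of the decoding logic, assuming a 4-byte big-endian length prefix (Jute's binary encoding of ints). The struct reader type and the read_int32/read_string names are hypothetical stand-ins for buff_struct and ia_deserialize_int/ia_deserialize_string; the checks are kept in the same order as the original.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct buff_struct: a buffer and a read offset. */
struct reader {
    const unsigned char *buf;
    int32_t len;
    int32_t off;
};

/* Read a 4-byte big-endian int. The uint32_t -> int32_t conversion is
 * implementation-defined in C but yields -1 for 0xffffffff on common
 * two's-complement compilers. */
static int read_int32(struct reader *r, int32_t *out)
{
    uint32_t v;
    if (r->len - r->off < 4)
        return -E2BIG;
    v = ((uint32_t)r->buf[r->off] << 24) | ((uint32_t)r->buf[r->off + 1] << 16) |
        ((uint32_t)r->buf[r->off + 2] << 8) | (uint32_t)r->buf[r->off + 3];
    *out = (int32_t)v;
    r->off += 4;
    return 0;
}

/* Same check order as ia_deserialize_string: on any error *s is left
 * exactly as the caller passed it in. */
int read_string(struct reader *r, char **s)
{
    int32_t len;
    int rc = read_int32(r, &len);
    if (rc < 0)
        return rc;
    if ((r->len - r->off) < len)     /* false for len == -1 */
        return -E2BIG;
    if (len < 0)                     /* a -1 prefix ends up here */
        return -EINVAL;
    *s = malloc((size_t)len + 1);
    if (!*s)
        return -ENOMEM;
    memcpy(*s, r->buf + r->off, (size_t)len);
    (*s)[len] = '\0';
    r->off += len;
    return 0;
}
```

Feeding it the bytes ff ff ff ff returns -EINVAL and leaves *s untouched, which matches the failure analysed in this report.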

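The variable len is set by ia_deserialize_int, and the value it reads is -1. (Why did the server send -1? It should be the length of the path the client just created. If the create operation had failed on the server, the reply header should carry a non-zero error code, but the error code in the reply header is actually zero.) With len == -1, the size check (priv->len - priv->off) < len does not trigger and the len < 0 check returns -EINVAL, so *s = malloc(len+1) is never reached and res.path is never assigned. Because the return value of deserialize_CreateResponse is not checked, deallocate_CreateResponse then tries to free res->path, which was never initialized.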
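One way to make the cleanup safe, sketched under the assumption that the generated deallocator simply calls free() on each string field (the backtrace shows deallocate_String calling free()): zero-initialize the response before deserializing and check the deserializer's return code before using the path. Since free(NULL) is a no-op, the error path no longer frees garbage. struct create_response, fake_deserialize and handle_create_reply are hypothetical names; fake_deserialize only simulates a reply whose length prefix is -1.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the generated struct CreateResponse. */
struct create_response {
    char *path;
};

/* Simulates deserialization: on failure it returns before assigning
 * res->path, as ia_deserialize_string does for a -1 length prefix. */
static int fake_deserialize(struct create_response *res, int fail)
{
    static const char p[] = "/jsq/zr2";
    if (fail)
        return -EINVAL;              /* res->path left untouched */
    res->path = malloc(sizeof p);
    if (res->path == NULL)
        return -ENOMEM;
    memcpy(res->path, p, sizeof p);
    return 0;
}

/* Frees the path field; safe when path is NULL because free(NULL) is a no-op. */
static void free_create_response(struct create_response *res)
{
    free(res->path);
    res->path = NULL;
}

int handle_create_reply(int simulate_failure, char *out, size_t out_len)
{
    struct create_response res = {0};   /* path starts as NULL, not garbage */
    int rc = fake_deserialize(&res, simulate_failure);
    if (rc == 0 && res.path != NULL && out_len > 0) {
        strncpy(out, res.path, out_len - 1);
        out[out_len - 1] = '\0';
    }
    free_create_response(&res);         /* safe on success and failure */
    return rc;
}
```

On the simulated failure the caller's buffer is left unchanged and the cleanup frees a NULL pointer instead of an uninitialized one.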

It seems the ZooKeeper server also has a bug, in DataTree.java, in the method public ProcessTxnResult processTxn(TxnHeader header, Record txn):

try {
     rc.clientId = header.getClientId();
     rc.cxid = header.getCxid();
     rc.zxid = header.getZxid();
     rc.type = header.getType();
     rc.err = 0;
     if (rc.zxid > lastProcessedZxid) {
         lastProcessedZxid = rc.zxid;
     }
     switch (header.getType()) {
         case OpCode.create:
             CreateTxn createTxn = (CreateTxn) txn;
             debug = "Create transaction for " + createTxn.getPath();
             createNode(
                     createTxn.getPath(),
                     createTxn.getData(),
                     createTxn.getAcl(),
                     createTxn.getEphemeral() ? header.getClientId() : 0,
                     header.getZxid(), header.getTime());
             rc.path = createTxn.getPath();
             break;

What if createNode throws an exception? The operation did not succeed, but rc.err is not updated: it was set to zero before any work was actually done.
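The pattern the question points toward is to derive the error code from the operation's outcome rather than presetting it to success. The sketch below illustrates that idea in C to match the other examples; it is not ZooKeeper's server code, and struct txn_result, create_node and process_create are hypothetical names (a single-slot "tree" stands in for DataTree, and a return code stands in for the Java exception).

```c
#include <errno.h>
#include <string.h>

/* Hypothetical result record, loosely modelled on ProcessTxnResult. */
struct txn_result {
    int err;           /* 0 only if the operation actually succeeded */
    const char *path;  /* set only on success */
};

/* Hypothetical node creation on a single-slot "tree":
 * fails with -EEXIST if the path is already present. */
static int create_node(const char **existing, const char *path)
{
    if (*existing != NULL && strcmp(*existing, path) == 0)
        return -EEXIST;
    *existing = path;
    return 0;
}

void process_create(struct txn_result *rc, const char **tree, const char *path)
{
    int err = create_node(tree, path);
    rc->err = err;                        /* taken from the outcome, not preset */
    rc->path = (err == 0) ? path : NULL;  /* no path reported on failure */
}
```

A second create of the same path then reports -EEXIST with no path, instead of err == 0.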

By the way, this core dump is hard to reproduce; I suspect that a bad network may be one of the causes.

