geode-issues mailing list archives

From "Darrel Schneider (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GEODE-482) deserialization can hang for one minute waiting for a DataSerializer
Date Fri, 23 Oct 2015 18:43:27 GMT
Darrel Schneider created GEODE-482:
--------------------------------------

             Summary: deserialization can hang for one minute waiting for a DataSerializer
                 Key: GEODE-482
                 URL: https://issues.apache.org/jira/browse/GEODE-482
             Project: Geode
          Issue Type: Bug
          Components: core
            Reporter: Darrel Schneider


If a JVM does not explicitly register a DataSerializer it is going to use, but instead relies
on Geode to distribute the DataSerializer to it from another member or server, then a race
condition exists that can cause it to wait for 1 minute and then fail to find the DataSerializer.

The workaround for this is to explicitly register the DataSerializer using a static initializer
or the cache.xml serializer element.

A unit test was intermittently hitting this problem (see GEODE-376), but that test has been
changed to work around the race in the product.

The race is in this code in com.gemstone.gemfire.internal.InternalDataSerializer.getSerializer(int):
    SerializerAttributesHolder sah=idsToHolders.get(idx);
    while (result == null && !timedOut && sah == null) {
      Object o = idsToSerializers.putIfAbsent(idx, marker);
      if (o == null) {
        result = marker.getSerializer();

If getSerializer sees a null "sah", then before it can do the "idsToSerializers.putIfAbsent"
another thread may execute this code in
com.gemstone.gemfire.internal.InternalDataSerializer.register(String, boolean, SerializerAttributesHolder):
    if (className == null || className.trim().equals("")) {
      throw new IllegalArgumentException("Class name cannot be null or empty.");
    }
    SerializerAttributesHolder oldValue = dsClassesToHolders.putIfAbsent(
        className, holder);
    if (oldValue != null) {
      if (oldValue.getId() != 0 && holder.getId() != 0
          && oldValue.getId() != holder.getId()) {
        throw new IllegalStateException(snip);
      }
    }
    idsToHolders.putIfAbsent(holder.getId(), holder);
    Object ds = idsToSerializers.get(holder.getId());
    if (ds instanceof Marker) {
      synchronized (ds) {
        ((Marker)ds).notifyAll();
      }
    }

So the registering thread does not see the Marker (it has not been put in idsToSerializers
yet) and does not notify it.
That leaves the first thread stuck in Marker.getSerializer, which blocks for 1 minute and then
returns null.
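
The interleaving described above can be sketched in a small self-contained model. This is
not the product code: LostNotifyDemo, its simplified Marker (with await and signal), the
string "holder", the id 42 and the short timeout are all hypothetical stand-ins, and two
latches are used only to force the ordering that the real race hits by chance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

class LostNotifyDemo {
  // Hypothetical, simplified stand-in for the product's Marker class.
  static final class Marker {
    private boolean notified;

    // Blocks until signalled or until timeoutMs elapses; true only if signalled.
    synchronized boolean await(long timeoutMs) throws InterruptedException {
      long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
      while (!notified) {
        long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
        if (remainingMs <= 0) {
          return false;
        }
        wait(remainingMs);
      }
      return true;
    }

    synchronized void signal() {
      notified = true;
      notifyAll();
    }
  }

  // Forces the bad interleaving. Returns true if the reader was woken up,
  // false if it timed out waiting on its Marker.
  static boolean readerWasNotified(long timeoutMs) {
    ConcurrentMap<Integer, Object> idsToHolders = new ConcurrentHashMap<>();
    ConcurrentMap<Integer, Object> idsToSerializers = new ConcurrentHashMap<>();
    int id = 42; // arbitrary serializer id, chosen for illustration
    CountDownLatch readerSawNullHolder = new CountDownLatch(1);
    CountDownLatch registerDone = new CountDownLatch(1);
    boolean[] notified = new boolean[1];

    // Models the getSerializer(int) side.
    Thread reader = new Thread(() -> {
      try {
        Object sah = idsToHolders.get(id);        // step 1: holder not there yet
        readerSawNullHolder.countDown();
        registerDone.await();                     // the race window
        if (sah == null) {
          Marker marker = new Marker();
          Object o = idsToSerializers.putIfAbsent(id, marker); // step 3: too late
          if (o == null) {
            notified[0] = marker.await(timeoutMs);
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    // Models the register(...) side.
    Thread registrar = new Thread(() -> {
      try {
        readerSawNullHolder.await();
        idsToHolders.putIfAbsent(id, "holder");   // step 2: holder published
        Object ds = idsToSerializers.get(id);     // no Marker has been put yet
        if (ds instanceof Marker) {
          ((Marker) ds).signal();                 // never reached in this ordering
        }
        registerDone.countDown();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    reader.start();
    registrar.start();
    try {
      reader.join();
      registrar.join();
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    return notified[0];
  }

  public static void main(String[] args) {
    System.out.println("reader notified: " + readerWasNotified(200));
  }
}
```

In this model the reader always times out, because the holder is published and the check
for a Marker both happen between the reader's read of idsToHolders and its putIfAbsent.
A test against the real methods would need some comparable way to control the ordering.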

A new test needs to be written that will reliably fail for this bug.
A multi-threaded unit test that uses these two methods would be best.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
