uima-user mailing list archives

From "D. Heinze" <dhei...@gnoetics.com>
Subject CAS serializationWithCompression
Date Tue, 12 Jan 2016 02:06:22 GMT
I'm having a problem with CAS serializeWithCompression.  I am processing
a few million text documents on an IBM POWER8 with 16 physical SMT-8 CPUs, 200 GB
RAM, Ubuntu 14.04.3 LTS, and IBM Java 1.8.

I run 55 UIMA pipelines concurrently.  I'm using UIMA 2.6.0.

I use serializeWithCompression to save the final state of the processing on
each document to a file for later processing.

However, the size of the serialized CAS just keeps growing.  The size of the
CAS itself is stable, but the serialized CASes keep getting bigger.  I even
switched to creating a new CAS for each process instead of reusing one via cas.reset().  I
have also tried writing the serialized CAS to a byte array output stream
first and then to the file, but it is serializeWithCompression that causes
the size problem, not writing the file.

Here's what the code looks like.  Flushing or not flushing does not make a
difference.  Closing or not closing the file output stream does not make a
difference (other than leaking memory).  I've also tried
serializeWithCompression with type filtering.  I wanted to try using a Marker,
but cannot see how to do that (my untested guess is sketched after the
snippet below).  The problem exists regardless of running 1 or 55 pipelines
concurrently.

 

        File fout = new File(documentPath);
        fos = new FileOutputStream(fout);
        org.apache.uima.cas.impl.Serialization.serializeWithCompression(cas, fos);
        fos.flush();
        fos.close();
        logger.info("serializedCas size " + cas.size() + " ToFile " + documentPath);
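
For reference, this untested sketch is roughly what I had in mind for the
Marker variant.  The cas.createMarker() call and the three-argument
serializeWithCompression(cas, out, marker) overload are just my reading of
the Javadoc, so I may well be using them wrong:

        import java.io.BufferedOutputStream;
        import java.io.File;
        import java.io.FileOutputStream;
        import java.io.IOException;
        import java.io.OutputStream;

        import org.apache.uima.cas.CAS;
        import org.apache.uima.cas.Marker;
        import org.apache.uima.cas.impl.Serialization;

        public class DeltaSerializeSketch {
            // Untested guess: create the Marker before the annotators add their
            // results, then serialize only what was added after the mark.
            public static void markAndSerializeDelta(CAS cas, String documentPath)
                    throws IOException {
                Marker marker = cas.createMarker();

                // ... run the rest of the pipeline on this CAS here ...

                try (OutputStream out = new BufferedOutputStream(
                        new FileOutputStream(new File(documentPath)))) {
                    // 3-arg overload: my assumption is that this writes only the
                    // feature structures created/modified after the marker.
                    Serialization.serializeWithCompression(cas, out, marker);
                }
            }
        }

If that's not how the Marker is meant to be used with the compressed form, a
pointer to a working example would be great.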

 

Suggestions will be appreciated.

 

Thanks / Dan

 

