incubator-jena-dev mailing list archives

From Andy Seaborne
Subject Re: SPARQL 1.1 Update in ARQ 2.8.7
Date Thu, 03 Feb 2011 14:37:07 GMT

On 01/02/11 16:36, Stephen Allen wrote:
> Andy,
> I have started implementing the serializer (SinkBindingOutput) by using
> org.openjena.riot.SinkQuadOutput as a guide and using OutputLangUtils to
> print out the variable/values.  I created the deserializer (LangBindings) by
> extending org.openjena.riot.lang.LangNTuple.  I'm using the paired var/value
> format you described below.  For now I'll start with a straightforward
> implementation with no compression, but like your ideas in this area.  I'll
> try to do some measurements to see if any other compression is beneficial.

Sounds good.

> I did not define an org.openjena.riot.Lang enum for the deserializer
> (because it isn't an RDF language) but I was planning on putting the
> LangBindings class in the org.openjena.riot.lang package.

As good a place as any at the moment.

I've just been digging out some code that does tuple I/O from an 
experimental system from a while ago (a clustered query engine ..).

> For determining when to spill bindings to disk, there are a few options (in
> order of least difficulty):
> 1) Store binding objects in a list, and then spill them to disk once the
> list size passes a threshold
> 2) Start serializing bindings immediately into something like
> DeferredFileOutputStream [1] that will retain the data in memory until it
> passes a memory threshold
> 3) Do 1), but try to calculate the size of the bindings in memory and use a
> memory threshold instead of a number of bindings threshold
> I think 1) should be sufficient if we come up with a reasonable guess for
> the threshold.  Option 2) lets you get much better control over the memory
> management, but I think the cost of unnecessarily serializing/deserializing
> small queries may be too high.

Personally, I'd encapsulate this in a policy object and have different 
implementations.  Well, maybe just one implementation - case 1 with a 
settable threshold for testing.  (3) then becomes a smarter policy 
object to be done later, if needed.
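
A minimal sketch of what such a policy object might look like, in Java.
Every name here (SpillPolicy, CountThresholdPolicy, SpillingBuffer) is
hypothetical and not part of ARQ or RIOT; the actual spill-to-disk step is
stubbed out, and "passes a threshold" is assumed to mean strictly greater
than the threshold.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical policy: decides when buffered items should be spilled to disk. */
interface SpillPolicy<T> {
    /** Record that one more item has been buffered in memory. */
    void added(T item);

    /** True once the in-memory buffer should be written out. */
    boolean shouldSpill();

    /** Called after the buffer has been spilled and cleared. */
    void reset();
}

/** Case 1: spill once the number of buffered items passes a settable threshold. */
final class CountThresholdPolicy<T> implements SpillPolicy<T> {
    private final long threshold;
    private long count = 0;

    CountThresholdPolicy(long threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must be >= 0");
        }
        this.threshold = threshold;
    }

    @Override public void added(T item) { count++; }

    @Override public boolean shouldSpill() { return count > threshold; }

    @Override public void reset() { count = 0; }
}

/** Buffer that consults the policy after every add; the disk write is a stub. */
final class SpillingBuffer<T> {
    private final SpillPolicy<T> policy;
    private final List<T> inMemory = new ArrayList<>();
    private int spillCount = 0;

    SpillingBuffer(SpillPolicy<T> policy) {
        this.policy = policy;
    }

    void add(T item) {
        inMemory.add(item);
        policy.added(item);
        if (policy.shouldSpill()) {
            spill();
        }
    }

    private void spill() {
        // A real implementation would serialize inMemory to a temporary file here.
        spillCount++;
        inMemory.clear();
        policy.reset();
    }

    int spillCount() { return spillCount; }

    int inMemorySize() { return inMemory.size(); }
}
```

A smarter policy for (3) would implement the same interface but estimate
the size of each item in added() and compare a running byte total against
a memory threshold, so the buffer code would not need to change.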

I share your concern on (2) about the serialization to memory costs.

