commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matt Sicker <boa...@gmail.com>
Subject Re: Immutable builder pattern for parsers?
Date Thu, 09 Feb 2017 03:27:36 GMT
I've always considered builders to be mutable, thread unsafe objects used
to create immutable, thread safe objects. If these builders cause GC
pressure to go too high, then I'd turn to object pooling or per-thread
reusable objects depending on how the code is used.

On 8 February 2017 at 20:38, Stian Soiland-Reyes <stain@apache.org> wrote:

> Peter Ansell raised a valid question about why in Commons RDF I made the
> RDFParser interface as an immutable factory-like builder pattern.
>
> https://commons.apache.org/proper/commons-rdf/apidocs/
> org/apache/commons/rdf/experimental/RDFParser.html
>
>
> Here is how it can be used today:
>
>  RDF rdf = new JenaRDF();
>  Graph g1 = rdf.createGraph();
>  Future future = new JenaRDFParser()
>          .source(Paths.get("/tmp/graph.ttl"))
>          .contentType(RDFSyntax.TURTLE)
>          .target(g1).parse();
>  future.get(30, TimeUnit.Seconds); // block
>
>
> You may notice it is not a factory as-such, as you only get a Future
> of the parser result, rather than the actual "parser". Also it is
> currently an interface, with multiple implementations (one per RDF
> implementation).
>
>
>
> Although the javadoc only suggests for it to be immutable and
> threadsafe, the abstract class AbstractRDFParser
> https://commons.apache.org/proper/commons-rdf/apidocs/
> org/apache/commons/rdf/simple/experimental/AbstractRDFParser.html
> https://github.com/apache/commons-rdf/blob/master/
> simple/src/main/java/org/apache/commons/rdf/simple/experimental/
> AbstractRDFParser.java
> is both, and thus every "option setter" method therefore return a cloned
> copy of the factory with the mutated fields set there. This simply uses
> .clone() at the moment.
>
> This abstract class and immutability pattern is used by each of the
> three RDF implementations of the interface.
>
>
> This means it's safe to parse many files like this:
>
>  JenaRDFParser parser = new JenaRDFParser()
>          .contentType(RDFSyntax.TURTLE)
>          .target(g1);
>
>  // Parse multiple files in parallell
>  Future future1 = parser.source(Paths.get("/tmp/graph1.ttl")).parse();
>  Future future2 = parser.source(Paths.get("/tmp/graph2.ttl")).parse();
>
>  // Wait for both to finish
>  future1.get(30, TimeUnit.Seconds);
>  future2.get(30, TimeUnit.Seconds);
>
>
> Above the second call to parser.source(..) is not a problem (even if
> it happens in another thread) as both .source() calls return separate
> cloned JenaRDFParser instances and don't mutate the "parser" instance.
>
>
> Now a valid critique of such a pattern is that it's wasteful to make
> clones that are immediately thrown away, for instance the above
> future1/future2 example would cause 7 instantiations of JenaRDFParser()
> (.parse() does some finalization in a final clone).
>
> I have not done any benchmarking, but envision that as parsing RDF files
> generally creates many thousands of RDFTerm instances, a couple of
> extra parser instantiations shouldn't make a big dent in the bill.
>
>
> The typical builder pattern is rather to return "this" and mutate fields
> directly. This is obviously not thread-safe but generally achives the
> same. Multiple threads would however instead have to create a brand new
> parser from start and set all the options themselves.  (Or they could a
> similar .clone() method from a carefully locked down "proto" instance
> that everyone has to be careful not to call any mutating methods on)
>
>
> Now multiple threads is now very common because of Java 8 streams, and
> for instance it would not be unthinkable that a parallelStream() is
> mapped as a .source() or .target() on such a parser builder. The Futures
> fit somewhat into this picture as well.   I think this would get
> complicated with having to set everything again rather than just pass
> "parser::source" as a method reference.
>
>
>
>
> As this was just thrown together as an experiment (and we've not yet
> tried writing the corresponding RDFWriter interface), I think we could
> revisit how to do this pattern, hopefully drawing on the wider
> experience of the Apache Commons developers on this list.
>
>
>
> Roughly here are my requirements:
>
> * Ability to set various options (e.g. path of file to parse, format,
>   destination) in any order
> * Avoid mega-constructors/multi-arg methods
> * Some options take multiple types, e.g a source can be given as a Path,
>   InputStream or IRI (URL) -- but would override each other
>
> Nice to have:
>
> * Reusable (e.g. set options that are common, but modify
>   source() to parse multiple files)
> * Thread-safe (e.g. immediate reuse of builder before a parsing session
>   has started or finished)
> * Easy to serialize / keep around (should not keep references to big
>   objects in memory uneccessarily)
> * Parsing in the background, e.g. return Future. (implementation manages
>   threads/queues as it pleases)
>
> Unclear:
>
> * How to select between the multiple RDFParser instances?
>   I was thinking the RDF interface could have a .makeParser() method,
>   but not all parsers can take all formats, so possibly some kind of
>   registry or "what can you handle" mechanism might be needed.
>
>
> How do you think we should proceed? Mutable or immutable? How should the
> settings be kept? Fields, map, or what?  Does it make sense with an
> interface, abstract class (keeps settings) and implementations
> (processess settings), or should we have a single ParserFactory class
> and have a new internal interface below?
>
>
> --
> Stian Soiland-Reyes
> University of Manchester
> http://www.esciencelab.org.uk/
> http://orcid.org/0000-0001-9842-9718
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


-- 
Matt Sicker <boards@gmail.com>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message