commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matt Sicker <boa...@gmail.com>
Subject Re: Immutable builder pattern for parsers?
Date Thu, 09 Feb 2017 06:37:03 GMT
Looking more closely, why not use CompleteableFuture instead of Future
since this is Java 8?

Also, the gist of the configuration API you're looking for sounds like a
way to emulate closures in Java. :)

You could also add a freeze method to switch between mutable and immutable
views of the factory internally to support a sort of deferred parameter
setting for thread safety purposes (i.e., cloning mutable objects). For
example:

MockRDFParser parser = new MockRDFParser().foo(foo).bar(bar).freeze();
// at this point, any calls to a mutator will return a mutable clone until
frozen again

parser.path(path).syntax(syntax).sink(sink).parse(); // thread 1
parser.path(path).syntax(syntax).sink(sink).parse(); // thread 2

Now both parsers should still work. For immutable, threadsafe objects, you
don't even have to clone them in the thawed clones. The naming could use
some work, but I think you get the idea.

I'm also kind of thinking the three different source types could be unified
into their own Source kind of class since the use of any of them are
mutually exclusive.

On 8 February 2017 at 23:18, Matt Sicker <boards@gmail.com> wrote:

> I'm not familiar with the code, but it sounds like you're in the early
> stages of a plugin system. The good old BeanFactory.
>
> Another possible way to go about untying thread safe and not thread safe
> parser factory builders would be using naming conventions like setters for
> mutable and withers for immutable (copy on write) APIs.
>
> On 8 February 2017 at 22:58, Gary Gregory <garydgregory@gmail.com> wrote:
>
>> On Wed, Feb 8, 2017 at 7:27 PM, Matt Sicker <boards@gmail.com> wrote:
>>
>> > I've always considered builders to be mutable, thread unsafe objects
>> used
>> > to create immutable, thread safe objects.
>>
>>
>> +1
>>
>> Gary
>>
>>
>> > If these builders cause GC
>> > pressure to go too high, then I'd turn to object pooling or per-thread
>> > reusable objects depending on how the code is used.
>> >
>> > On 8 February 2017 at 20:38, Stian Soiland-Reyes <stain@apache.org>
>> wrote:
>> >
>> > > Peter Ansell raised a valid question about why in Commons RDF I made
>> the
>> > > RDFParser interface as an immutable factory-like builder pattern.
>> > >
>> > > https://commons.apache.org/proper/commons-rdf/apidocs/
>> > > org/apache/commons/rdf/experimental/RDFParser.html
>> > >
>> > >
>> > > Here is how it can be used today:
>> > >
>> > >  RDF rdf = new JenaRDF();
>> > >  Graph g1 = rdf.createGraph();
>> > >  Future future = new JenaRDFParser()
>> > >          .source(Paths.get("/tmp/graph.ttl"))
>> > >          .contentType(RDFSyntax.TURTLE)
>> > >          .target(g1).parse();
>> > >  future.get(30, TimeUnit.Seconds); // block
>> > >
>> > >
>> > > You may notice it is not a factory as-such, as you only get a Future
>> > > of the parser result, rather than the actual "parser". Also it is
>> > > currently an interface, with multiple implementations (one per RDF
>> > > implementation).
>> > >
>> > >
>> > >
>> > > Although the javadoc only suggests for it to be immutable and
>> > > threadsafe, the abstract class AbstractRDFParser
>> > > https://commons.apache.org/proper/commons-rdf/apidocs/
>> > > org/apache/commons/rdf/simple/experimental/AbstractRDFParser.html
>> > > https://github.com/apache/commons-rdf/blob/master/
>> > > simple/src/main/java/org/apache/commons/rdf/simple/experimental/
>> > > AbstractRDFParser.java
>> > > is both, and thus every "option setter" method therefore return a
>> cloned
>> > > copy of the factory with the mutated fields set there. This simply
>> uses
>> > > .clone() at the moment.
>> > >
>> > > This abstract class and immutability pattern is used by each of the
>> > > three RDF implementations of the interface.
>> > >
>> > >
>> > > This means it's safe to parse many files like this:
>> > >
>> > >  JenaRDFParser parser = new JenaRDFParser()
>> > >          .contentType(RDFSyntax.TURTLE)
>> > >          .target(g1);
>> > >
>> > >  // Parse multiple files in parallell
>> > >  Future future1 = parser.source(Paths.get("/tmp/
>> graph1.ttl")).parse();
>> > >  Future future2 = parser.source(Paths.get("/tmp/
>> graph2.ttl")).parse();
>> > >
>> > >  // Wait for both to finish
>> > >  future1.get(30, TimeUnit.Seconds);
>> > >  future2.get(30, TimeUnit.Seconds);
>> > >
>> > >
>> > > Above the second call to parser.source(..) is not a problem (even if
>> > > it happens in another thread) as both .source() calls return separate
>> > > cloned JenaRDFParser instances and don't mutate the "parser" instance.
>> > >
>> > >
>> > > Now a valid critique of such a pattern is that it's wasteful to make
>> > > clones that are immediately thrown away, for instance the above
>> > > future1/future2 example would cause 7 instantiations of
>> JenaRDFParser()
>> > > (.parse() does some finalization in a final clone).
>> > >
>> > > I have not done any benchmarking, but envision that as parsing RDF
>> files
>> > > generally creates many thousands of RDFTerm instances, a couple of
>> > > extra parser instantiations shouldn't make a big dent in the bill.
>> > >
>> > >
>> > > The typical builder pattern is rather to return "this" and mutate
>> fields
>> > > directly. This is obviously not thread-safe but generally achives the
>> > > same. Multiple threads would however instead have to create a brand
>> new
>> > > parser from start and set all the options themselves.  (Or they could
>> a
>> > > similar .clone() method from a carefully locked down "proto" instance
>> > > that everyone has to be careful not to call any mutating methods on)
>> > >
>> > >
>> > > Now multiple threads is now very common because of Java 8 streams, and
>> > > for instance it would not be unthinkable that a parallelStream() is
>> > > mapped as a .source() or .target() on such a parser builder. The
>> Futures
>> > > fit somewhat into this picture as well.   I think this would get
>> > > complicated with having to set everything again rather than just pass
>> > > "parser::source" as a method reference.
>> > >
>> > >
>> > >
>> > >
>> > > As this was just thrown together as an experiment (and we've not yet
>> > > tried writing the corresponding RDFWriter interface), I think we could
>> > > revisit how to do this pattern, hopefully drawing on the wider
>> > > experience of the Apache Commons developers on this list.
>> > >
>> > >
>> > >
>> > > Roughly here are my requirements:
>> > >
>> > > * Ability to set various options (e.g. path of file to parse, format,
>> > >   destination) in any order
>> > > * Avoid mega-constructors/multi-arg methods
>> > > * Some options take multiple types, e.g a source can be given as a
>> Path,
>> > >   InputStream or IRI (URL) -- but would override each other
>> > >
>> > > Nice to have:
>> > >
>> > > * Reusable (e.g. set options that are common, but modify
>> > >   source() to parse multiple files)
>> > > * Thread-safe (e.g. immediate reuse of builder before a parsing
>> session
>> > >   has started or finished)
>> > > * Easy to serialize / keep around (should not keep references to big
>> > >   objects in memory uneccessarily)
>> > > * Parsing in the background, e.g. return Future. (implementation
>> manages
>> > >   threads/queues as it pleases)
>> > >
>> > > Unclear:
>> > >
>> > > * How to select between the multiple RDFParser instances?
>> > >   I was thinking the RDF interface could have a .makeParser() method,
>> > >   but not all parsers can take all formats, so possibly some kind of
>> > >   registry or "what can you handle" mechanism might be needed.
>> > >
>> > >
>> > > How do you think we should proceed? Mutable or immutable? How should
>> the
>> > > settings be kept? Fields, map, or what?  Does it make sense with an
>> > > interface, abstract class (keeps settings) and implementations
>> > > (processess settings), or should we have a single ParserFactory class
>> > > and have a new internal interface below?
>> > >
>> > >
>> > > --
>> > > Stian Soiland-Reyes
>> > > University of Manchester
>> > > http://www.esciencelab.org.uk/
>> > > http://orcid.org/0000-0001-9842-9718
>> > >
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> > > For additional commands, e-mail: dev-help@commons.apache.org
>> > >
>> > >
>> >
>> >
>> > --
>> > Matt Sicker <boards@gmail.com>
>> >
>>
>>
>>
>> --
>> E-Mail: garydgregory@gmail.com | ggregory@apache.org
>> Java Persistence with Hibernate, Second Edition
>> <https://www.amazon.com/gp/product/1617290459/ref=as_li_tl?
>> ie=UTF8&camp=1789&creative=9325&creativeASIN=1617290459&link
>> Code=as2&tag=garygregory-20&linkId=cadb800f39946ec62ea2b1af9fe6a2b8>
>>
>> <http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l=
>> am2&o=1&a=1617290459>
>> JUnit in Action, Second Edition
>> <https://www.amazon.com/gp/product/1935182021/ref=as_li_tl?
>> ie=UTF8&camp=1789&creative=9325&creativeASIN=1935182021&link
>> Code=as2&tag=garygregory-20&linkId=31ecd1f6b6d1eaf8886ac902a24de418%22>
>>
>> <http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l=
>> am2&o=1&a=1935182021>
>> Spring Batch in Action
>> <https://www.amazon.com/gp/product/1935182951/ref=as_li_tl?
>> ie=UTF8&camp=1789&creative=9325&creativeASIN=1935182951&link
>> Code=%7B%7BlinkCode%7D%7D&tag=garygregory-20&linkId=%7B%7Bli
>> nk_id%7D%7D%22%3ESpring+Batch+in+Action>
>> <http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l=
>> am2&o=1&a=1935182951>
>> Blog: http://garygregory.wordpress.com
>> Home: http://garygregory.com/
>> Tweet! http://twitter.com/GaryGregory
>>
>
>
>
> --
> Matt Sicker <boards@gmail.com>
>



-- 
Matt Sicker <boards@gmail.com>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message