stanbol-dev mailing list archives

From Rupert Westenthaler <rupert.westentha...@gmail.com>
Subject Re: Future of Clerezza and Stanbol
Date Fri, 09 Nov 2012 09:56:24 GMT
Hi all,

let me share my thoughts. Because this mail is rather long, I have
split it into three sections: (1) RDF, (2) RESTful API / Web
Interface and (3) other related topics.


RDF libs:
====

From the viewpoint of Apache Stanbol one needs to ask whether it makes
sense to manage our own RDF API. I expect the Semantic Web standards
to evolve quite a bit in the coming years, and I am concerned about
whether the Clerezza RDF modules will be updated/extended to provide
implementations of them. One example is SPARQL 1.1, which has been
around for quite some time and is still not supported by Clerezza.
While I do like the small API, the flexibility to use different
TripleStores and the fact that Clerezza comes with OSGI support, I
think that given the current situation we need to discuss all options,
and those include a switch to Apache Jena or Sesame. Sesame would be
an especially attractive option, as its RDF Graph API [1] is very
similar to what Clerezza uses. Apache Jena's counterparts (Model [2]
and Graph [3]) are considerably different and more complex interfaces.
In addition, Jena will only change to org.apache packages with the
next major release, so a switch before that release would mean two
incompatible API changes.

My personal opinion is that we should keep using Clerezza for now,
invest some effort to improve the Clerezza RDF modules and then see
how it develops further. Such an effort should include:

* implement the SPARQL fast lane (as already discussed with Reto
during ApacheCon). The fast lane would allow Clerezza to use the
native SPARQL engine of the underlying TripleStore. This means that
Clerezza only parses those parts of the SPARQL query needed to
determine the RDF graph(s) the query is executed on. This information
is then used to pass the query to the native SPARQL engine via an
extended interface of the TcProvider. The Clerezza SPARQL
implementation would only be used if the TcProvider does not provide a
native SPARQL implementation, or if the query spans RDF graphs managed
by different TcProvider instances. That way Clerezza users would be
able to use any SPARQL feature provided by the underlying TripleStore.
* update to the newest Jena versions (see also STANBOL-621; Peter
Ansell's Clerezza fork on github [4] as well as Sebastian Schaffert's
Jena bundle used for the Stanbol/LMF integration [5])
* finish and release the SingleTdbDatasetTcProvider.java
(CLEREZZA-691) as this is important for the Stanbol Ontology Manager
component
* move the indexed in-memory graph (CLEREZZA-683) from the Stanbol
code base to Clerezza and release it so that we can use it from there
in Stanbol
* provide a Clerezza JSON-LD parser/serializer. This is critical for
Stanbol as several CMS use this as their preferred RDF serialization.
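
To illustrate the fast lane idea from the list above, here is a
minimal Java sketch of the dispatch decision. It is not Clerezza code:
GraphProvider, NativeSparqlProvider, executeNative and the "generic:"
placeholder are all hypothetical names made up for this example, and
the regular expression only stands in for a proper, partial parse of
the dataset clauses (it ignores e.g. GRAPH ?var and prefixed names).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of the "fast lane" dispatch; all type and method names are hypothetical. */
class FastLaneSketch {

    /** Stand-in for a TcProvider: it only knows which graphs it manages. */
    interface GraphProvider {
        boolean manages(String graphUri);
    }

    /** Assumed extension for providers whose store has its own SPARQL engine. */
    interface NativeSparqlProvider extends GraphProvider {
        String executeNative(String query);
    }

    // Only FROM, FROM NAMED and GRAPH <uri> are inspected; the rest of the
    // query is never parsed here.
    private static final Pattern GRAPH_REF = Pattern.compile(
            "\\b(?:FROM(?:\\s+NAMED)?|GRAPH)\\s*<([^>]+)>", Pattern.CASE_INSENSITIVE);

    /** Collects the graph URIs the query refers to, in order of appearance. */
    static Set<String> referencedGraphs(String query) {
        Set<String> graphs = new LinkedHashSet<>();
        Matcher m = GRAPH_REF.matcher(query);
        while (m.find()) {
            graphs.add(m.group(1));
        }
        return graphs;
    }

    /** Returns the one provider managing all graphs, or null if there is none. */
    static GraphProvider soleOwner(Set<String> graphs, List<GraphProvider> providers) {
        GraphProvider owner = null;
        for (String graph : graphs) {
            GraphProvider found = null;
            for (GraphProvider p : providers) {
                if (p.manages(graph)) {
                    found = p;
                    break;
                }
            }
            // Unknown graph, or graphs spread over different providers.
            if (found == null || (owner != null && owner != found)) {
                return null;
            }
            owner = found;
        }
        return owner;
    }

    /** Uses the native engine when possible, otherwise the generic fallback. */
    static String execute(String query, List<GraphProvider> providers) {
        GraphProvider owner = soleOwner(referencedGraphs(query), providers);
        if (owner instanceof NativeSparqlProvider) {
            return ((NativeSparqlProvider) owner).executeNative(query);
        }
        return "generic:" + query; // placeholder for the built-in SPARQL engine
    }
}
```

The fallback branch covers both cases named above: a provider without
a native engine, and a query spanning graphs of different providers.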

[1] http://www.openrdf.org/doc/sesame2/api/org/openrdf/model/package-summary.html
[2] http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/rdf/model/Model.html
[3] http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/graph/Graph.html
[4] https://github.com/ansell/clerezza/commit/37747324d980fad6a33caa3da00491da66900c37
[5] https://bitbucket.org/srfgkmt/stanbol-lmf/src/f41c6c93f08872469dc2e2d64fc06ad75f76f003/lmf-jena/pom.xml


RESTful API / Web Interface:
=====================

There are several shortcomings of the current implementation of the
Stanbol RESTful services / Web UI modules (o.a.stanbol.commons.web,
o.a.stanbol.*.web, o.a.stanbol.*.jersey modules):

* Jersey's use of java.util.ServiceLoader forces manual configuration
of the JAX-RS components. A switch to an OSGI compatible
implementation such as Apache Wink would be very welcome.
* The RESTful API documentation is currently written as HTML in
Freemarker templates. This makes the documentation really hard to
maintain. I would really appreciate the possibility to use markdown
(as used on the webpage) for that.
* For Stanbol deployments it should be possible to exclude the Web UI
so that only the RESTful services are available.

Regarding:

> Stanbol drops its interpretation of "REST" as "not for humans" and wants to
> allow integrating (wherever possible as modular and optional components)
> media types designed for human consumptions and support REST approaches
> there as well (thinking of the current back-button unfriendly UI).

Adding support for a simple table-based representation of RDF data
would indeed be an important feature. However, Resource (Entity) type
specific rendering is out of the scope of Apache Stanbol (at least in
my opinion). AFAIK, as soon as we switch to an OSGI compatible JAX-RS
implementation, users could easily add such rendering by providing
the corresponding JAX-RS MessageBodyWriter.

If there are people who would like to work on this, that would be
really great. If we could (re)use some stuff from Clerezza - even
better. But things would need to be kept simple, as Stanbol is no
semantic CMS.

I would suggest starting development in a separate branch and then
having a discussion/vote based on an early prototype/demonstration.


Other Topics
=========

### Scala and JSR 223 (scripting in the JVM)

I do have an issue with Scala as it adds >150MByte to the PermGen as
soon as it is loaded. But as long as it is an optional dependency and
users are aware of that when adding the dependency I am fine with it.

###  Shell

Personally I do not find the shell very useful. For installing
Bundles/Service configurations I prefer to use the Apache Sling
FileInstaller. For deployment during development I like to use the
Sling Maven Installer plugin. For creating new Stanbol Modules I
would rather suggest creating an extensive list of Maven Archetypes
(e.g. for Stanbol EnhancementEngines).

As the Shell also depends on Scala the "+150MByte to the PermGen"
issue also applies to the Shell.

### Security

Having a security model in Apache Stanbol might be important for some
use cases. Because of this I consider it an important topic, although
it is one I have very little experience with.

I would like to get rid of the dependencies on
org.apache.clerezza:platform (AFAIK this is only needed for the
configuration, and this could easily be provided by the
sling.properties file at runtime). Defaults can be provided in the
commons.properties file already included in all Stanbol Launchers. I
would also suggest moving the PermissionParser utility over to the
Apache Stanbol Security modules.
These two changes would allow the security module to be activated also
for the Stable (Stateless) launcher.
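
The "defaults plus runtime override" layering described above can be
sketched with plain java.util.Properties. This only illustrates the
lookup order; it is not how the Sling launcher actually loads its
files, and the class name and the property keys in the usage note are
made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch of layering runtime settings over shipped defaults (hypothetical names). */
class ConfigFallbackSketch {

    /**
     * Builds a Properties object in which values from runtimeText (think
     * sling.properties) win, and values from defaultsText (think
     * commons.properties) are used for every key that is not overridden.
     */
    static Properties layered(String defaultsText, String runtimeText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));
            // Properties(defaults) consults 'defaults' when a key is missing.
            Properties runtime = new Properties(defaults);
            runtime.load(new StringReader(runtimeText));
            return runtime;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, layered("security.enabled=false\nsecurity.realm=stanbol",
"security.enabled=true") would answer getProperty("security.enabled")
from the runtime text and getProperty("security.realm") from the
defaults.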


best
Rupert


On Thu, Nov 8, 2012 at 2:39 PM, Hasan Hasan <hasan@trialox.org> wrote:
> Comments inline...
>
> On Thu, Nov 8, 2012 at 1:00 PM, Reto Bachmann-Gmür <reto@apache.org> wrote:
>
>> Ok, sorry for jumping into this discussion so late. I've been having
>> quite some discussion on the matter here at apacheconeu. Also I had
>> positive feedback from my presentation of Clerezza yesterday.
>>
>> I think two things:
>> - For high level platform components it is often not clear if they fit
>> better into Stanbol or into Clerezza
>> - The RDF API should actually be independent both from triple store
>> provider as well as from consumer
>>
>> So I think a good solution would be to have the RDF libraries comprising:
>> - A modular and very spec oriented API for RDF and related standards
>> - A set of serializing and parsing providers
>> - Adapters to triple stores (where the api isn't provided by the triple
>> store)
>> basically that's what is in the org.apache.clerezza.rdf.* packages
>>
>> That's the stuff that would fit well into Stanbol. Provided that Stanbol
>> drops its interpretation of "REST" as "not for humans" and wants to
>> allow integrating (wherever possible as modular and optional components)
>> media types designed for human consumptions and support REST approaches
>> there as well (thinking of the current back-button unfriendly UI).
>>
>
> IMO, Clerezza is just too big for existing committers. If we could
> reduce it to the essential components dealing with rdf and leaving
> out templating and rendering, it may be easier to graduate.
>
>> - Scala Server Pages
>> - TypeRendering (selection of templates based on the rdf type of the
>> returned response)
>> - Security (already integrated to some degree, code based security to run
>> bundles in a sandboxed manner is not)
>> - Shell (already ships in the stanbol launcher, so here it's about
>> 'adopting' the sources)
>> - Dev tools: rapid development support (create sample projects, have source
>> files as bundles)
>>
>> To the attic:
>> - Triaxrs: The Clerezza jax-rs implementation is no longer needed as the
>> same support (jax-rs components as OSGi services) is now provided by
>> apache wink
>> - JSR 223 support
>>
>> In my opinion there is no urgent need for action, it is true that there
>> hasn't been a lot of action in clerezza but imho the project is going on
>> even at a low pace (as other projects like e.g. the recently graduated
>> wink).
>>
>
> Not sure about no urgent need for action. Maybe we should list the
> requirements to fulfil in order to be able to graduate. Wonder if we
> are able to meet them.
>
> Cheers
> Hasan
>
>
>>
>> Cheers,
>> Reto
>>
>> On Thu, Nov 8, 2012 at 12:02 PM, Bertrand Delacretaz <bdelacretaz@apache.org> wrote:
>>
>> > On Thu, Nov 8, 2012 at 11:33 AM, Andy Seaborne <andy@apache.org> wrote:
>> > > ...It's good to have the existing released artifacts remain - what
>> > > about after the donation?
>> > >
>> > > Presumably the moved modules will be released by the new host - will
>> > > they use group id org.apache.clerezza? or move to the new host
>> > > project group id? I'd suggest renaming the group to the new project
>> > > but realise it is a bit more disruptive...
>> >
>> > I think that's really up to whatever project adopts that code. In
>> > theory package names should change but that's probably not convenient.
>> >
>> > Or maybe it's time to create a semantic module or two at
>> > http://commons.apache.org/ ? If existing committers are willing to
>> > support that with their work it should be easy to make it happen.
>> >
>> > -Bertrand
>> >
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
