incubator-clerezza-dev mailing list archives

From Rupert Westenthaler <rupert.westentha...@gmail.com>
Subject Re: Future of Clerezza and Stanbol
Date Sun, 11 Nov 2012 08:36:02 GMT
Hi Reto, all

ad (2) "Type-Based rendering":

You are right: MessageBodyWriters might not work easily, because responses for
different RDF types will use the same Java class and Content-Type.

If it is not much work, I suggest activating both variants (LDViewable and the
template-based mechanism) in the suggested branch for refactoring the Stanbol
RESTful/Web stuff. Maybe a clever use of the OSGi Whiteboard Pattern
could allow us to include the hooks required for a template-based framework
without creating a strong dependency on any implementation.
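
To make the whiteboard idea a bit more concrete, here is a minimal sketch in
plain Java. It is only an illustration under assumptions: TypeRenderer,
RendererWhiteboard and ResourceEndpoint are hypothetical names, not existing
Stanbol or Clerezza APIs, and the in-memory list merely stands in for the OSGi
service registry (in a real bundle the bind/unbind calls would come from the
framework). The endpoint depends only on the small interface, so it keeps
working with a plain RDF-like fallback when no rendering bundle is installed.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Collectors;

/** Hypothetical contract that an optional rendering bundle would publish as a service. */
interface TypeRenderer {
    /** IRI of the RDF type this renderer can handle. */
    String rdfType();

    /** Produces a human-oriented representation of the given resource. */
    String render(String resourceIri);
}

/** Stand-in for the OSGi service registry: renderers come and go at runtime. */
final class RendererWhiteboard {
    private final List<TypeRenderer> renderers = new CopyOnWriteArrayList<>();

    void bind(TypeRenderer renderer) { renderers.add(renderer); }

    void unbind(TypeRenderer renderer) { renderers.remove(renderer); }

    /** Returns the first registered renderer matching one of the resource's RDF types. */
    Optional<TypeRenderer> find(Set<String> rdfTypes) {
        return renderers.stream()
                .filter(r -> rdfTypes.contains(r.rdfType()))
                .findFirst();
    }
}

/** The web layer: knows only the interface, never a concrete template engine. */
final class ResourceEndpoint {
    private final RendererWhiteboard whiteboard;

    ResourceEndpoint(RendererWhiteboard whiteboard) { this.whiteboard = whiteboard; }

    String respond(String resourceIri, Set<String> rdfTypes) {
        return whiteboard.find(rdfTypes)
                .map(r -> r.render(resourceIri))
                // Fallback when no renderer is installed: one type statement per line.
                .orElseGet(() -> new TreeSet<>(rdfTypes).stream()
                        .map(t -> "<" + resourceIri + "> a <" + t + "> .")
                        .collect(Collectors.joining("\n")));
    }
}

/** Example renderer that an optional UI bundle might contribute (hypothetical). */
final class HtmlPersonRenderer implements TypeRenderer {
    @Override public String rdfType() { return "http://xmlns.com/foaf/0.1/Person"; }

    @Override public String render(String resourceIri) {
        return "<h1>Person: " + resourceIri + "</h1>";
    }
}

final class WhiteboardDemo {
    public static void main(String[] args) {
        RendererWhiteboard whiteboard = new RendererWhiteboard();
        ResourceEndpoint endpoint = new ResourceEndpoint(whiteboard);
        Set<String> types = Set.of("http://xmlns.com/foaf/0.1/Person");

        // No UI bundle installed: machine-oriented fallback.
        System.out.println(endpoint.respond("http://example.org/rupert", types));

        // UI bundle arrives and registers its renderer.
        whiteboard.bind(new HtmlPersonRenderer());
        System.out.println(endpoint.respond("http://example.org/rupert", types));
    }
}
```

The point of the design is that removing the renderer again (unbind) brings
back the fallback, so a launcher without any UI module would still serve the
RESTful responses.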


ad Scala:

> Do you know
> about users having a concrete issue with the additional RAM requirement, or
> is it more the fact that it's not nice having memory used without clear reason
> that's bothering you?

It is mainly the 2nd. Stanbol runs on about 70MByte PermGen and as soon as
Scala starts this rises to ~200MByte.

However, in the case of the dev.iks-project.eu server, where we had to run four
Stanbol instances on 9GByte RAM, memory was also a real concern.
In the meantime we were able to increase the RAM of the virtual host
to 15GByte.


ad Security:

I was not aware that the user/pwd was loaded from an RDF graph. I
was thinking that just the system properties used to init the system
were loaded via Clerezza-specific keys. My point was that the dependencies
from the Stanbol Commons Security module(s) to Clerezza seem to be very
weak (in the sense that they are only using very few things of the
referenced Clerezza modules). So my question was whether we can/should
remove them, or whether there are good reasons to keep them.


ad "Stanbol is no semantic CMS": you are right

> [..] the vision of an ecosystem of modular semantic and
> restful osgi components [..]

describes it much better. The important thing is that Stanbol can be used with
any CMS and does not require users to replace an existing CMS stack.

best
Rupert

On Sun, Nov 11, 2012 at 2:09 AM, Reto Bachmann-Gmür <reto@apache.org> wrote:

> (2) Type-Based rendering is [...]not something that can be implemented just by
> adding MessageBodyWriters, as different RDF resources do not result in
> different Java classes. For a framework providing resources as RDF, type-based
> rendering seems the straightforward approach to allow these
> resources to be rendered in non-RDF formats as well. For this we can still
> use Freemarker (with LDPath templates), but our legacy templates, which
> require the class with the application logic to provide special hooks to
> the templates, go against the concept of having a pluggable UI that can be
> left out for instances only to be used by machines. Keep in mind that an
> infrastructure for providing templates in a better way is already there
> since the introduction of LDViewable. Type-Based rendering goes one step
> further, as the JAX-RS root resource would no longer have to provide the
> abstract template path.
>
> (3)
> JSR-223 support: I suggested to drop this.
>
> Scala support: I'm wondering myself why there is such a big PermGen space
> need. I've just updated Clerezza trunk to use Scala 2.9.2; this might have
> improved things a bit. As the compiler classloading mechanism is changed in
> 2.10, I guess a bigger improvement might come with that version. Do you know
> about users having a concrete issue with the additional RAM requirement, or
> is it more the fact that it's not nice having memory used without clear reason
> that's bothering you?
>
> Shell: The Felix webconsole is there to install bundles, configure services
> and so on. What you can do with the shell is actually invoke these
> services' methods and explore exported package structures. Especially when
> exploring APIs I'm not yet familiar with, the shell has been of great
> benefit to me. Of course it's a module one can turn off.
>
> Bundle-Dev-Tools: (These aren't yet in Stanbol.) Basically, Maven skeletons
> can also be used as prototypes for the bundle-dev-tool (just some Maven
> magic needed). Of course it's a question of style and size of the module whether
> one wants the dynamic update and things working independently of the POM
> dependencies, or prefers to compile and redeploy. In the trunk version of
> dev-tools there's also instant update for static files, which makes it
> particularly convenient when editing CSS and JavaScript. As long as no
> duplication of archetype/skeleton is needed, I don't see why we shouldn't
> offer both Maven archetypes and skeletons.
>
> Security:
> You're suggesting one should configure the users, their passwords and
> permissions in some config files rather than storing them in RDF and having
> a UI to edit them (OK, I'm embarrassed that the UI isn't there yet)? I think
> when we're talking about some launchers being stateless we mean that usage
> of the (main) functionality they offer doesn't alter the state of the
> system. If you interpret "stateless" very strictly then you would have to
> drop most parts of the Felix webconsole, as HTTP requests to install bundles
> or configure services aren't stateless. For the user configuration a simple
> file-based TcProvider would of course be enough, so no TDB is needed for
> that.
>
> I think we should see where we want to go as a community. For me the
> important thing is that Stanbol remains very modular. I think statements
> like "Stanbol is no semantic CMS" do not bring us further. It's important
> that the Stanbol services can be used as services and that many services
> are stateless. But the Contenthub is a component to manage content (the
> Entityhub to some degree as well); do we want to mandate a horrible user
> interface just to comply with some catchphrase about what Stanbol is not?
> Or do we want to reduce Stanbol to be just the Enhancer and leave the
> other stuff to other projects?
>
> I'd rather go for the vision of an ecosystem of modular semantic and
> restful osgi components, but if the community wants to focus on the
> enhancer I think a clear statement should be made to avoid unnecessary
> arguments about memory consumption.
>
> Cheers,
> Reto
>
>
> On Fri, Nov 9, 2012 at 10:56 AM, Rupert Westenthaler <
> rupert.westenthaler@gmail.com> wrote:
>
>> Hi all,
>>
>> let me share my thoughts. Because this mail is rather long I tried to
>> split it up into three separate sections: (1) RDF, (2) RESTful/Web
>> Interface and (3) other related topics
>>
>>
>> RDF libs:
>> ====
>>
>> From the viewpoint of Apache Stanbol one needs to ask the question
>> whether it makes sense to manage an own RDF API. I expect the Semantic Web
>> standards to evolve quite a bit in the coming years, and I do have
>> concerns about whether the Clerezza RDF modules will be updated/extended to
>> provide implementations of those. One example of such a situation is
>> SPARQL 1.1, which has been around for quite some time and is still not
>> supported by Clerezza. While I do like the small API, the flexibility
>> to use different TripleStores and that Clerezza comes with OSGi
>> support, I think given the current situation we would need to discuss
>> all options, and those do also include a switch to Apache Jena or
>> Sesame. Especially Sesame would be an attractive option as their RDF
>> Graph API [1] is very similar to what Clerezza uses. Apache Jena's
>> counterparts (Model [2] and Graph [3]) are considerably different and
>> more complex interfaces. In addition, Jena will only change to
>> org.apache packages with the next major release, so a switch before
>> that release would mean two incompatible API changes.
>>
>> My personal opinion is that we should keep using Clerezza for now,
>> invest some effort to improve the Clerezza RDF modules and then see
>> how it further develops. Such an effort should include:
>>
>> * implement the SPARQL fast lane (as already discussed with Reto
>> during ApacheCon). Fast lane would allow Clerezza to use the native
>> SPARQL engine of the used TripleStore. This means that Clerezza only
>> parses those parts of the SPARQL query needed to determine the RDF graph to
>> execute the query on. This information is then used to pass the query
>> to the native SPARQL engine via an extended interface of the
>> TcProvider. The Clerezza SPARQL implementation would only be used in
>> case the TcProvider does not provide a native SPARQL implementation or
>> if the query spans RDF graphs managed by different TcProvider
>> instances. That way Clerezza users would be able to use any SPARQL
>> feature provided by the used TripleStore.
>> * update to the newest Jena versions (see also STANBOL-621; Peter
>> Ansell's Clerezza fork on GitHub [4] as well as Sebastian Schaffert's
>> Jena bundle used for the Stanbol/LMF integration [5])
>> * finish and release the SingleTdbDatasetTcProvider.java
>> (CLEREZZA-691) as this is important for the Stanbol Ontology Manager
>> component
>> * move the indexed in-memory graph (CLEREZZA-683) from the Stanbol
>> code base to Clerezza and release it so that we can use it from there
>> in Stanbol
>> * provide a Clerezza JSON-LD parser/serializer. This is critical for
>> Stanbol as several CMS use this as their preferred RDF serialization.
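
The fast-lane dispatch described in the first bullet could look roughly like
the sketch below. Everything in it is an assumption for illustration:
GraphProvider, NativeSparqlProvider and FastLaneDispatcher are hypothetical
names and not the Clerezza TcProvider API, the regular expression only
recognises explicit FROM <...> and FROM NAMED <...> clauses (a real
implementation would use a proper SPARQL parser and also handle GRAPH
patterns), and results are plain strings so the example stays self-contained.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, heavily simplified view of a triple store provider. */
interface GraphProvider {
    boolean manages(String graphIri);
}

/** Hypothetical extension for providers whose store ships its own SPARQL engine. */
interface NativeSparqlProvider extends GraphProvider {
    String executeNative(String query);
}

final class FastLaneDispatcher {
    // Assumption: graphs are named via FROM / FROM NAMED clauses only.
    private static final Pattern FROM_CLAUSE = Pattern.compile(
            "\\bFROM\\s+(?:NAMED\\s+)?<([^>]+)>", Pattern.CASE_INSENSITIVE);

    private final List<GraphProvider> providers;

    FastLaneDispatcher(List<GraphProvider> providers) { this.providers = providers; }

    /** Parses only as much of the query as is needed to learn the target graphs. */
    static Set<String> referencedGraphs(String query) {
        Set<String> graphs = new LinkedHashSet<>();
        Matcher m = FROM_CLAUSE.matcher(query);
        while (m.find()) {
            graphs.add(m.group(1));
        }
        return graphs;
    }

    String execute(String query) {
        Set<String> graphs = referencedGraphs(query);
        if (!graphs.isEmpty()) {
            for (GraphProvider provider : providers) {
                // Fast lane: one provider with a native engine manages every graph.
                if (provider instanceof NativeSparqlProvider
                        && graphs.stream().allMatch(provider::manages)) {
                    return ((NativeSparqlProvider) provider).executeNative(query);
                }
            }
        }
        // Slow lane: graphs span providers, or no native engine is available.
        return "generic: " + query;
    }
}

/** Stub provider used only for the demonstration. */
final class StubNativeProvider implements NativeSparqlProvider {
    private final String name;
    private final Set<String> graphs;

    StubNativeProvider(String name, Set<String> graphs) {
        this.name = name;
        this.graphs = graphs;
    }

    @Override public boolean manages(String graphIri) { return graphs.contains(graphIri); }

    @Override public String executeNative(String query) { return "native[" + name + "]: " + query; }
}

final class FastLaneDemo {
    public static void main(String[] args) {
        FastLaneDispatcher dispatcher = new FastLaneDispatcher(List.of(
                new StubNativeProvider("tdb", Set.of("http://example.org/g1", "http://example.org/g2")),
                new StubNativeProvider("other", Set.of("http://example.org/g3"))));

        // Single provider: handed to the native engine unchanged.
        System.out.println(dispatcher.execute(
                "SELECT * FROM <http://example.org/g1> WHERE { ?s ?p ?o }"));
        // Graphs from two providers: falls back to the generic engine.
        System.out.println(dispatcher.execute(
                "SELECT * FROM <http://example.org/g1> FROM <http://example.org/g3> WHERE { ?s ?p ?o }"));
    }
}
```

Because the query string is passed on untouched, any SPARQL 1.1 feature the
native engine understands would work without the generic layer having to
parse it.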
>>
>> [1]
>> http://www.openrdf.org/doc/sesame2/api/org/openrdf/model/package-summary.html
>> [2]
>> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/rdf/model/Model.html
>> [3]
>> http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/graph/Graph.html
>> [4]
>> https://github.com/ansell/clerezza/commit/37747324d980fad6a33caa3da00491da66900c37
>> [5]
>> https://bitbucket.org/srfgkmt/stanbol-lmf/src/f41c6c93f08872469dc2e2d64fc06ad75f76f003/lmf-jena/pom.xml
>>
>>
>> RESTful API / Web Interface:
>> =====================
>>
>> There are several shortcomings of the current implementation of the
>> Stanbol RESTful services / Web UI modules ( o.a.stanbol.commons.web,
>> o.a.stanbol.*.web, o.a.stanbol.*.jersey modules)
>>
>> * Jersey's use of java.util.ServiceLoader forces manual
>> configuration of the JAX-RS components. A switch to an OSGi-compatible
>> implementation such as Apache Wink would be very welcome
>> * The RESTful API documentation is currently written as HTML in
>> Freemarker templates. This makes it really hard to maintain this
>> documentation. I would really appreciate the possibility to use
>> Markdown (as used on the webpage) for that
>> * For deployments of Stanbol it should be possible to exclude
>> the Web UI so that only the RESTful services are available
>>
>> regarding :
>>
>> > Stanbol drops its interpretation of "REST" as "not for humans" and wants
>> > to allow integrating (wherever possible as modular and optional components)
>> > media types designed for human consumption and support REST approaches
>> > there as well (thinking of the current back-button unfriendly UI).
>>
>> Adding support for a simple table-based representation of RDF data
>> would indeed be an important feature. However, having Resource (Entity)
>> type-specific rendering is out of the scope of Apache Stanbol (at
>> least in my opinion). AFAIK, as soon as we switch to an OSGi-compatible
>> JAX-RS implementation, users could add those easily by
>> providing the corresponding JAX-RS MessageBodyWriter.
>>
>> If there are people who would like to work on this, it would be really great.
>> If we could (re)use some stuff from Clerezza, even better. But things
>> would need to be kept simple, as Stanbol is no semantic CMS.
>>
>> I would suggest starting development in a separate branch and then having a
>> discussion/vote based on an early prototype/demonstration.
>>
>>
>> Other Topics
>> =========
>>
>> ### Scala and jsr 223 (scripting in the JVM)
>>
>> I do have an issue with Scala as it adds >150MByte to the PermGen as
>> soon as it is loaded. But as long as it is an optional dependency and
>> users are aware of that when adding the dependency I am fine with it.
>>
>> ###  Shell
>>
>> Personally I do not find the shell very useful. For installing
>> Bundles/Service configurations I prefer to use the Apache Sling
>> FileInstaller. For deployment during development I like to use the
>> Sling Maven Installer plugin. For creating new Stanbol modules I
>> would rather suggest creating an extensive list of Maven Archetypes (e.g.
>> for Stanbol EnhancementEngines).
>>
>> As the Shell also depends on Scala the "+150MByte to the PermGen"
>> issue also applies to the Shell.
>>
>> ### Security
>>
>> Having a security model in Apache Stanbol might be important for some
>> use cases. Because of this I consider this an important topic, although
>> it is one I have very little experience with.
>>
>> I would like to get rid of the dependencies on
>> org.apache.clerezza:platform (AFAIK this is only needed for the
>> configuration, and this could easily be provided by the
>> sling.properties file at runtime). Defaults can be provided in the
>> commons.properties file already included in all Stanbol Launchers. I
>> would also suggest moving the PermissionParser utility over to the
>> Apache Stanbol Security modules.
>> These two changes would allow activating the security module also for
>> the Stable (Stateless) launcher.
>>
>>
>> best
>> Rupert
>>
>>
>> On Thu, Nov 8, 2012 at 2:39 PM, Hasan Hasan <hasan@trialox.org> wrote:
>> > Comments inline...
>> >
>> > On Thu, Nov 8, 2012 at 1:00 PM, Reto Bachmann-Gmür <reto@apache.org> wrote:
>> >
>> >> OK, sorry for jumping into this discussion so late. I've been having
>> >> quite some discussion on the matter here at ApacheCon EU. Also I had
>> >> positive feedback from my presentation of Clerezza yesterday.
>> >>
>> >> I think two things:
>> >> - For high-level platform components it is often not clear if they fit
>> >> better into Stanbol or into Clerezza
>> >> - The RDF API should actually be independent both from the triple store
>> >> provider and from the consumer
>> >>
>> >> So I think a good solution would be to have the RDF libraries comprising:
>> >> - A modular and very spec-oriented API for RDF and related standards
>> >> - A set of serializing and parsing providers
>> >> - Adapters to triple stores (where the API isn't provided by the triple
>> >> store)
>> >> basically that's what is in the org.apache.clerezza.rdf.* packages
>> >>
>> >> That's the stuff that would fit well into Stanbol. Provided that Stanbol
>> >> drops its interpretation of "REST" as "not for humans" and wants
>> >> to allow integrating (wherever possible as modular and optional components)
>> >> media types designed for human consumption and support REST approaches
>> >> there as well (thinking of the current back-button unfriendly UI).
>> >>
>> >
>> > IMO, Clerezza is just too big for existing committers. If we could reduce
>> > it to the essential components dealing with RDF, leaving out templating
>> > and rendering, it may be easier to graduate.
>> >
>> >> - Scala Server Pages
>> >> - TypeRendering (selection of templates based on the RDF type of the
>> >> returned response)
>> >> - Security (already integrated to some degree; code-based security to
>> >> run bundles in a sandboxed manner is not)
>> >> - Shell (already ships in the Stanbol launcher, so here it's about
>> >> 'adopting' the sources)
>> >> - Dev tools: rapid development support (create sample projects, have
>> >> source files as bundles)
>> >>
>> >> To the attic:
>> >> - Triaxrs: The Clerezza JAX-RS implementation is no longer needed as the
>> >> same support (JAX-RS components as OSGi services) is now provided by
>> >> Apache Wink
>> >> - JSR 223 support
>> >>
>> >> In my opinion there is no urgent need for action. It is true that there
>> >> hasn't been a lot of action in Clerezza, but IMHO the project is going on,
>> >> even if at a low pace (like other projects, e.g. the recently graduated
>> >> Wink).
>> >>
>> >
>> > Not sure about no urgent need for action. Maybe we should list the
>> > requirements to fulfil in order to be able to graduate. Wonder if we
>> > are able to meet them.
>> >
>> > Cheers
>> > Hasan
>> >
>> >
>> >>
>> >> Cheers,
>> >> Reto
>> >>
>> >> On Thu, Nov 8, 2012 at 12:02 PM, Bertrand Delacretaz
>> >> <bdelacretaz@apache.org> wrote:
>> >>
>> >> > On Thu, Nov 8, 2012 at 11:33 AM, Andy Seaborne <andy@apache.org> wrote:
>> >> > > ...It's good to have the existing released artifacts remain - what
>> >> > > about after the donation?
>> >> > >
>> >> > > Presumably the moved modules will be released by the new host - will
>> >> > > they use group id org.apache.clerezza? or move to the new host project
>> >> > > group id?
>> >> > > I'd suggest renaming the group to the new project but realise it is
>> >> > > a bit more disruptive...
>> >> >
>> >> > I think that's really up to whatever project adopts that code. In
>> >> > theory package names should change but that's probably not convenient.
>> >> >
>> >> > Or maybe it's time to create a semantic module or two at
>> >> > http://commons.apache.org/ ? If existing committers are willing to
>> >> > support that with their work it should be easy to make it happen.
>> >> >
>> >> > -Bertrand
>> >> >
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>



