incubator-clerezza-dev mailing list archives

From Daniel Spicar <dspi...@apache.org>
Subject Re: clerezza.rdf.jena.tdb.storage filling up with ontonet files
Date Mon, 19 Mar 2012 09:45:01 GMT
Hi Rupert,

I ran into a similar problem when I worked on a Jena SDB storage provider
(to avoid having to create a separate database for each Clerezza graph).
Back then I didn't create a proper solution, so I am interested in your
approach. From what you described, it sounds good to me.

There are a couple of things to keep in mind. I think both are handled on
a higher layer and should work transparently, but they are worth checking:
1. Graph permissions need to work. I think they work via the graph
URI/name, so they may be handled transparently.
2. Make sure rdf.storage.externalizer works with your solution.

Best,
Daniel

On 19 March 2012 09:16, Hasan Hasan <hasan@trialox.org> wrote:

> Hi all,
>
> I generally agree with extending Clerezza so that it can support multiple
> requirements. Thus, I see the necessity of SingleDatasetTdbTcProvider,
> although I am a bit unhappy that application developers have to be aware
> of this.
> Note that new Clerezza instances (at least my own build) no longer
> generate 200 MB of index files for empty graphs, but merely 200 KB.
>
> Regards
> Hasan
>
>
> On Fri, Mar 16, 2012 at 2:10 PM, Rupert Westenthaler <
> rupert.westenthaler@gmail.com> wrote:
>
> > Hi David, stanbol & clerezza community
> >
> > Short summary of the situation:
> >
> > The Ontonet component generates a lot of MGraphs using the Jena TDB
> > provider. This causes the disk consumption and the number of open files
> > to explode. See the quoted emails for details.
> >
> >
> > @Stanbol: we are already discussing how to avoid the creation of so
> > many graphs.
> >
> >
> > @Clerezza: the observed behavior of the TDB provider is also very
> > dangerous (at least for typical use cases in Apache Stanbol).
> >
> > Even though it targets a different goal, CLEREZZA-467 [1] may provide a
> > possible solution, as it suggests using named graphs instead of isolated
> > TDB instances for creating MGraphs.
> >
> > To be honest, this would be the optimal solution for our usages of
> > Clerezza in Stanbol. However, I assume that for a semantic CMS it is
> > safer to use different TDB datasets.
> >
> > Because of that, I would like to make the following proposal, which
> > hopefully covers the needs of both Apache Stanbol and Apache Clerezza.
> >
> > 1. AbstractTdbTcProvider: providing most of the functionality needed to
> > store Clerezza MGraphs in Jena TDB
> >
> > 2. TdbTcProvider: the same as now, but extending the abstract one. It
> > follows the currently used methodology of mapping Clerezza graphs to
> > separate TDB datasets.
> >
> > 3. SingleDatasetTdbTcProvider: a TDB provider variant that stores all
> > MGraphs in a single TDB dataset. This provider should also support
> > "configurationFactory=true" (multiple instances). Each instance would
> > use a different TDB dataset to store its MGraphs.
> >
> > By default the SingleDatasetTdbTcProvider would be inactive, because it
> > requires a configuration of the directory for the TDB dataset as well as
> > a name (that can be used in filters). This ensures full backward
> > compatibility.
> >
> > In environments such as Stanbol, where you want to store multiple graphs
> > in the same TDB dataset, you would need to provide a configuration for
> > the SingleDatasetTdbTcProvider. Here you have two possible usage
> > scenarios:
> >
> > * If you just need a single TDB dataset that stores all MGraphs, then
> > you can assign a high enough service.ranking to the
> > SingleDatasetTdbTcProvider and use the TcManager as usual to create your
> > graphs.
> > * If you want to use single TDB datasets or a mix of the TdbTcProvider
> > and SingleDatasetTdbTcProviders, you will need to use appropriate
> > filters.
> >
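The difference between the two mappings can be sketched without Jena at
all. The following is a minimal, dependency-free Java model, not the actual
provider code: the Layout interface, both implementing classes, and the
"tdb-data/..." paths are hypothetical names chosen for illustration. It
assumes that the per-graph provider names each dataset directory after the
URL-encoded graph name, which is consistent with the directory names in
David's listing quoted further down.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GraphLayoutSketch {

    /** Hypothetical strategy: which dataset directory holds a given graph? */
    public interface Layout {
        String directoryFor(String graphName);
    }

    /** One dataset directory per graph (assumed naming: URL-encoded graph name). */
    public static final class DatasetPerGraph implements Layout {
        public String directoryFor(String graphName) {
            try {
                return "tdb-data/mgraph/" + URLEncoder.encode(graphName, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e); // UTF-8 is always available
            }
        }
    }

    /** All graphs share one directory; the name is only a key inside the dataset. */
    public static final class SingleDataset implements Layout {
        private final String directory;

        public SingleDataset(String directory) {
            this.directory = directory;
        }

        public String directoryFor(String graphName) {
            return directory;
        }
    }

    /** Counts the distinct dataset directories needed for the given graph names. */
    public static int datasetCount(Layout layout, List<String> graphNames) {
        Set<String> directories = new HashSet<String>();
        for (String name : graphNames) {
            directories.add(layout.directoryFor(name));
        }
        return directories.size();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "ontonet::inputstream:ontology889",
                "ontonet::inputstream:ontology1041",
                "ontonet::inputstream:ontology395");
        System.out.println("per graph: " + datasetCount(new DatasetPerGraph(), names));
        System.out.println("single:    "
                + datasetCount(new SingleDataset("tdb-data/single"), names));
    }
}
```

In this model every new graph adds another dataset directory under the
per-graph layout, each with its own index files, while the single-dataset
layout stays at one directory no matter how many graphs are created.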
> >
> > WDYT
> > Rupert
> >
> >
> > [1] https://issues.apache.org/jira/browse/CLEREZZA-467
> >
> > On 16.03.2012, at 10:44, Rupert Westenthaler wrote:
> >
> > > Hi David, all
> > >
> > > this could be the explanation for the failed build on the Jenkins
> > > server when the SEO configuration for the Refactor engine was used in
> > > the default configuration of the Full launcher.
> > >
> > > see http://markmail.org/message/sprwklaobdjankig for details.
> > >
> > > To me it looks as if the RefactorEngine creates multiple Jena TDB
> > > instances for the various MGraphs it creates. One needs to know that
> > > even for an empty graph Jena TDB creates ~200 MB of index files. So it
> > > is important to map multiple MGraphs to different named graphs of the
> > > same Jena TDB store.
> > >
> > > I have no idea how Clerezza manages this or how Ontonet creates
> > > MGraphs, but I hope this can help in tracing this down.
> > >
> > > best
> > > Rupert
> > >
> > > On 16.03.2012, at 10:30, David Riccitelli wrote:
> > >
> > >> Dears,
> > >>
> > >> As I ran into disk issues, I found that this folder:
> > >> sling/felix/bundleXXX/data/tdb-data/mgraph
> > >>
> > >> where XXX is the bundle ID of:
> > >> Clerezza - SCB Jena TDB Storage Provider
> > >> org.apache.clerezza.rdf.jena.tdb.storage
> > >>
> > >> took almost 70 GB of disk space (at which point the disk space was
> > >> exhausted).
> > >>
> > >> These are some of the files I found inside:
> > >> 193M ./ontonet%3A%3Ainputstream%3Aontology889
> > >> 193M ./ontonet%3A%3Ainputstream%3Aontology1041
> > >> 193M ./ontonet%3A%3Ainputstream%3Aontology395
> > >> 193M ./ontonet%3A%3Ainputstream%3Aontology363
> > >> 193M ./ontonet%3A%3Ainputstream%3Aontology661
> > >> 193M ./ontonet%3A%3Ainputstream%3Aontology786
> > >> 193M ./ontonet%3A%3Ainputstream%3Aontology608
> > >> 193M ./ontonet%3A%3Ainputstream%3Aontology213
> > >> 193M ./ontonet%3A%3Ainputstream%3Aontology188
> > >> 193M ./ontonet%3A%3Ainputstream%3Aontology602
> > >>
> > >>
> > >> Any clues?
> > >>
> > >> Thanks,
> > >> David Riccitelli
> > >>
> > >>
> > >> ********************************************************************************
> > >> InsideOut10 s.r.l.
> > >> P.IVA: IT-11381771002
> > >> Fax: +39 0110708239
> > >> ---
> > >> LinkedIn: http://it.linkedin.com/in/riccitelli
> > >> Twitter: ziodave
> > >> ---
> > >> Layar Partner Network<http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
> > >> ********************************************************************************
> > >
> >
> >
>
