incubator-jena-dev mailing list archives

From Paolo Castagna
Subject Re: JenaPerf and datasets...
Date Sat, 15 Oct 2011 17:46:20 GMT

Andy Seaborne wrote:
> On 11/10/11 08:12, Paolo Castagna wrote:
>> Hi Andy,
>> are you planning to put a few datasets in SVN together with the
>> queries in JenaPerf?
>> I saw a data directory for LUBM but no data in it.
>> From a user perspective it would be great to just do:
>>    svn co
>> JenaPerf
>>    cd JenaPerf
>>    ./run
>> Installing any of LUBM, BSBM or SP2B (although not incredibly
>> complicated) isn't trivial.
> LUBM: The generator and test driver code is GPL.  The queries I have are
> taken from the published paper, translated by me to SPARQL so they can
> be distributed.  Data can be generated.

Could we generate some datasets using LUBM and make them available somewhere
in SVN to check out together with JenaPerf, so that users wanting to run LUBM
via JenaPerf do not need to generate the data themselves?

Or, if adding datasets to JenaPerf causes problems from a licensing point of
view, could we generate a few LUBM datasets, make them available to download
somewhere else, and have JenaPerf download them when you need to run LUBM?
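
To make the download-on-demand idea concrete, here is a rough sketch in
Python of a "fetch on first use" cache. It is only an illustration and does
not describe anything that exists in JenaPerf: the base URL, the file name
and the ensure_dataset helper are all made up.

```python
import os
import shutil
import urllib.request
from pathlib import Path

# Hypothetical location: no such download area exists, this is a placeholder.
BASE_URL = "http://example.org/jenaperf-data/"


def ensure_dataset(filename, cache_dir="data", base_url=BASE_URL):
    """Return the local path of a dataset, downloading it on first use."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    target = cache / filename
    if target.exists():
        return target  # already fetched on an earlier run

    # Download under a temporary name first, so that an interrupted
    # transfer never leaves a truncated file that looks complete.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(base_url + filename) as response, \
            open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(partial, target)
    return target
```

A run script would then call something like ensure_dataset("lubm-1.nt.gz")
(again, an invented file name) before starting the benchmark, and only the
first run would pay the download cost.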

> BSBM: The queries are actually templates and instantiated at runtime
> using a configuration file which is generated when the data is
> generated.  Generating data isn't just creating RDF triples.

Same as before: could we generate a few BSBM datasets and queries on behalf
of the users and make them available with JenaPerf?
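
For the queries, pre-generating could amount to filling in the templates once
and shipping the results. A rough Python sketch, assuming %Name% style
placeholders; the template, the placeholder syntax and the parameter values
below are invented for illustration and may not match the real BSBM files:

```python
import re

# Hypothetical template, not copied from BSBM.
TEMPLATE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?product ?label
WHERE {
  ?product a <%ProductType%> ;
           rdfs:label ?label .
}
LIMIT %Limit%"""


def instantiate(template, params):
    """Replace each %Name% placeholder with its value from params."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            # Fail loudly rather than emit a query with a hole in it.
            raise KeyError("no value for placeholder %" + name + "%")
        return str(params[name])

    return re.sub(r"%(\w+)%", substitute, template)


# Made-up parameter values; in BSBM they would come from the
# configuration produced together with the data.
query = instantiate(TEMPLATE, {
    "ProductType": "http://example.org/ProductType1",
    "Limit": 10,
})
```

The catch is that the instantiated queries are only meaningful for the
dataset they were generated with, so the two would have to be published
together.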

> The query templates exist in the code base (bsbmtools on SF).  I have
> been talking to the creators and the license has changed from GPL to AL
> (thanks guys).


> So it will be possible to include queries from the
> codebase - the templating will have to be written.  (the license change
> affects JenaPerf because it is redistributing, unlike downloading and
> running).
> SP2B is published under BSD.


The reason I'd like to add datasets in addition to the queries is to make
life easier for users. It would be good to just check out or download JenaPerf
and run it, without needing to install LUBM, BSBM and/or SP2B and generate
datasets with them.


>     Andy
>> From a community and project perspective, it's quite good and helpful
>> to have a standard set of datasets. Although, I realize that if datasets
>> are not small, it might take a while to download them.
>> Can we use .gz datasets with JenaPerf?
>> We could also include small to medium-sized datasets together with
>> JenaPerf and have a separate checkout/download for larger ones.
>> What do you think?
>> Paolo
