incubator-jena-dev mailing list archives

From Paolo Castagna <>
Subject Re: Is it possible to use BPlusTreeRewriter for Record(s) with values?
Date Wed, 21 Sep 2011 20:41:00 GMT
Paolo Castagna wrote:
> Hi,
> over the last few days I experimented with different ways to generate TDB
> indexes (hopefully more scalable, in particular on machines with RAM
> constraints).
> These improvements could benefit tdbloader2 or a pure Java version of it
> (see: [1]). One of them, in particular, is necessary to complete tdbloader3
> (i.e. a MapReduce implementation of a TDB loader).
> This email focuses on the node table only, and more precisely on the B+Tree
> index of the node table. This index has records with 128-bit keys, which
> represent the hash of RDF node values, and 64-bit values, which represent
> the corresponding node ids. Given an RDF node, this index is used to
> retrieve its node id. This is needed to replace RDF node values with node
> ids before executing a query (since queries use indexes that contain node
> ids only).
> I'd like to build the B+Tree index of the node table with the same
> technique that tdbloader2 uses in its final stage for the SPO, POS, OSP,
> GSPO, GPOS, etc. B+Tree indexes (see: [2]).
> I know how to generate and sort a file containing hash|id, see [3] for
> example.
> However, I don't think the current BPlusTreeRewriter can be used as it is
> to rebuild a B+Tree index from such a file. I think the main reason is
> that it uses createKeyOnly().
> Is that the only obstacle, or is it much more complicated than that?
> Is it possible to change/adapt/extend BPlusTreeRewriter to support this use
> case as well?

Well, I was wrong: BPlusTreeRewriter works with Records with values as well.
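
To make the record layout concrete, here is a minimal sketch in plain Java of
building and sorting hash|id records (16-byte key followed by an 8-byte value).
It does not use any TDB classes: the class name NodeTableRecords, the use of
MD5 as the 128-bit hash and the sequential ids are assumptions made only for
illustration, and feeding the sorted records to BPlusTreeRewriter is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of TDB: builds fixed-width hash|id records.
public class NodeTableRecords {
    static final int KEY_LEN = 16;   // 128-bit hash of the RDF node value
    static final int VALUE_LEN = 8;  // 64-bit node id

    // Pack one record as key bytes followed by value bytes.
    // MD5 is an assumption here: it is simply a convenient 128-bit hash.
    static byte[] record(String node, long id) throws Exception {
        byte[] hash = MessageDigest.getInstance("MD5")
                .digest(node.getBytes(StandardCharsets.UTF_8));
        return ByteBuffer.allocate(KEY_LEN + VALUE_LEN).put(hash).putLong(id).array();
    }

    // Compare two records on the key part only, treating bytes as unsigned.
    static int compareKeys(byte[] a, byte[] b) {
        for (int i = 0; i < KEY_LEN; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) return c;
        }
        return 0;
    }

    // Read the node id back from the value part of a record.
    static long valueOf(byte[] rec) {
        return ByteBuffer.wrap(rec, KEY_LEN, VALUE_LEN).getLong();
    }

    // Assign sequential ids (an assumption) and return records sorted by key,
    // which is the order a bottom-up B+Tree build needs.
    static List<byte[]> sortedRecords(List<String> nodes) throws Exception {
        List<byte[]> records = new ArrayList<>();
        long id = 0;
        for (String node : nodes) records.add(record(node, id++));
        records.sort(NodeTableRecords::compareKeys);
        return records;
    }

    public static void main(String[] args) throws Exception {
        List<String> nodes = Arrays.asList(
                "<http://example.org/s>", "<http://example.org/p>", "\"a literal\"");
        for (byte[] rec : sortedRecords(nodes)) {
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < KEY_LEN; i++) hex.append(String.format("%02x", rec[i] & 0xFF));
            System.out.println(hex + "|" + valueOf(rec));
        }
    }
}
```

The point of the sketch is only that the value travels with the key through
the sort, so each sorted record still carries its node id.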


This can help JENA-117 (i.e. a pure Java version of tdbloader2).
More tests are necessary to establish whether that would be faster than the
current one.


> Thanks,
> Paolo
>  [1]
>  [2]
>  [3]
