incubator-jena-dev mailing list archives

From Paolo Castagna <>
Subject Is it possible to use BPlusTreeRewriter for Record(s) with values?
Date Wed, 21 Sep 2011 17:13:11 GMT
In the last few days I have experimented with different ways to generate TDB
indexes that are (hopefully) more scalable, in particular on machines with
limited RAM. These improvements could benefit tdbloader2 or a pure Java version
of it (see: [1]). One piece in particular is needed to complete tdbloader3
(i.e. a MapReduce implementation of a TDB loader).

This email focuses on the node table only, and more precisely on its B+Tree
index. That index has records with 128-bit keys, which are hashes of RDF node
values, and 64-bit values, which are the corresponding node ids. Given an RDF
node, the index retrieves its node id. This lookup is used to replace RDF node
values before a query is executed (since queries use indexes that contain only
node ids).
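
For concreteness, here is a minimal Java sketch of one such record, assuming
MD5 as the 128-bit hash and a big-endian 64-bit id. The hash input (the node's
plain string form) and the names NodeRecordSketch, record and nodeId are
hypothetical; this is not how TDB itself serializes a node before hashing it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: a node-table index record laid out as [128-bit hash | 64-bit node id].
public class NodeRecordSketch {
    static final int KEY_LEN = 16;   // 128-bit hash key
    static final int VALUE_LEN = 8;  // 64-bit node id
    static final int RECORD_LEN = KEY_LEN + VALUE_LEN;

    // MD5 produces exactly 16 bytes. Hashing the plain string form is an assumption,
    // not TDB's actual node encoding.
    static byte[] hash(String nodeString) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return md5.digest(nodeString.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    // Packs key and value into one fixed-length record (ByteBuffer is big-endian by default).
    static byte[] record(String nodeString, long nodeId) {
        return ByteBuffer.allocate(RECORD_LEN)
                .put(hash(nodeString))
                .putLong(nodeId)
                .array();
    }

    // Reads the node id back from bytes 16..23.
    static long nodeId(byte[] record) {
        return ByteBuffer.wrap(record, KEY_LEN, VALUE_LEN).getLong();
    }

    public static void main(String[] args) {
        byte[] r = record("<http://example.org/s>", 42L);
        System.out.println(r.length + " bytes, node id " + nodeId(r)); // prints "24 bytes, node id 42"
    }
}
```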

I'd like to build the B+Tree index of the node table with the same technique
tdbloader2 uses in its final stage to build the SPO, POS, OSP, GSPO, GPOS, etc.
B+Tree indexes (see: [2]).

I know how to generate and sort a file containing hash|id pairs (see [3] for an
example).
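
A sketch of that step, assuming the records fit in memory and that the index
orders keys by unsigned byte comparison (an assumption: check the comparator
the B+Tree actually uses). A real loader would use an external sort. The
24-byte layout assumes a 16-byte hash followed by an 8-byte id, and the names
SortedRecordFileSketch, record and writeSorted are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: sort fixed-length hash|id records by key and write them back to back.
public class SortedRecordFileSketch {
    static final int KEY_LEN = 16;
    static final int RECORD_LEN = 24;

    // Compares only the 16 key bytes, treating each byte as unsigned (0x00..0xFF).
    static final Comparator<byte[]> BY_KEY =
            (a, b) -> Arrays.compareUnsigned(a, 0, KEY_LEN, b, 0, KEY_LEN);

    // Illustration helper: a record whose key bytes are all `keyFill`, followed by `id`.
    static byte[] record(byte keyFill, long id) {
        byte[] r = new byte[RECORD_LEN];
        Arrays.fill(r, 0, KEY_LEN, keyFill);
        ByteBuffer.wrap(r).putLong(KEY_LEN, id);
        return r;
    }

    // Sorts in memory (a real loader would sort externally) and writes 24 bytes per record.
    static void writeSorted(List<byte[]> records, OutputStream out) throws IOException {
        records.sort(BY_KEY);
        for (byte[] r : records) {
            if (r.length != RECORD_LEN) {
                throw new IllegalArgumentException("expected " + RECORD_LEN + "-byte records");
            }
            out.write(r);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> records = new ArrayList<>(List.of(record((byte) 0x90, 1L), record((byte) 0x10, 2L)));
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        writeSorted(records, file);
        byte[] bytes = file.toByteArray();
        // prints "48 bytes, first key byte 0x10"
        System.out.println(bytes.length + " bytes, first key byte 0x" + Integer.toHexString(bytes[0] & 0xFF));
    }
}
```

With a signed comparison the key starting with 0x90 would sort first;
Arrays.compareUnsigned (Java 9+) places it after 0x10.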

However, I don't think the current BPlusTreeRewriter can be used as it is to
rebuild a B+Tree index from such a file. I think the main reason is that it
uses createKeyOnly().
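
To illustrate why key-only records matter here, below is a generic bottom-up
bulk build from sorted records: the general idea of building a B+Tree from
sorted input, not BPlusTreeRewriter's code. Leaf blocks keep full records
(hash + node id), while each branch level stores only the highest key of each
child block; projecting leaf records to keys, as keyOnly does, would drop the
node ids. The names, the block capacity and the one-separator-per-child
simplification are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a bottom-up B+Tree build from records already sorted by key.
// Not BPlusTreeRewriter's code; blocks are plain lists of byte[] entries.
public class BottomUpBuildSketch {
    static final int KEY_LEN = 16; // hash key; leaf records also carry an 8-byte node id

    // Branch-level entry: the key bytes only, with the value dropped.
    static byte[] keyOnly(byte[] record) {
        return Arrays.copyOf(record, KEY_LEN);
    }

    // Cuts a sorted list into consecutive blocks of at most `capacity` entries.
    static List<List<byte[]>> pack(List<byte[]> sorted, int capacity) {
        List<List<byte[]>> blocks = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i += capacity) {
            blocks.add(sorted.subList(i, Math.min(i + capacity, sorted.size())));
        }
        return blocks;
    }

    // Returns the tree level by level, leaves first, root level last.
    static List<List<List<byte[]>>> build(List<byte[]> sortedRecords, int capacity) {
        if (capacity < 2) throw new IllegalArgumentException("capacity must be at least 2");
        List<List<List<byte[]>>> levels = new ArrayList<>();
        // Leaves keep full records (hash + node id): keyOnly must NOT be applied here.
        List<List<byte[]>> level = pack(sortedRecords, capacity);
        levels.add(level);
        while (level.size() > 1) {
            // Simplification: one separator per child, the highest key in that child.
            List<byte[]> separators = new ArrayList<>();
            for (List<byte[]> child : level) {
                separators.add(keyOnly(child.get(child.size() - 1)));
            }
            level = pack(separators, capacity);
            levels.add(level);
        }
        return levels;
    }

    public static void main(String[] args) {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            byte[] r = new byte[KEY_LEN + 8];
            r[0] = (byte) i;            // first key byte, so records are already sorted
            r[r.length - 1] = (byte) i; // low byte of the node id
            records.add(r);
        }
        List<List<List<byte[]>>> levels = build(records, 4);
        // prints "2 levels; leaf entry 24 bytes, branch entry 16 bytes"
        System.out.println(levels.size() + " levels; leaf entry " + levels.get(0).get(0).get(0).length
                + " bytes, branch entry " + levels.get(1).get(0).get(0).length + " bytes");
    }
}
```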

Is that the only obstacle, or is it much more complicated than that?

Is it possible to change/adapt/extend BPlusTreeRewriter to support this use
case as well?


