jena-users mailing list archives

From Andy Seaborne <a...@apache.org>
Subject Re: Storing a lot of strings in TDB store
Date Thu, 07 Mar 2019 13:05:29 GMT
At the level of that description, they are much the same.

TDB2 differs in the actual inline encoding of literals (it keeps the datatype).

TDB2 B+Trees are "copy-on-write" (MVCC) and TDB2 has a different 
transaction mechanism, so arbitrarily large transaction changes 
are supported.
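
To illustrate the copy-on-write idea only: the sketch below uses a plain binary search tree as a stand-in for a B+Tree, and the names `Node`, `insert` and `keys` are hypothetical. It is not TDB2's code or disk format. An update copies the nodes on the path it touches and shares everything else, so a reader holding the old root still sees the old version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    # Immutable node: an update never modifies an existing node in place.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key):
    """Return the root of a new version; the old version stays valid."""
    if root is None:
        return Node(key)
    if key < root.key:
        # Copy this node, share the untouched right subtree.
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        # Copy this node, share the untouched left subtree.
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present, nothing to copy


def keys(root):
    """In-order listing of the keys reachable from this root."""
    if root is None:
        return []
    return keys(root.left) + [root.key] + keys(root.right)


v1 = None
for k in (5, 2, 8):
    v1 = insert(v1, k)

v2 = insert(v1, 9)  # a writer produces a new version

print(keys(v1))  # the reader's snapshot is unchanged
print(keys(v2))  # the writer's version includes the new key
```

Because the old root is never modified, readers need no locks against the writer, which is the property MVCC relies on.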

The TDB2 bulk loader is much faster (although it could be backported to TDB1; 
it is not fundamental to the TDB2 disk layout).

     Andy

On 06/03/2019 12:38, Siddhesh Rane wrote:
> It's for TDB 1, right? Is there a document for TDB 2? I couldn't find one.
> 
> Regards
> Siddhesh
> 
> 
> On Fri, 22 Feb 2019, 8:48 pm Rob Vesse, <rvesse@dotnetrdf.org> wrote:
> 
>> It's here - http://jena.apache.org/documentation/tdb/architecture.html
>>
>> Rob
>>
>> On 22/02/2019, 04:03, "Ekaterina Danilova" <katja.danilova94@gmail.com>
>> wrote:
>>
>>      Thank you, it was exactly what I needed. It is still nice to hear what
>>      others think about my idea of data storage as resources and I think I
>>      will stick to that option, but TDB storage logic was quite unclear to me.
>>      Would be great if it was mentioned in official documentation since I
>>      couldn't find it.
>>      Thanks again for your help
>>
>>      On Tue, 19 Feb 2019 at 20:40, Rob Vesse <rvesse@dotnetrdf.org> wrote:
>>
>>      > Since I don't think anyone answered your specific original question
>>      >
>>      > TDB and TDB2 both use dictionary encoding (and in fact most RDF stores
>>      > use some variation on this).  Basically they map each unique RDF term
>>      > (whether URI, string, blank node etc) to a consistent internal
>>      > identifier and use this to refer to the term.  Therefore most data
>>      > structures internally are implemented in terms of these internal
>>      > identifiers (which are typically very compact, TDB/TDB2 use 64 bit
>>      > identifiers) and the system only translates between the internal
>>      > identifier and the full RDF term when explicitly needed e.g. when
>>      > presenting results
>>      >
>>      > Rob
>>      >
>>      > On 15/02/2019, 06:03, "Ekaterina Danilova" <katja.danilova94@gmail.com>
>>      > wrote:
>>      >
>>      >     I would like to ask how TDB2 and Fuseki manage big amounts of string
>>      >     data (especially repeating data) and what the best practices are.
>>      >     Does it optimize it somehow?
>>      >
>>      >
>>      >
>>      >
>>      >
>>
>>
>>
>>
>>
>>
> 
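
The dictionary encoding Rob describes in the quoted message can be sketched roughly as follows. This is a toy in-memory illustration, not Jena code: the class `NodeTable` and its `encode`/`decode` methods are hypothetical names, and plain Python integers stand in for TDB's 64 bit identifiers. It shows why a repeated string is stored once however many triples use it.

```python
class NodeTable:
    """Toy dictionary encoder: one integer id per distinct RDF term."""

    def __init__(self):
        self._term_to_id = {}   # term -> id, used when loading data
        self._id_to_term = []   # id -> term, used when presenting results

    def encode(self, term):
        """Return the id for a term, allocating a new one on first sight."""
        node_id = self._term_to_id.get(term)
        if node_id is None:
            node_id = len(self._id_to_term)
            self._term_to_id[term] = node_id
            self._id_to_term.append(term)
        return node_id

    def decode(self, node_id):
        """Translate an id back to the full RDF term."""
        return self._id_to_term[node_id]


table = NodeTable()

# Two triples that share a predicate and a long literal (made-up example data).
triples = [
    ("<http://example/s1>", "<http://example/comment>", '"a long repeated string"'),
    ("<http://example/s2>", "<http://example/comment>", '"a long repeated string"'),
]

# Indexes would hold only these id tuples, never the strings themselves.
encoded = [tuple(table.encode(term) for term in triple) for triple in triples]

# Ids are turned back into terms only when results are presented.
decoded = [tuple(table.decode(i) for i in triple) for triple in encoded]

print(encoded)
print(len(table._id_to_term), "distinct terms stored for", len(triples), "triples")
```

In this sketch the six term occurrences collapse to four stored terms, and the repeated literal costs one table entry plus one small id per use.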
