Subject: Re: Storing a lot of strings in TDB store
To: users@jena.apache.org
References: <CACqw+_1wx4W9kMXp68RYrPo69_0AQTi9T6Sg3E1hw4LaoYPbhQ@mail.gmail.com>
 <6B4F408B-817F-4204-B433-5340DD563506@dotnetrdf.org>
 <CACqw+_1+3wwnLkzU0XECzrBTpZTtzruCygFTGZuq1ARBsFwpEg@mail.gmail.com>
 <A2268F35-72FB-40E6-BB6E-C2D9A47024B2@dotnetrdf.org>
 <CAOo9A67T7JF9jyd-dpKmC4UheqNWpTSAKcnkWex7GOfi9duMwg@mail.gmail.com>
From: Andy Seaborne <andy@apache.org>
Message-ID: <7fce9c1d-e0fa-b148-f19a-2c6e6efb5006@apache.org>
Date: Thu, 7 Mar 2019 13:05:29 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.5.1
MIME-Version: 1.0
In-Reply-To: <CAOo9A67T7JF9jyd-dpKmC4UheqNWpTSAKcnkWex7GOfi9duMwg@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 8bit

At the level of that description, they are much the same.

TDB2 differs in actual inline encoding of literals (it keeps the datatype).
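
As a rough illustration of the dictionary encoding described in the quoted 
message, with small integer literals inlined into the identifier together 
with a datatype tag, here is a minimal sketch. The bit layout, the tag 
values, and the names NodeTable, INLINE_FLAG, TAG_SHIFT and DATATYPE_TAGS 
are assumptions made up for this example; this is not the actual TDB or 
TDB2 node table or its on-disk format.

```python
from typing import Dict, List, Tuple, Union

# A term is either a string (URI, plain string, ...) or an
# (integer value, datatype name) pair standing in for a typed literal.
Term = Union[str, Tuple[int, str]]

# Hypothetical 64-bit layout, chosen only for this sketch:
# bit 63 = "value is inlined", bits 56-62 = datatype tag, bits 0-55 = value.
INLINE_FLAG = 1 << 63
TAG_SHIFT = 56
VALUE_MASK = (1 << TAG_SHIFT) - 1
DATATYPE_TAGS = {"xsd:integer": 1, "xsd:int": 2}
TAG_DATATYPES = {tag: name for name, tag in DATATYPE_TAGS.items()}


class NodeTable:
    """Maps each distinct term to one integer id and back."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._terms: List[str] = []

    def encode(self, term: Term) -> int:
        if isinstance(term, tuple):
            value, datatype = term
            tag = DATATYPE_TAGS.get(datatype)
            if tag is not None and 0 <= value <= VALUE_MASK:
                # Inline: the id itself carries the value and its datatype.
                return INLINE_FLAG | (tag << TAG_SHIFT) | value
            # Not inlinable: fall back to a dictionary entry.
            term = f'"{value}"^^{datatype}'
        node_id = self._ids.get(term)
        if node_id is None:
            node_id = len(self._terms)  # next unused id
            self._ids[term] = node_id
            self._terms.append(term)
        return node_id

    def decode(self, node_id: int) -> Term:
        if node_id & INLINE_FLAG:
            tag = (node_id >> TAG_SHIFT) & 0x7F
            return (node_id & VALUE_MASK, TAG_DATATYPES[tag])
        return self._terms[node_id]

    def __len__(self) -> int:
        return len(self._terms)


table = NodeTable()
triples = [
    ("ex:alice", "ex:city", '"Helsinki"'),
    ("ex:bob", "ex:city", '"Helsinki"'),
    ("ex:bob", "ex:age", (42, "xsd:int")),
]
# Each triple becomes three integers; repeated terms reuse the same id.
encoded = [tuple(table.encode(t) for t in triple) for triple in triples]
```

The repeated string "Helsinki" is stored once and referenced by the same 
id in both triples, and the inlined integer needs no dictionary entry at 
all while still decoding back to its original datatype.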

TDB2 B+Trees are copy-on-write (MVCC), and TDB2 has a different 
transaction mechanism, so arbitrarily large transactions are 
supported.
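
To make the copy-on-write idea concrete, here is a minimal sketch that 
uses a plain binary search tree as a stand-in for a B+Tree. It only shows 
the general technique (path copying plus a root swap at commit) and is not 
how TDB2 is implemented; Node, insert, keys and Store are names invented 
for this example.

```python
from typing import List, NamedTuple, Optional


class Node(NamedTuple):
    """Immutable tree node; an update never modifies an existing node."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], key: int) -> Node:
    """Return a tree that also contains key, copying only the search path."""
    if node is None:
        return Node(key)
    if key < node.key:
        return node._replace(left=insert(node.left, key))
    if key > node.key:
        return node._replace(right=insert(node.right, key))
    return node  # key already present, nothing to copy


def keys(node: Optional[Node]) -> List[int]:
    """In-order traversal, returning the keys in sorted order."""
    if node is None:
        return []
    return keys(node.left) + [node.key] + keys(node.right)


class Store:
    """Holds the root of the latest committed version."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def snapshot(self) -> Optional[Node]:
        return self.root

    def commit(self, new_root: Optional[Node]) -> None:
        self.root = new_root  # publishing a version is a single root swap


store = Store()
root = None
for k in (50, 30, 70):
    root = insert(root, k)
store.commit(root)

reader_view = store.snapshot()   # a reader pins the current version
writer_root = store.snapshot()   # a writer starts from the same version
for k in (20, 10):
    writer_root = insert(writer_root, k)
store.commit(writer_root)        # the reader's version is left untouched
```

The reader still sees the keys 30, 50 and 70 after the commit, and the 
subtree the writer did not touch is shared between both versions rather 
than copied.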

The TDB2 bulk loader is much faster (although it could be backported to 
TDB1; it is not fundamental to the TDB2 disk layout).

     Andy

On 06/03/2019 12:38, Siddhesh Rane wrote:
> It's for TDB 1 right? Is there a document for TDB 2? I couldn't find one
> 
> Regards
> Siddhesh
> 
> 
> On Fri, 22 Feb 2019, 8:48 pm Rob Vesse, <rvesse@dotnetrdf.org> wrote:
> 
>> It's here - http://jena.apache.org/documentation/tdb/architecture.html
>>
>> Rob
>>
>> On 22/02/2019, 04:03, "Ekaterina Danilova" <katja.danilova94@gmail.com>
>> wrote:
>>
>>      Thank you, it was exactly what I needed. It is still nice to hear what
>>      others think about my idea of data storage as resources and I think I
>>      will stick to that option, but TDB storage logic was quite unclear to me.
>>      Would be great if it was mentioned in official documentation since I
>>      couldn't find it.
>>      Thanks again for your help
>>
>>      On Tue, 19 Feb 2019 at 20:40, Rob Vesse <rvesse@dotnetrdf.org> wrote:
>>
>>      > Since I don't think anyone answered your specific original question
>>      >
>>      > TDB and TDB2 both use dictionary encoding (and in fact most RDF stores
>>      > use some variation on this).  Basically they map each unique RDF term
>>      > (whether URI, string, blank node etc) to a consistent internal
>>      > identifier and use this to refer to the term.  Therefore most data
>>      > structures internally are implemented in terms of these internal
>>      > identifiers (which are typically very compact, TDB/TDB2 use 64 bit
>>      > identifiers) and the system only translates between the internal
>>      > identifier and the full RDF term when explicitly needed e.g. when
>>      > presenting results
>>      >
>>      > Rob
>>      >
>>      > On 15/02/2019, 06:03, "Ekaterina Danilova" <katja.danilova94@gmail.com>
>>      > wrote:
>>      >
>>      >     I would like to ask how TDB2 and Fuseki manage big amounts of string
>>      >     data (especially repeating data) and what the best practices are.
>>      >     Does it optimize it somehow?