Message-ID: <1376198482.54586.YahooMailNeo@web140604.mail.bf1.yahoo.com>
In-Reply-To: <52063F87.10508@zfabrik.de>
Date: Sat, 10 Aug 2013 22:21:22 -0700 (PDT)
From: lars hofhansl
To: "user@hbase.apache.org"
Subject: Re: Using HBase timestamps as natural versioning

If you want deletes to work correctly you should enable KEEP_DELETED_CELLS for your column families (I still think that should be the default anyway). Otherwise time-range queries will not be correct w.r.t. deleted data (specifically, you cannot get back at deleted data even if you specify a time range before the delete and even if your column family has unlimited versions).

Depending on what your typical queries are, you might run into performance issues. HBase sorts all versions of a KeyValue adjacent to each other. If you now want to query only along the latest data (the last version), HBase will have to skip a lot of other versions. In the worst case the latest versions of all KeyValues are in separate (HFile) blocks.

The question of whether to use the builtin timestamps or model the time as part of the row keys (or even a time-column) is an interesting one. Generally the row-key identifies your row. If you want a new row for each TS in your logical model you should manage the time dimension yourself. Otherwise, if you have identities (i.e. rows) with many versions, the builtin TS might be better.

-- Lars

________________________________
From: Henning Blohm
To: user
Sent: Saturday, August 10, 2013 6:26 AM
Subject: Using HBase timestamps as natural versioning

Hi,

we are managing some naturally time versioned data in HBase. That is, there are change events that have a specific time set and when such an event is handled, data in HBase, pertaining to the exact same point in time, is updated.

So far we are using HBase time stamps to model the time dimension.
All columns have an unlimited number of versions. That worked ok so far, and HBase's way of providing access to data at a given time or time range seemed a natural fit.

We are aware of some tricky issues around timestamp handling (e.g. in particular in conjunction with deletes). As we need to migrate HBase stored data (for other reasons) shortly, we are wondering if our approach has some long-term drawbacks that we should pay attention to now, and whether we should possibly re-design our timestamp handling as well.

So my questions are:

* Is there problematic experience with using HBase timestamps as the time dimension of your data (assuming it has some natural time-based versioning)?
* Is it generally better to model time-based versioning of data within the data structure itself (e.g. in the row key) and why?
* In case you used HBase timestamps similar to the way we use them, feedback on how that worked is welcome as well!

Thanks,
Henning
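
For the "model the time as part of the row keys" option discussed in this thread, here is a minimal sketch of one possible key layout. It is not taken from the thread and not part of the HBase API: the class VersionedRowKey, its methods, the 0x00 separator, and the choice of storing Long.MAX_VALUE minus the timestamp are all assumptions made for illustration. The only HBase behavior it relies on is that row keys sort in unsigned lexicographic byte order, so with this layout the newest version of an entity becomes the first row under that entity's prefix.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not an HBase class): builds row keys of the form
//   <entityId bytes> 0x00 <Long.MAX_VALUE - timestamp, 8 bytes big-endian>
// so that, for one entity, newer versions sort before older ones.
public final class VersionedRowKey {
    private VersionedRowKey() {}

    // Assumes entityId contains no U+0000 character and the timestamp is non-negative.
    public static byte[] of(String entityId, long timestampMillis) {
        if (timestampMillis < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        long reversed = Long.MAX_VALUE - timestampMillis; // newer => smaller value
        return ByteBuffer.allocate(id.length + 1 + Long.BYTES)
                .put(id)
                .put((byte) 0)      // separator between id and time part
                .putLong(reversed)  // big-endian, non-negative, so byte order matches numeric order
                .array();
    }

    // Recovers the original timestamp from the last 8 bytes of the key.
    public static long timestampOf(byte[] rowKey) {
        long reversed = ByteBuffer
                .wrap(rowKey, rowKey.length - Long.BYTES, Long.BYTES)
                .getLong();
        return Long.MAX_VALUE - reversed;
    }

    // Unsigned lexicographic comparison, written out here to mirror row-key ordering.
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

With keys like these, "latest version of an entity" is a short scan starting at the entity prefix rather than a read that skips older versions, at the cost of managing the time dimension (and any deletes) in the application. Whether that trade-off pays off depends on the typical queries, as noted in the reply above.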