Date: Tue, 7 Apr 2015 23:22:11 +0200
Subject: Re: Rowkey design question
From: Kristoffer Sjögren
To: user@hbase.apache.org

Sorry, I should have explained my use case a bit more.

Yes, it's a pretty big row and it's "close" to worst case. Normally there would be fewer qualifiers and the largest qualifiers would be smaller.

The reason these rows get big is that they store aggregated data in indexed, compressed form. This format allows for extremely fast queries (on local disk format) over billions of rows (not rows in HBase speak) when touching smaller areas of the data. If I stored the data as regular HBase rows, things would get very slow unless I had many, many region servers.

The coprocessor is used for doing custom queries on the indexed data inside the region servers. These queries are not like a regular row scan, but very specific as to how the data is formatted within each column qualifier.

Yes, this is not possible if HBase loads the whole 500MB each time I want to perform this custom query on a row. Hence my question :-)

On Tue, Apr 7, 2015 at 11:03 PM, Michael Segel wrote:

> Sorry, but your initial problem statement doesn’t seem to parse …
>
> Are you saying that you have a single row with approximately 100,000 elements
> where each element is roughly 1-5KB in size and in addition there are ~5
> elements which will be between one and five MB in size?
>
> And you then mention a coprocessor?
>
> Just looking at the numbers… 100K * 5KB means that each row would end up
> being 500MB in size.
>
> That’s a pretty fat row.
>
> I would suggest rethinking your strategy.
>
> > On Apr 7, 2015, at 11:13 AM, Kristoffer Sjögren wrote:
> >
> > Hi
> >
> > I have a row with around 100,000 qualifiers with mostly small values around
> > 1-5KB and maybe 5 larger ones around 1-5 MB. A coprocessor does random
> > access of 1-10 qualifiers per row.
> >
> > I would like to understand how HBase loads the data into memory. Will the
> > entire row be loaded or only the qualifiers I ask for (like pointer access
> > into a direct ByteBuffer)?
> >
> > Cheers,
> > -Kristoffer
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
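
For reference, the size estimate discussed in this thread can be checked with a few lines of Python. This is only a rough sketch: it counts value bytes, uses decimal units (1 KB = 1000 bytes) to match the "100K * 5KB" estimate, and ignores row keys, qualifier names and any per-cell overhead. The function name row_size_bytes is made up for this illustration and is not part of HBase.

```python
KB = 1_000        # decimal units, matching the "100K * 5KB = 500MB" estimate
MB = 1_000 * KB


def row_size_bytes(n_small, small_size, n_large, large_size):
    """Total value bytes in one row, ignoring key and per-cell overhead."""
    return n_small * small_size + n_large * large_size


# Lower and upper bounds from the numbers in the original question:
# ~100,000 qualifiers of 1-5 KB plus ~5 qualifiers of 1-5 MB.
low = row_size_bytes(100_000, 1 * KB, 5, 1 * MB)
high = row_size_bytes(100_000, 5 * KB, 5, 5 * MB)

# A query touching ten small qualifiers at the upper size.
touched = 10 * 5 * KB

print(f"row size: {low / MB:.0f} MB to {high / MB:.0f} MB")
print(f"ten small qualifiers: {touched / KB:.0f} KB")
```

Under these assumptions the small qualifiers dominate the row size, and a query that touches ten small qualifiers needs only a tiny fraction of the row, which is why it matters whether the whole row or only the requested qualifiers are loaded.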