From: Keith Turner
Date: Mon, 7 Nov 2016 07:04:09 -0500
Subject: Re: Comparing Schemes in Accumulo
To: user@accumulo.apache.org

On Mon, Nov 7, 2016 at 6:54 AM, Oliver Swoboda wrote:
> Hello,
>
> I've stored weather data in two tables with different schemes. Scheme1 is
> using the month and station ID for the row key (e.g. 201601_GME00102292) and
> the days of the month (1-31) in the version column. Scheme2 is using the
> year and station ID for the row key (e.g. 2016_GME00102292) and the days of
> the year (1-366) in the version column. Of course, the version iterator has
> been removed from the tables. Because I have different metrics, like minimum
> temperature and maximum temperature of one day, I'm using locality groups,
> one group for each metric (e.g. setgroups TMIN=TMIN, TMAX=TMAX).
> Additionally, I've pre-split the tables by year (e.g. 2014, 2015, 2016, ...).
>
> Now to my question: If I do a full table scan with a batch scanner, Scheme2
> is always faster than Scheme1 (with 2.5 billion entries Scheme1's scan took
> 24 minutes and Scheme2's scan took 21 minutes). Why is that? Is it because
> there are fewer seeks made when using Scheme2? Would be nice if someone can
> help me to understand what's happening here.

One possible reason is the relative encoding used in Accumulo. When two
consecutive keys have the same row, the second key does not store the row
again; it just points to the row of the previous key. This makes row
comparisons faster. Also, when data is transferred over the network from
server to client, repeated rows are not transferred. Scheme2 changes rows
once per year and station, while Scheme1 changes rows once per month and
station, so Scheme2 has longer runs of keys that share a row.

>
> Yours faithfully,
> Oliver Swoboda
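
To make the relative-encoding point concrete, here is a minimal sketch in Python. It is a toy model, not Accumulo's actual key format: it only counts how many row bytes have to be written if a row is stored only when it differs from the previous key's row. The function names (days_of_year, scheme1_rows, scheme2_rows, row_changes, row_bytes_written) are made up for this illustration, and the station ID is the one from the example row keys in the question.

```python
from datetime import date, timedelta

STATION = "GME00102292"  # station ID from the example row keys in the question


def days_of_year(year):
    """Return every calendar day of the given year, in order."""
    d = date(year, 1, 1)
    days = []
    while d.year == year:
        days.append(d)
        d += timedelta(days=1)
    return days


def scheme1_rows(year):
    """Scheme1: row is month plus station, e.g. 201601_GME00102292."""
    return [f"{d:%Y%m}_{STATION}" for d in days_of_year(year)]


def scheme2_rows(year):
    """Scheme2: row is year plus station, e.g. 2016_GME00102292."""
    return [f"{d.year}_{STATION}" for d in days_of_year(year)]


def row_changes(rows):
    """Count how often the row differs from the row of the previous key."""
    changes = 0
    previous = None
    for row in rows:
        if row != previous:
            changes += 1
        previous = row
    return changes


def row_bytes_written(rows):
    """Toy relative encoding: a row is written only when it changes."""
    written = 0
    previous = None
    for row in rows:
        if row != previous:
            written += len(row.encode("utf-8"))
        previous = row
    return written


# Compare one station and one metric for the year 2016 (hypothetical sample).
for name, rows in (("Scheme1", scheme1_rows(2016)), ("Scheme2", scheme2_rows(2016))):
    print(name, "entries:", len(rows),
          "row changes:", row_changes(rows),
          "row bytes written:", row_bytes_written(rows))
```

In this model both schemes have 366 entries for 2016, but Scheme1 writes the row 12 times (216 bytes) and Scheme2 writes it once (16 bytes). The real on-disk and wire formats also deal with the other key fields and with block compression, so this only shows the direction of the effect, not its size.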