Message-ID: <55BA5DDB.1000300@gmail.com>
Date: Thu, 30 Jul 2015 13:24:43 -0400
From: Josh Elser
To: user@accumulo.apache.org
Subject: Re: Entry-based TableBalancer
References: <4dngtp202cqgunwk6tkjj1cd.1438218372168@email.android.com> <55B9A59D.7040209@orkash.com>

Konstantin Pelykh wrote:
> Thanks for the suggestion. Below are some details explaining the reason
> for such a balancer:
> I'm basing my application on the accumulo-wikipedia example, so there can be
> multiple partitions per tablet. Some partitions are larger, others are
> smaller.

Are you talking about the "sharded" table or the "inverted index" table?

Assuming you mean the "sharded" table (given your mention of partitions), a skew here implies a poor choice of partitioning algorithm. How are you choosing the partitions at ingest time?
Hash-based? Something else? A good hash used to generate your partitions at ingest time should prevent such skew at query time.

> It is possible to split the partition range manually after
> ingestion is complete and rely on the default balancer to spread tablets
> across the cluster; however, in this case some servers end up overloaded
> compared to others.
> Currently the slowest server (hosting the largest tablet) determines the final
> time for a search query, so I want to distribute entities across the
> cluster so that they are well balanced and all servers spend a similar
> amount of time processing documents through OptimizedQueryIterators.
>
> Konstantin
> --------
> Big Data / Search Consultant
> LinkedIn: linkedin.com/in/kpelykh
> Website: www.kpelykh.com
>
> On Wed, Jul 29, 2015 at 9:18 PM, mohit.kaushik wrote:
>
> If I understand you correctly, for this purpose you can simply
> pre-split tables based on range to evenly distribute data across
> tablets.
> https://accumulo.apache.org/1.7/accumulo_user_manual.html#_pre_splitting_tables
>
> On 07/30/2015 07:46 AM, Konstantin Pelykh wrote:
>> In this specific case, ingest happens only once. It's a write-once,
>> read-many type of application, so with such a balancer I would want
>> to balance tablets based on the number of entities after ingest is
>> fully complete.
>>
>> --------
>> Big Data / Search Consultant
>> Cell: +1 (646) 639-3916
>> E-mail: kpelykh@gmail.com
>> LinkedIn: linkedin.com/in/kpelykh
>> Website: www.kpelykh.com
>>
>> On Wed, Jul 29, 2015 at 6:06 PM, dlmarion wrote:
>>
>> Hotspotting was the first thing that came to my mind with the
>> proposed balancer. The tservers don't keep all the K/V in
>> memory. You are balancing query and live ingest across your
>> resources.
>>
>> -------- Original message --------
>> From: Eric Newton
>> Date: 07/29/2015 8:46 PM (GMT-05:00)
>> To: user@accumulo.apache.org
>> Subject: Re: Entry-based TableBalancer
>>
>> To my knowledge, nobody has written such a balancer.
>>
>> In the history of the project, we started out writing advanced,
>> complicated balancers that moved tablets around much too
>> quickly, which degraded performance. After that, we wrote much
>> simpler balancers to avoid the chaos. We're moving back to
>> more complex balancers, but mostly just to ensure that we
>> aren't hotspotting, based on known ingest patterns (date
>> related, for example).
>>
>> If you write a new balancer, make it slow to move tablets, and
>> very simple. Avoid over-optimizing tablet placement.
>>
>> -Eric
>>
>> On Wed, Jul 29, 2015 at 8:20 PM, Konstantin Pelykh wrote:
>>
>> Hi,
>>
>> I'm looking for a tablet balancer which operates based on
>> the number of entries per tablet as opposed to the number of
>> tablets per tablet server. My goal is to get an even
>> distribution of entries across the cluster.
>>
>> As an example:
>>
>> tablet #1 15M entries
>> tablet #2 5M entries
>> tablet #3 8M entries
>>
>> After balancing tablets I would want to get:
>>
>> Server 1 hosts: tablet1
>> Server 2 hosts: tablet2, tablet3
>>
>> The idea is pretty simple and I believe such a balancer has
>> already been developed, so I decided to check before
>> reinventing the wheel.
>>
>> Thanks!
>> Konstantin
>>
>> --------
>> Big Data / Lucene and Solr Consultant
>> LinkedIn: linkedin.com/in/kpelykh
>> Website: www.kpelykh.com
>
> --
> *Mohit Kaushik*
> Software Engineer
> A Square, Plot No. 278, Udyog Vihar, Phase 2, Gurgaon 122016, India
> *Tel:* +91 (124) 4969352 | *Fax:* +91 (124) 4033553
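
For readers following the thread, the entry-based placement described in the original question can be sketched roughly as below. This is only an illustration of the idea, not Accumulo's balancer API and not something anyone in the thread wrote: the function name assign_by_entries, the server names, and the greedy "largest tablet first, onto the least-loaded server" heuristic are all assumptions made for the example. It computes a target placement only; per Eric's advice, a real balancer would still want to migrate tablets toward that target slowly.

```python
import heapq


def assign_by_entries(tablet_entries, servers):
    """Hypothetical sketch: place tablets so entry counts per server are roughly even.

    tablet_entries: dict mapping tablet name -> number of entries
    servers: list of server names
    Returns a dict mapping server name -> list of tablet names.
    """
    # Min-heap of (entries assigned so far, server index), so the
    # least-loaded server is always popped first.
    heap = [(0, i) for i in range(len(servers))]
    heapq.heapify(heap)
    placement = {server: [] for server in servers}

    # Greedy heuristic (an assumption, not from the thread): handle the
    # largest tablets first; ties are broken by tablet name.
    ordered = sorted(tablet_entries.items(), key=lambda kv: (-kv[1], kv[0]))
    for tablet, entries in ordered:
        load, idx = heapq.heappop(heap)
        placement[servers[idx]].append(tablet)
        heapq.heappush(heap, (load + entries, idx))
    return placement


# The three tablets from the original question, spread over two servers.
tablets = {"tablet1": 15_000_000, "tablet2": 5_000_000, "tablet3": 8_000_000}
placement = assign_by_entries(tablets, ["server1", "server2"])
print(placement)
```

With the three tablets from the question and two servers, this places tablet1 alone on one server and tablet2 together with tablet3 on the other, which is the outcome Konstantin describes. It says nothing about query hotspots or live ingest, which is the concern dlmarion raises.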