Date: Tue, 18 Jul 2017 22:15:00 +0000 (UTC)
From: "Enis Soztutar (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-18086) Create native client which creates load on selected cluster

[ https://issues.apache.org/jira/browse/HBASE-18086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092265#comment-16092265 ]

Enis Soztutar commented on HBASE-18086:
---------------------------------------

bq. Updated patch v12 where random number generation is lifted outside the loop (it was observed that write performance suffered with random number generation inside the loop).

It does not make sense to me that random number generation is costly. I've looked at the folly code, and nothing there explains it. Can you please verify the total number of columns written in each case? You can also test by just generating 1M or so random numbers in a loop and measuring the total time it takes end to end. We want each row to come with a different number of columns.

- No use of {{new}} or {{delete}}. Always use smart pointers.
{code}
+ std::thread *writer_threads = new std::thread[FLAGS_threads];
{code}
- These flags should have the same names as the ones in simple-client.cc:
{code}
+DEFINE_int32(multi_get_size, 1, "number of gets in one multi-get");
+DEFINE_bool(skip_get, false, "skip get / scan");
+DEFINE_bool(skip_put, false, "skip put's");
{code}
There are also report_num_rows, scans, multigets and conf flags that you should implement.
- These should be return values instead of passing a pointer to the methods:
{code}
bool *succeeded
{code}
- Instead of executing every Cell as a different Put via Table::Put(), you should construct one Put object, add all the Cells, then call Table::Put():
{code}
for (uint64_t j = 0; j < rows; j++) {
+  std::string row = PrefixZero(width, iteration * rows + j);
+  for (auto family : families) {
+    table->Put(Put{row}.AddColumn(family, kNumColumn, std::to_string(n_cols)));
+    for (unsigned int k = 1; k <= n_cols; k++) {
+      table->Put(Put{row}.AddColumn(family, std::to_string(k), row));
+    }
+  }
{code}
- Instead of this method:
{code}
+std::string PrefixZero(int total_width, int num) {
{code}
you can probably do something like this (from scanner-test.cc):
{code}
std::string Row(uint32_t i, int width) {
  std::ostringstream s;
  s.fill('0');
  s.width(width);
  s << i;
  return "row" + s.str();
}
{code}
- Scans and gets should validate the obtained Result using the same logic, no? I think you should extract that into a function and use it from both.
- The way we do multi-gets will result in all of the multi-get requests going to the same region. Instead, I think it is better to have the multi-gets scattered around most of the regions, so that we have a high likelihood of testing server failure handling, etc. when chaos monkey is run with this. I had argued the same in my above comments. I think we can do something like a hash-like striping across the row key space among threads, rather than range-based striping.
That should give us the ability to do multi-gets across all the regions in one {{Table::Get(std::vector)}} call.
- We don't have multi-put functionality right now, but when that is added, we should do a follow up patch for this to add multi-put functionality.
- These should default to {{load_test_table}} and {{f}} respectively.
{code}
+DEFINE_string(table, "t", "What table to do the reads and writes with");
+DEFINE_string(families, "d", "comma separated list of column family names");
{code}

> Create native client which creates load on selected cluster
> -----------------------------------------------------------
>
>                 Key: HBASE-18086
>                 URL: https://issues.apache.org/jira/browse/HBASE-18086
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 18086.v11.txt, 18086.v12.txt, 18086.v14.txt, 18086.v1.txt, 18086.v3.txt, 18086.v4.txt, 18086.v5.txt, 18086.v6.txt, 18086.v7.txt, 18086.v8.txt
>
>
> This task is to create a client which uses multiple threads to conduct Puts followed by Gets against selected cluster.
> Default is to run the tool against local cluster.
> This would give us some idea on the characteristics of native client in terms of handling high load.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
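
The striping suggestion in the review comment above (spread each multi-get across the row key space instead of giving each thread one contiguous range) could be sketched roughly as follows. This is only one possible reading of "hash-like striping", not code from the patch or from the HBase native client. Rows are plain integer indexes, {{ScatteredBatch}} and {{OffsetsForThread}} are hypothetical helper names, and {{Row}} is adapted from the scanner-test.cc helper quoted in the comment. The sketch assumes the total row count is a multiple of the multi-get size and that row keys are zero-padded, so that key order follows index order.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Zero-padded row key, adapted from the Row() helper quoted in the review.
std::string Row(uint64_t i, int width) {
  std::ostringstream s;
  s << "row" << std::setfill('0') << std::setw(width) << i;
  return s.str();
}

// Hypothetical helper: row indexes for one multi-get batch.
// The index space [0, total_rows) is cut into batch_size equal segments and
// the batch takes the row at the same offset in every segment, so the keys of
// a single batch are spread over the whole key space.
std::vector<uint64_t> ScatteredBatch(uint64_t total_rows, uint32_t batch_size,
                                     uint64_t offset) {
  assert(batch_size > 0 && total_rows % batch_size == 0);
  const uint64_t segment = total_rows / batch_size;
  assert(offset < segment);
  std::vector<uint64_t> rows;
  rows.reserve(batch_size);
  for (uint32_t k = 0; k < batch_size; k++) {
    rows.push_back(k * segment + offset);
  }
  return rows;
}

// Hypothetical helper: the batch offsets owned by thread t.
// Offsets are dealt out round-robin (offset % num_threads == t), so the
// threads together cover every offset, and therefore every row, exactly once.
std::vector<uint64_t> OffsetsForThread(uint64_t total_rows, uint32_t batch_size,
                                       uint32_t num_threads, uint32_t t) {
  assert(num_threads > 0 && t < num_threads);
  assert(batch_size > 0 && total_rows % batch_size == 0);
  const uint64_t segment = total_rows / batch_size;
  std::vector<uint64_t> offsets;
  for (uint64_t o = t; o < segment; o += num_threads) {
    offsets.push_back(o);
  }
  return offsets;
}
```

A worker thread would loop over its offsets, turn each index from {{ScatteredBatch}} into a key with {{Row}}, and issue one multi-get per batch. That call is left out here because it depends on the client API.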