hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18086) Create native client which creates load on selected cluster
Date Tue, 18 Jul 2017 22:15:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092265#comment-16092265 ]

Enis Soztutar commented on HBASE-18086:

bq. Updated patch v12 where random number generation is lifted outside the loop (it was observed
that write performance suffered with random number generation inside the loop).

It does not make sense to me that random number generation would be that costly. I've looked at
the folly code, and there is nothing there that explains it. Can you please verify the total
number of columns written in each case? You can also test by generating 1M or so random numbers
in a loop and measuring the total time it takes end to end. We want each row to come with a
different number of columns, which means generating the number inside the loop.
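A minimal sketch of the timing check suggested above. It assumes {{std::mt19937_64}} as a stand-in for folly's generator, and the names {{RandomTiming}} and {{TimeRandomGeneration}} are hypothetical, so the numbers are only indicative:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Result of the micro-benchmark: elapsed wall-clock time and the sum of all
// draws (returned so the compiler cannot discard the loop).
struct RandomTiming {
  std::chrono::microseconds elapsed;
  uint64_t sum;
};

// Draws `count` random column counts in [1, max_cols] and times the loop.
// NOTE: std::mt19937_64 is an assumption; the patch under review uses folly.
RandomTiming TimeRandomGeneration(uint64_t count, uint32_t max_cols, uint64_t seed = 42) {
  assert(max_cols >= 1);
  std::mt19937_64 rng(seed);
  std::uniform_int_distribution<uint32_t> dist(1, max_cols);
  uint64_t sum = 0;
  auto start = std::chrono::steady_clock::now();
  for (uint64_t i = 0; i < count; ++i) {
    sum += dist(rng);
  }
  auto end = std::chrono::steady_clock::now();
  return {std::chrono::duration_cast<std::chrono::microseconds>(end - start), sum};
}
```

Calling {{TimeRandomGeneration(1000000, 100)}} and comparing {{elapsed}} against the duration of a full write run should show whether generation is a plausible bottleneck.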

- No use of {{new}} or {{delete}}. Always use smart pointers. 
+    std::thread *writer_threads = new std::thread[FLAGS_threads];
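One way to avoid the raw array is a {{std::vector<std::thread>}}, which owns the thread objects and releases them automatically. This is only a sketch; {{RunOnThreads}} is a hypothetical helper, not something in the patch:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Runs work(thread_index) on num_threads threads and joins all of them.
// The vector owns the std::thread objects, so no new[]/delete[] is needed.
template <typename Fn>
void RunOnThreads(int num_threads, Fn work) {
  assert(num_threads >= 0);
  std::vector<std::thread> threads;
  threads.reserve(static_cast<std::size_t>(num_threads));
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back(work, i);  // each thread gets its own copy of work and i
  }
  for (auto& t : threads) {
    t.join();
  }
}
```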

- These flags should have the same names as the ones in simple-client.cc: 
+DEFINE_int32(multi_get_size, 1, "number of gets in one multi-get");
+DEFINE_bool(skip_get, false, "skip get / scan");
+DEFINE_bool(skip_put, false, "skip put's");
There are also report_num_rows, scans, multigets and conf flags that you should implement.

- These should be return values instead of passing a pointer to the methods: 
bool *succeeded
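A rough sketch of the return-value style, using {{std::async}} so each worker's {{bool}} comes back through a future. {{WriteRows}} and {{RunWriters}} are hypothetical stand-ins, and the worker body is a placeholder rather than real client code:

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical worker: reports success through its return value instead of
// writing to a bool* out-parameter. A real worker would issue Puts here;
// this placeholder simply fails for a negative starting row.
bool WriteRows(int first_row, int num_rows) {
  assert(num_rows >= 0);
  return first_row >= 0;
}

// Launches one worker per slice of rows and returns true only if all succeed.
bool RunWriters(int num_workers, int rows_per_worker) {
  std::vector<std::future<bool>> results;
  for (int i = 0; i < num_workers; ++i) {
    results.push_back(std::async(std::launch::async, WriteRows,
                                 i * rows_per_worker, rows_per_worker));
  }
  bool all_ok = true;
  for (auto& r : results) {
    all_ok = r.get() && all_ok;  // always call get() so every worker is joined
  }
  return all_ok;
}
```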

- Instead of executing every Cell as a separate Put via Table::Put(), you should construct
one Put object per row, add all the Cells, then call Table::Put() once: 
for (uint64_t j = 0; j < rows; j++) {
+    std::string row = PrefixZero(width, iteration * rows + j);
+    for (auto family : families) {
+      table->Put(Put{row}.AddColumn(family, kNumColumn, std::to_string(n_cols)));
+      for (unsigned int k = 1; k <= n_cols; k++) {
+        table->Put(Put{row}.AddColumn(family, std::to_string(k), row));
+      }
+    }
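The loop could be restructured roughly as follows. {{FakePut}} and {{FakeTable}} are hypothetical stand-ins that only mimic the shape of the client's Put and Table so the sketch compiles on its own, and {{count_qualifier}} plays the role of {{kNumColumn}}:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the native client's Put and Table types.
struct FakeCell { std::string family, qualifier, value; };

class FakePut {
 public:
  explicit FakePut(std::string row) : row_(std::move(row)) {}
  FakePut& AddColumn(const std::string& family, const std::string& qualifier,
                     const std::string& value) {
    cells_.push_back({family, qualifier, value});
    return *this;
  }
  const std::vector<FakeCell>& cells() const { return cells_; }
 private:
  std::string row_;
  std::vector<FakeCell> cells_;
};

struct FakeTable {
  int put_calls = 0;            // number of round trips
  std::size_t cells_written = 0;
  void Put(const FakePut& put) {
    ++put_calls;
    cells_written += put.cells().size();
  }
};

// Accumulates every cell of the row in one FakePut, then sends it once.
void WriteRow(FakeTable& table, const std::string& row,
              const std::vector<std::string>& families, unsigned n_cols,
              const std::string& count_qualifier) {
  FakePut put{row};
  for (const auto& family : families) {
    put.AddColumn(family, count_qualifier, std::to_string(n_cols));
    for (unsigned k = 1; k <= n_cols; ++k) {
      put.AddColumn(family, std::to_string(k), row);
    }
  }
  table.Put(put);  // a single call per row
}
```

With two families and three columns each, this issues one Put carrying eight cells instead of eight separate Puts.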

- Instead of this method: 
+std::string PrefixZero(int total_width, int num) {
you can probably do something like this (from scanner-test.cc): 
std::string Row(uint32_t i, int width) {
  std::ostringstream s;
  s.fill('0');     // pad with leading zeros
  s.width(width);  // up to the requested width
  s << i;
  return "row" + s.str();
}

- Scans and gets should validate the obtained Result using the same logic, no? I think you
should extract that into a function and use it from both. 
- The way we do multi-gets will result in all of the multi-get requests going to the same region.
Instead, I think it is better to have the multi-gets scattered across most of the regions,
so that we have a high likelihood of testing server failure handling, etc. when chaos monkey
is run with this. I argued the same in my comments above. I think we can do something
like a hash-like striping across the row key space among threads, rather than range-based
striping. That should give us the ability to do multi-gets across all the regions in one {{Table::Get(std::vector)}}.
- We don't have multi-put functionality right now, but when that is added, we should do a
follow-up patch for this to add multi-put functionality. 
- These should default to {{load_test_table}} and {{f}} respectively. 
+DEFINE_string(table, "t", "What table to do the reads and writes with");
+DEFINE_string(families, "d", "comma separated list of column family names");
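For the striping point above, one possible reading is sketched here: a thread owns every row whose index is congruent to its id modulo the thread count, and each multi-get picks rows spaced evenly through that stripe so the keys span the whole key space. {{MultiGetRows}} and its parameters are hypothetical, and rows left over when the counts do not divide evenly are ignored:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the row indices for multi-get number batch_no issued by thread_id.
// Ownership is striped: a thread owns rows with index % num_threads == thread_id.
// Within its stripe, a batch takes every step-th owned row, so one multi-get
// touches keys from the start to the end of the key space.
std::vector<uint64_t> MultiGetRows(uint64_t total_rows, uint32_t num_threads,
                                   uint32_t thread_id, uint32_t batch_size,
                                   uint64_t batch_no) {
  assert(num_threads > 0 && thread_id < num_threads && batch_size > 0);
  const uint64_t stripe_len = total_rows / num_threads;  // rows per thread
  const uint64_t step = stripe_len / batch_size;         // gap inside the stripe
  assert(step > 0 && batch_no < step);
  std::vector<uint64_t> rows;
  rows.reserve(batch_size);
  for (uint32_t m = 0; m < batch_size; ++m) {
    const uint64_t pos_in_stripe = m * step + batch_no;
    rows.push_back(pos_in_stripe * num_threads + thread_id);
  }
  return rows;
}
```

For 1000 rows, 4 threads and a batch size of 5, thread 1's first batch covers rows 1, 201, 401, 601 and 801, which would land in different regions if the table is split by key range.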

> Create native client which creates load on selected cluster
> -----------------------------------------------------------
>                 Key: HBASE-18086
>                 URL: https://issues.apache.org/jira/browse/HBASE-18086
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 18086.v11.txt, 18086.v12.txt, 18086.v14.txt, 18086.v1.txt, 18086.v3.txt, 18086.v4.txt, 18086.v5.txt, 18086.v6.txt, 18086.v7.txt, 18086.v8.txt
> This task is to create a client which uses multiple threads to conduct Puts followed by Gets against selected cluster.
> Default is to run the tool against local cluster.
> This would give us some idea on the characteristics of native client in terms of handling high load.
