impala-dev mailing list archives

From "David Knupp (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-2013: Issue Hbase queries individually during data-load.
Date Fri, 22 Jul 2016 19:39:51 GMT
David Knupp has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3728

Change subject: IMPALA-2013: Issue Hbase queries individually during data-load.
......................................................................

IMPALA-2013: Issue Hbase queries individually during data-load.

Loading data into HBase has traditionally been flaky, and the
problems are hard to diagnose from the existing logs. I think this
is at least partly because we rely on a command file to send
queries to the HBase shell. When it reads a series of queries from
a file, the HBase shell does not check for errors or halt after
each query.

From
https://hbase.apache.org/book.html#_read_hbase_shell_commands_from_a_command_file

"There is no way to programmatically check each individual command for
success or failure. Also, though you see the output for each command,
the commands themselves are not echoed to the screen so it can be
difficult to line up the command with its output."

Even if the HBase process dies completely, our data load process
laboriously continues to send the remaining commands to the shell.

Instead, the command file generated by generate-schema-statements.py
should be iterated line-by-line, with each query being passed
individually to the HBase shell, checking for errors in the output
each time. If we get an error message, fail fast and loudly.
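
As a rough illustration of the approach described above (not the actual
code in bin/load-data.py), the loop might look like the sketch below. The
names execute_commands, run_in_hbase_shell and HBaseCommandError are
hypothetical. The sketch also assumes that an `hbase` binary is on PATH,
that `hbase shell -n` runs the shell non-interactively, and that a failed
command prints a line containing "ERROR"; none of these details come from
the change itself.

```python
import logging
import subprocess

LOG = logging.getLogger("load-data-sketch")


class HBaseCommandError(Exception):
    """Raised when a single HBase shell command appears to have failed."""


def run_in_hbase_shell(command):
    """Pipe one command to a fresh HBase shell and return its output.

    Assumption: `hbase` is on PATH and `shell -n` selects non-interactive
    mode, exiting non-zero when the command fails.
    """
    proc = subprocess.Popen(["hbase", "shell", "-n"],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    output, _ = proc.communicate(command + "\n")
    if proc.returncode != 0:
        raise HBaseCommandError(
            "HBase shell exited with %d for: %s\n%s"
            % (proc.returncode, command, output))
    return output


def execute_commands(lines, run_command=run_in_hbase_shell,
                     error_marker="ERROR"):
    """Run each non-blank line as its own command; stop at the first error.

    Returns the number of commands that completed without an error marker
    in their output. `error_marker` is an assumed heuristic, not a
    documented HBase contract.
    """
    executed = 0
    for line in lines:
        command = line.strip()
        if not command:
            continue  # skip blank lines in the generated command file
        LOG.info("Executing HBase command: %s", command)
        output = run_command(command)
        if error_marker in output:
            LOG.error("HBase command failed: %s\n%s", command, output)
            raise HBaseCommandError("HBase command failed: %s" % command)
        executed += 1
    return executed
```

Starting one shell per command is slow, but it ties every piece of output
to exactly one query, which is what makes failing fast possible. A caller
would pass the lines of the generated command file, for example
execute_commands(open(path)) with path pointing at that file.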

Also, fix several flake8 linter complaints, and replace print
statements with specific log level output.

Change-Id: I911d972ba8ad3a2a084c8195074556153722c7e2
---
M bin/load-data.py
1 file changed, 102 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/28/3728/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I911d972ba8ad3a2a084c8195074556153722c7e2
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: David Knupp <dknupp@cloudera.com>
