impala-dev mailing list archives

From "David Knupp (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-2013: Issue Hbase queries individually during data-load.
Date Fri, 22 Jul 2016 22:54:14 GMT
David Knupp has uploaded a new patch set (#6).

Change subject: IMPALA-2013: Issue Hbase queries individually during data-load.
......................................................................

IMPALA-2013: Issue Hbase queries individually during data-load.

Loading data into HBase has traditionally been flaky, and the
failures have been hard to diagnose from the existing logs. I think
this is at least partly because we have been relying on a command
file to send queries to the HBase shell. When a series of queries is
sent in a file, the HBase shell neither checks the result of each
query nor halts when one fails.

From
https://hbase.apache.org/book.html#_read_hbase_shell_commands_from_a_command_file

"There is no way to programmatically check each individual command for
success or failure. Also, though you see the output for each command,
the commands themselves are not echoed to the screen so it can be
difficult to line up the command with its output."

Even if the HBase process dies completely, our data load keeps
laboriously sending the remaining commands to the shell.

Instead of trying to process the file all at once, the command file
generated by generate-schema-statements.py should be iterated
line-by-line, with each query being passed individually to the HBase
shell, checking for errors in the output each time. If we get an
error message, fail fast and loudly.
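
As a rough illustration of the approach described above (not the actual
code in bin/load-data.py), the loop might look like the sketch below. The
names run_commands, run_in_hbase_shell and ERROR_MARKERS are hypothetical,
and both the use of "hbase shell -n" (non-interactive mode) and the
"ERROR" substring check are assumptions about how a failure would be
detected.

```python
import logging
import subprocess

LOG = logging.getLogger("hbase-load-sketch")

# Assumption: any of these substrings in the shell output means failure.
ERROR_MARKERS = ("ERROR",)


def run_in_hbase_shell(command):
    """Pipe a single command to a non-interactive HBase shell.

    Assumes an `hbase` executable on PATH that accepts `shell -n`.
    Returns the combined stdout/stderr text of that one invocation.
    """
    proc = subprocess.Popen(
        ["hbase", "shell", "-n"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True)
    output, _ = proc.communicate(command + "\n")
    if proc.returncode != 0:
        raise RuntimeError("HBase shell exited with %d for: %s"
                           % (proc.returncode, command))
    return output


def run_commands(lines, run=run_in_hbase_shell):
    """Send each non-blank line separately and stop at the first error.

    `run` is injectable so the loop can be exercised without HBase.
    Returns the number of commands that completed successfully.
    """
    executed = 0
    for line in lines:
        command = line.strip()
        if not command:
            continue  # skip blank lines in the generated command file
        LOG.info("Executing HBase command: %s", command)
        output = run(command)
        if any(marker in output for marker in ERROR_MARKERS):
            LOG.error("HBase command failed: %s\n%s", command, output)
            raise RuntimeError("HBase command failed: %s" % command)
        executed += 1
    return executed
```

With something like this, the generated command file could be passed in
directly, e.g. run_commands(open(path)) for some path to that file, and
the first failing query would abort the load instead of being silently
skipped.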

This commit also fixes several flake8 linter complaints and replaces
print statements with logging at specific log levels.

Change-Id: I911d972ba8ad3a2a084c8195074556153722c7e2
---
M bin/load-data.py
1 file changed, 137 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/28/3728/6
-- 
To view, visit http://gerrit.cloudera.org:8080/3728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I911d972ba8ad3a2a084c8195074556153722c7e2
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: David Knupp <dknupp@cloudera.com>
Gerrit-Reviewer: David Knupp <dknupp@cloudera.com>
Gerrit-Reviewer: Harrison Sheinblatt <hs7@hotmail.com>
Gerrit-Reviewer: Ishaan Joshi <ishaan@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
