impala-reviews mailing list archives

From "Internal Jenkins (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3788: Add flag for Kudu read-your-writes
Date Wed, 07 Dec 2016 05:01:01 GMT
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3788: Add flag for Kudu read-your-writes
......................................................................


IMPALA-3788: Add flag for Kudu read-your-writes

The previous attempt to support Kudu 'read-your-writes'
consistency captured the latest observed timestamp from the
Kudu client after a write and propagated it to future Kudu
clients within the same session. That alone made writes
within a session linearizable, but it did not fully address
'read-your-writes' semantics because the Kudu client in the
KuduScanner needed further configuration.

The Kudu client exposes an option to set the 'ReadMode',
which can be either READ_LATEST or READ_AT_SNAPSHOT. The
former is the default and allows the client to read the
latest known value for every row, but it provides no
consistency among the versions of the rows read within that
scan. When READ_AT_SNAPSHOT is enabled, the client picks a
timestamp that is after the latest observed session
timestamp (propagated and set with
SetLatestObservedTimestamp() by the previous commit for
IMPALA-3788) and performs a snapshot read at that time. This
timestamp is still determined per client, so the entire
query does not necessarily perform a snapshot read at a
single timestamp; doing that requires further work in Kudu
and will require another change in Impala as well.
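
The difference between the two read modes can be sketched
with a toy model. This is only an illustration under stated
assumptions: ToyReplica, ToyClient, ScanResult and Scan are
hypothetical names, not the Kudu client API, and timestamps
are plain integers. As a simplification, a caught-up replica
returns its newest version.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-ins for illustration only; none of this is the Kudu client API.
enum class ReadMode { READ_LATEST, READ_AT_SNAPSHOT };

// One replica of a single row, keeping every version it has applied.
struct ToyReplica {
  uint64_t applied_ts = 0;                   // highest timestamp applied so far
  std::map<uint64_t, std::string> versions;  // timestamp -> row value

  void Apply(uint64_t ts, const std::string& value) {
    versions[ts] = value;
    if (ts > applied_ts) applied_ts = ts;
  }
};

// A client that remembers the latest timestamp observed in its session.
struct ToyClient {
  uint64_t latest_observed_ts = 0;

  void SetLatestObservedTimestamp(uint64_t ts) {
    if (ts > latest_observed_ts) latest_observed_ts = ts;
  }
};

struct ScanResult {
  bool must_wait;     // true if the replica has to catch up before answering
  std::string value;  // row value returned when must_wait is false
};

ScanResult Scan(const ToyClient& client, const ToyReplica& replica,
                ReadMode mode) {
  if (mode == ReadMode::READ_AT_SNAPSHOT &&
      replica.applied_ts < client.latest_observed_ts) {
    // The snapshot must cover everything the session has observed, so a
    // lagging replica cannot answer yet.
    return {true, ""};
  }
  // Simplification: otherwise return the newest version this replica has.
  if (replica.versions.empty()) return {false, ""};
  return {false, replica.versions.rbegin()->second};
}
```

In this model a READ_LATEST scan against a lagging replica
returns a stale value immediately, while a READ_AT_SNAPSHOT
scan reports that the replica must first catch up to the
session's latest observed timestamp.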

That said, this behavior is sufficient to satisfy
'read-your-writes' consistency in all cases _except_ when a
DML statement is reading and writing the same table, e.g.
  INSERT INTO foo SELECT ... FROM foo
Such a statement may read rows that were inserted by a
different node of the same query. This case will be handled
when a global snapshot timestamp is supported and configured
by Impala.

Because this performs a snapshot read, some rows may be
read from lagging replicas, and those replicas will have to
wait before returning rows. This has implications for query
execution behavior (e.g. queries may be more likely to time
out, and the number of queries that can be run may be
affected), so the behavior is not yet enabled by default. It
can be enabled with the flag:
  --kudu_read_mode READ_AT_SNAPSHOT
The goal is to make this the default behavior after
sufficient testing.
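
The 11-line change to kudu-scanner.cc is not shown in this
message, so the following is only a guess at how a flag value
could be mapped to a read mode. FlagReadMode and
ParseReadModeFlag are hypothetical names, and treating an
empty value as READ_LATEST is an assumption based on
READ_LATEST being the default described above.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical mirror of the two modes named in the commit message.
enum class FlagReadMode { READ_LATEST, READ_AT_SNAPSHOT };

// Maps the flag text to a mode. Returns std::nullopt for an unrecognized
// value so that the caller can reject it instead of guessing.
std::optional<FlagReadMode> ParseReadModeFlag(const std::string& value) {
  // Assumption: an unset (empty) flag keeps the default, READ_LATEST.
  if (value.empty() || value == "READ_LATEST") {
    return FlagReadMode::READ_LATEST;
  }
  if (value == "READ_AT_SNAPSHOT") {
    return FlagReadMode::READ_AT_SNAPSHOT;
  }
  return std::nullopt;
}
```

Rejecting unknown values, rather than silently falling back
to the default, would make a mistyped flag visible at startup.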

Change-Id: I003aba410548bc9158d1e11abbdcf710c31a82ff
Reviewed-on: http://gerrit.cloudera.org:8080/5288
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
---
M be/src/exec/kudu-scanner.cc
1 file changed, 11 insertions(+), 0 deletions(-)

Approvals:
  Matthew Jacobs: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5288
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I003aba410548bc9158d1e11abbdcf710c31a82ff
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dralves@apache.org>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
