From: nongli
To: reviews@spark.apache.org
Subject: [GitHub] spark pull request: [SPARK-12879][SQL] improve the unsafe row writ...
Date: Mon, 18 Jan 2016 19:13:41 +0000 (UTC)

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10809#discussion_r50034304

    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -26,36 +26,44 @@ import org.apache.spark.unsafe.types.UTF8String;

     /**
    - * A helper class to write data into global row buffer using `UnsafeRow` format,
    - * used by {@link org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection}.
    + * A helper class to write data into global row buffer using `UnsafeRow` format.
    + *
    + * It will remember the offset of row buffer which it starts to write, and move the cursor of row
    + * buffer while writing. If a new record comes, the cursor of row buffer will be reset, so we need
    + * to also call `reset` of this class before writing, to update the `startingOffset` and clear out
    + * null bits. Note that if we use it to write data into the result unsafe row, which means we will
    + * always write from the very beginning of the global row buffer, we don't need to update
    + * `startingOffset` and can just call `zeroOutNullBites` before writing new record.
      */
     public class UnsafeRowWriter {

    -  private BufferHolder holder;
    +  private final BufferHolder holder;
       // The offset of the global buffer where we start to write this row.
       private int startingOffset;
    -  private int nullBitsSize;
    -  private UnsafeRow row;
    +  private final int nullBitsSize;
    +  private final int fixedSize;

    -  public void initialize(BufferHolder holder, int numFields) {
    -    this.holder = holder;
    +  public void reset() {
         this.startingOffset = holder.cursor;
    -    this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);

         // grow the global buffer to make sure it has enough space to write fixed-length data.
    -    final int fixedSize = nullBitsSize + 8 * numFields;
    -    holder.grow(fixedSize, row);
    +    holder.grow(fixedSize);
         holder.cursor += fixedSize;

    -    // zero-out the null bits region
    +    zeroOutNullBites();
    +  }
    +
    +  public void zeroOutNullBites() {
         for (int i = 0; i < nullBitsSize; i += 8) {
           Platform.putLong(holder.buffer, startingOffset + i, 0L);
         }
       }

    -  public void initialize(UnsafeRow row, BufferHolder holder, int numFields) {
    -    initialize(holder, numFields);
    -    this.row = row;
    +  public UnsafeRowWriter(BufferHolder holder, int numFields) {
    --- End diff --

    Can you remove `numFields`? The holder contains a row, and there is only one valid value for `numFields`. Better to pull it from `holder.row`.
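To make the suggestion concrete, here is a minimal sketch of a writer whose constructor takes only the holder and derives the field count from the holder's row. It does not use the actual Spark classes: `SketchRow`, `SketchHolder`, `SketchRowWriter`, and `SketchDemo` are hypothetical stand-ins, the buffer is a plain `byte[]` rather than memory accessed through `Platform`, and the initial buffer size and growth policy are assumptions. The null-bit width is assumed to be one 8-byte word per 64 fields.

```java
import java.util.Arrays;

// Hypothetical stand-in for UnsafeRow: only the field count matters here.
final class SketchRow {
  private final int numFields;
  SketchRow(int numFields) { this.numFields = numFields; }
  int numFields() { return numFields; }
}

// Hypothetical stand-in for BufferHolder: owns the row, the buffer, and the cursor.
final class SketchHolder {
  final SketchRow row;
  byte[] buffer = new byte[64]; // assumed initial size
  int cursor = 0;

  SketchHolder(SketchRow row) { this.row = row; }

  // Grow the buffer so that `needed` more bytes fit after the cursor.
  void grow(int needed) {
    if (cursor + needed > buffer.length) {
      buffer = Arrays.copyOf(buffer, Math.max(buffer.length * 2, cursor + needed));
    }
  }
}

final class SketchRowWriter {
  private final SketchHolder holder;
  // The offset of the buffer where we start to write this row.
  private int startingOffset;
  private final int nullBitsSize;
  private final int fixedSize;

  // No numFields parameter: the only valid value is the field count of holder.row.
  SketchRowWriter(SketchHolder holder) {
    this.holder = holder;
    int numFields = holder.row.numFields();
    // Assumption: one 8-byte word of null bits per 64 fields.
    this.nullBitsSize = ((numFields + 63) / 64) * 8;
    this.fixedSize = nullBitsSize + 8 * numFields;
    this.startingOffset = holder.cursor;
  }

  // Call before writing a new record at the holder's current cursor.
  void reset() {
    startingOffset = holder.cursor;
    holder.grow(fixedSize);
    holder.cursor += fixedSize;
    zeroOutNullBits();
  }

  // Clears the null-bit region that starts at startingOffset.
  void zeroOutNullBits() {
    Arrays.fill(holder.buffer, startingOffset, startingOffset + nullBitsSize, (byte) 0);
  }

  int nullBitsSize() { return nullBitsSize; }
  int fixedSize() { return fixedSize; }
}

final class SketchDemo {
  public static void main(String[] args) {
    SketchHolder holder = new SketchHolder(new SketchRow(3));
    SketchRowWriter writer = new SketchRowWriter(holder);
    writer.reset();
    System.out.println("fixedSize=" + writer.fixedSize() + ", cursor=" + holder.cursor);
  }
}
```

With this shape a caller cannot pass a field count that disagrees with the row the holder already owns, and both sizes can stay `final` because they are computed once in the constructor.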