Date: Fri, 18 Mar 2011 11:17:13 -0700
Subject: Re: RCFile - some queries
From: yongqiang he
To: dev@hive.apache.org

>> but the recordLength is not the actual on-disk length of the record.

It is the actual on-disk length: the compressed key length plus the
compressed value length.

>> Similarly, the next field - key length - is not the on-disk length of the
>> compressed key.

There are two key lengths: one is the compressed key length, the other is
the uncompressed key length.

For 2, it won't be a problem, because the record length is the compressed
length.

>> Thread-Safety.

It is not thread safe. Applications should handle synchronization
themselves. RCFile was initially designed for Hive. Thread safety was there
at first and was later removed, because Hive does not need it and
'synchronized' may add extra overhead.

>> 3.1

Reader.nextBlock() was added later for file merge, so a normal reader should
not use this method.

>> 3.2.

True.

On Fri, Mar 18, 2011 at 8:30 AM, Krishna Kumar wrote:
> Hello,
>
>    I was looking into the RCFile format, esp when used with compression; a
> picture of the file layout as I understand it in this case is attached.
>
>    Some queries/potential issues:
>
>    1. RCFile makes a claim of being sequence file compatible; but the
> recordLength is not the actual on-disk length of the record. As shown in the
> picture, it is the uncompressed key length plus the compressed value length.
> Similarly, the next field - key length - is not the on-disk length of the
> compressed key.
>
>    2. Record Length is also used for seeking on the inputstream. See
> Reader.seekToNextKeyBuffer(). Since record length is overstated for
> compressed records, this can result in incorrect positioning.
>
>    3. Thread-Safety: Is the RCFile.Reader class meant to be thread-safe?
> Some public methods are marked synchronized which gives that appearance but
> there are a few thread-safety issues I think.
>
>        3.1 Other public methods, such as Reader.nextBlock(), are not
> synchronized, yet they operate on the same data structures.
>
>        3.2. Callbacks such as LazyDecompressionCallbackImpl.decompress
> operate on the valuebuffer currentValue, which can be simultaneously
> modified by the public methods on the Reader.
>
> Cheers,
>  Krishna
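
For anyone following the thread, here is a rough sketch of the length accounting as described in the reply: the record length is the compressed key length plus the compressed value length, so skipping the header plus that many bytes should land on the next record. This is not the RCFile code. The class name, the method names, the three fixed 4-byte header fields and their order are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of the length fields; not the actual RCFile implementation. */
final class RecordLengthSketch {
    // Assumed header: record length, uncompressed key length, compressed key length,
    // each stored as a plain 4-byte int. The real on-disk encoding may differ.
    static final int HEADER_BYTES = 3 * Integer.BYTES;

    /** Serializes one record using the accounting described in the reply. */
    static byte[] writeRecord(byte[] compressedKey, int uncompressedKeyLength, byte[] compressedValue) {
        // Record length = compressed key length + compressed value length (the on-disk payload).
        int recordLength = compressedKey.length + compressedValue.length;
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + recordLength);
        buf.putInt(recordLength);
        buf.putInt(uncompressedKeyLength);
        buf.putInt(compressedKey.length);
        buf.put(compressedKey);
        buf.put(compressedValue);
        return buf.array();
    }

    /** Returns the offset of the record that follows the one starting at offset. */
    static int nextRecordOffset(byte[] file, int offset) {
        // Only the first header field is needed to skip the whole record.
        int recordLength = ByteBuffer.wrap(file).getInt(offset);
        return offset + HEADER_BYTES + recordLength;
    }
}
```

If the record length were instead computed from the uncompressed key length, as the original question suspected, nextRecordOffset would overshoot whenever key compression shrinks the key, which is the positioning problem raised in point 2.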