From: Pete Wyckoff
Date: Thu, 4 Sep 2008 11:46:17 -0700
Subject: Re: Serialization with additional schema info

I'll just give another plug for Thrift's TRecordStream, which has fixed-size frames that can optionally be compressed or checksummed; since the frames are fixed-size, it can be split on frame boundaries. You can write whatever data you want with it. It doesn't have to be Thrift; it just takes whatever is written and writes it to an FD or a socket or whatever. There is the issue of spillover between frames, just like the sequence file case.

-- pete

On 9/4/08 11:32 AM, "Ted Dunning" wrote:

> On Thu, Sep 4, 2008 at 10:51 AM, Owen O'Malley wrote:
>
>> ...
>> It is also not splittable. It would be really nice to have a codec that was
>> similar in compression/cpu cost to gzip that was splittable.
>>
>
> Indeed.
>
> What happened to the effort to build a splittable gzip codec by inserting
> dummy compression resets with a known pattern?
>
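
To make the fixed-size-frame idea in the reply above concrete, here is a rough sketch in Python. It is not Thrift's actual TRecordStream format: the frame size, the header layout (a 2-byte payload length plus a CRC32), and the names `write_frames`, `read_frames` and `split_ranges` are all assumptions made up for illustration. It only shows why fixed-size frames make split points trivial to compute.

```python
import struct
import zlib

# All of these values are hypothetical; a real format would pick its own.
FRAME_SIZE = 64                          # total bytes per frame, header included
HEADER = struct.Struct(">HI")            # payload length (2 bytes), CRC32 (4 bytes)
PAYLOAD_MAX = FRAME_SIZE - HEADER.size   # usable payload bytes per frame


def write_frames(data: bytes) -> bytes:
    """Chop arbitrary bytes into fixed-size frames with a length + CRC32 header."""
    out = bytearray()
    for start in range(0, len(data), PAYLOAD_MAX):
        payload = data[start:start + PAYLOAD_MAX]
        out += HEADER.pack(len(payload), zlib.crc32(payload))
        out += payload.ljust(PAYLOAD_MAX, b"\0")  # zero-pad a short final frame
    return bytes(out)


def read_frames(stream: bytes, first_frame: int, last_frame: int) -> bytes:
    """Return the payload of frames [first_frame, last_frame), verifying checksums."""
    out = bytearray()
    for i in range(first_frame, last_frame):
        frame = stream[i * FRAME_SIZE:(i + 1) * FRAME_SIZE]
        length, crc = HEADER.unpack_from(frame)
        payload = frame[HEADER.size:HEADER.size + length]
        if zlib.crc32(payload) != crc:
            raise ValueError(f"checksum mismatch in frame {i}")
        out += payload
    return bytes(out)


def split_ranges(stream_len: int, num_splits: int):
    """Assign whole frames to splits; every boundary is a multiple of FRAME_SIZE."""
    total = stream_len // FRAME_SIZE
    if total == 0:
        return []
    per_split = -(-total // num_splits)  # ceiling division
    return [(s, min(s + per_split, total)) for s in range(0, total, per_split)]


if __name__ == "__main__":
    data = b"".join(b"record-%04d;" % i for i in range(50))
    stream = write_frames(data)
    for first, last in split_ranges(len(stream), 3):
        chunk = read_frames(stream, first, last)
        print(first, last, chunk[:12])
```

Note that the 12-byte records here do not divide evenly into the frame payload, so a split can begin in the middle of a record. That is the spillover issue mentioned above: a reader starting at a frame boundary still needs some record-level convention (for example, skipping ahead to the next record start) to resynchronize.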