From: Scott Carey
To: "user@avro.apache.org"
Date: Fri, 11 Feb 2011 19:13:09 -0800
Subject: Re: storing avro messages in hbase

Storing a hash of the schema, or a pointer to it, alongside each record is the typical way to deal with this.

http://www.quora.com/What-is-the-best-way-to-work-with-Avro-serialized-data-structures-in-a-database
http://www.javarants.com/2010/06/30/havrobase-a-searchable-evolvable-entity-store-on-top-of-hbase-and-solr/

Search-hadoop.com finds previous discussions on this topic:
http://search-hadoop.com/m/3iG061GVhHd2/HAvroBase&subj=Re+Versioning+of+an+array+of+a+record
http://search-hadoop.com/m/ZajsGoopYw/HAvroBase&subj=Re+question+about+completely+untagged+data+
http://search-hadoop.com/m/pz55F1beCEu1/HAvroBase&subj=Re+Setting+bytes+in+Java

In HBase you can also play tricks with column names to match up schemas with versions: append or prepend a version number to the column name and query with a pattern match on the column. You might need 0.92 and its coprocessors to use different deserializations per record returned, however.
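To make the two ideas above concrete, here is a minimal sketch, not a definitive implementation. It treats the Avro-encoded payload as opaque bytes and does not call the Avro or HBase libraries. The names `schema_registry`, `register`, `wrap`, `unwrap`, `versioned_qualifier` and `parse_qualifier` are hypothetical, the in-memory dict stands in for wherever the schemas would really live (a separate table, for example), and the choice of SHA-256 over the raw schema text is an assumption.

```python
import hashlib
import re

# Hypothetical stand-in for a schema store (e.g. a dedicated table keyed by hash).
schema_registry = {}

FINGERPRINT_LEN = 32  # SHA-256 digest size in bytes (assumed hash choice)


def register(schema_json: str) -> bytes:
    """Store the writer schema once and return its fingerprint.

    Assumption: the schema text is already in a stable, normalized form,
    so the same schema always hashes to the same value.
    """
    fp = hashlib.sha256(schema_json.encode("utf-8")).digest()
    schema_registry[fp] = schema_json
    return fp


def wrap(fp: bytes, avro_bytes: bytes) -> bytes:
    """Build the cell value: schema fingerprint followed by the encoded record."""
    return fp + avro_bytes


def unwrap(cell: bytes):
    """Split a cell value into (writer schema text, encoded record bytes)."""
    fp, payload = cell[:FINGERPRINT_LEN], cell[FINGERPRINT_LEN:]
    return schema_registry[fp], payload


# Alternative: carry the schema version in the column name instead of the value.
QUALIFIER_RE = re.compile(rb"^(?P<name>.+)_v(?P<version>\d+)$")


def versioned_qualifier(name: str, version: int) -> bytes:
    """Append a version number to the column name, e.g. b"msg_v2"."""
    return f"{name}_v{version}".encode("utf-8")


def parse_qualifier(qualifier: bytes):
    """Recover (name, version) from a versioned column name."""
    match = QUALIFIER_RE.match(qualifier)
    if match is None:
        raise ValueError(f"not a versioned qualifier: {qualifier!r}")
    return match.group("name").decode("utf-8"), int(match.group("version"))
```

With this layout each cell pays a fixed 32 bytes for the fingerprint rather than carrying the full writer schema, and a reader looks the schema up by fingerprint before decoding the payload. The versioned column name is the other option: the reader picks the deserialization from the name it matched.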

On 2/11/11 6:32 PM, "Garrett Wu" <wugarrett@gmail.com> wrote:

If I use avro to store messages into cells in HBase, would I need to store the writer schema along with it in every cell?

A problem that I foresee is that I might modify my schema and write new versions to some of the cells in some rows of the table and then things would blow up unless I had stored the writer schema in every cell. Is there a better alternative?