hadoop-hdfs-user mailing list archives

From Terry Healy <the...@bnl.gov>
Subject Re: Tricks to upgrading Sequence Files?
Date Wed, 30 Jan 2013 15:10:56 GMT
Avro's versioning capability might help, if Avro could replace
SequenceFile in your workflow.

Just a thought.

-Terry

On 1/29/13 9:17 PM, David Parks wrote:
> I'll consider a patch to SequenceFile. If we could manually override the
> key and value classes that are read from the sequence file headers,
> we'd have a clean solution.
>
> I don't like versioning my Model object because it's used by tens of other
> classes and I don't want to risk less-maintained classes continuing to use
> an old version.
>
> For the time being I just used 2 jobs. First I renamed the old Model Object
> to the original name, read it in, upgraded it, and wrote the new version
> with a different class name.
>
> Then I renamed the classes again so the new model object used the original
> name and read in the altered name and cloned it into the original name.
>
> All in all only an hour's work, but having a cleaner process would be better.
> I'll add the request to JIRA at a minimum.
>
> Dave
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com] 
> Sent: Wednesday, January 30, 2013 2:32 AM
> To: <user@hadoop.apache.org>
> Subject: Re: Tricks to upgrading Sequence Files?
>
> This is a pretty interesting question, but unfortunately there isn't a
> built-in way in SequenceFile itself to handle this. However, your key/value
> classes can perhaps be made to handle versioning: detecting if what they've
> read is of an older version and decoding it appropriately (while handling the
> newer encoding separately, in the normal fashion).
> This would be much better than going down the classloader hack paths I
> think?
>
> On Tue, Jan 29, 2013 at 1:11 PM, David Parks <davidparks21@yahoo.com> wrote:
>> Does anyone have any good tricks for upgrading a sequence file?
>>
>> We maintain a sequence file like a flat file DB and the primary object
>> in there changed in recent development.
>>
>> It's trivial to write a job to read in the sequence file, update the
>> object, and write it back out in the new format.
>>
>> But since sequence files read and write the key/value class I would
>> either need to rename the model object with a version number, or
>> change the header of each sequence file.
>>
>> Just wondering if there are any nice tricks to this.
>
>
> --
> Harsh J
>
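For reference, the suggestion quoted above (key/value classes that detect an older encoding themselves) could look roughly like the sketch below. Everything in it is an assumption for illustration: the `Model` fields, the `VERSION_2` marker, and the premise that legacy records begin with a non-negative int `id`, so a negative leading int can only be a version marker. The class mirrors the `write`/`readFields` contract of Hadoop's `Writable` but deliberately does not implement the interface, so it compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical model class; field names and layouts are illustrative only.
class Model {
    // Assumption: legacy records start with a non-negative id, so a negative
    // leading int can only be a version marker written by the new code.
    static final int VERSION_2 = -2;

    int id;
    String name;
    String category = ""; // field added in the new layout (hypothetical)

    // Always write the newest layout: marker, id, name, category.
    public void write(DataOutput out) throws IOException {
        out.writeInt(VERSION_2);
        out.writeInt(id);
        out.writeUTF(name);
        out.writeUTF(category);
    }

    // Read either layout, deciding by the first int in the record.
    public void readFields(DataInput in) throws IOException {
        int first = in.readInt();
        if (first >= 0) {
            // Legacy layout: id, name. The new field gets a default.
            id = first;
            name = in.readUTF();
            category = "";
        } else if (first == VERSION_2) {
            id = in.readInt();
            name = in.readUTF();
            category = in.readUTF();
        } else {
            throw new IOException("Unknown Model version marker: " + first);
        }
    }

    // Demo helper: bytes as the old code would have written them.
    static byte[] legacyBytes(int id, String name) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(id);
        out.writeUTF(name);
        return buf.toByteArray();
    }

    // Demo helper: deserialize a record from a byte array.
    static Model fromBytes(byte[] bytes) throws IOException {
        Model m = new Model();
        m.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return m;
    }

    // Demo helper: serialize this record in the newest layout.
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(new DataOutputStream(buf));
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Model old = fromBytes(legacyBytes(7, "widget")); // read a legacy record
        old.category = "tools";                           // upgrade it
        Model upgraded = fromBytes(old.toBytes());        // round-trip, new layout
        System.out.println(upgraded.id + " " + upgraded.name + " " + upgraded.category);
    }
}
```

Because the class name never changes, the name stored in the sequence file header still matches, and a single pass that reads each record and writes it back out would leave every record in the new layout. This only works if the old encoding has some leading value the new code can tell apart from a marker; if it does not, the rename approach described in the thread may still be needed.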

