From: java8964
To: user@avro.apache.org
Date: Sun, 27 Oct 2013 10:59:01 -0400
Subject: How the custom Key class can be used in Avro
Hi,

Currently I have an MR job that needs to use my own key class to support secondary sort.

The original job uses an Avro string as the mapper output key, in this format:

public class MyMapper extends MapReduceBase implements Mapper<LongWritable, Text, AvroKey<CharSequence>,
        AvroValue<OneAvroSpecificRecordObject>>

Now I need to change the output key from a string (AvroKey<CharSequence>) to a custom key object, because I need to control a complex sort order and support secondary sort in my MR job.
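For context, a minimal plain-Java sketch of the secondary-sort semantics mentioned above. It assumes a hypothetical CompositeKey with one natural key and one secondary field, and it simulates the shuffle with ordinary collections rather than using Hadoop's API: the sort comparator orders full keys, while the grouping comparator compares only the natural key, so consecutive keys that tie on it end up in the same reduce() call.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simulation of secondary sort with plain collections (not Hadoop code).
// CompositeKey, naturalKey and secondary are hypothetical names.
class SecondarySortDemo {
    static final class CompositeKey {
        final String naturalKey; // what the reducer groups on
        final long secondary;    // what orders values inside one group
        CompositeKey(String naturalKey, long secondary) {
            this.naturalKey = naturalKey;
            this.secondary = secondary;
        }
    }

    // Role of the sort comparator: orders by the full composite key.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey k) -> k.naturalKey)
                  .thenComparingLong(k -> k.secondary);

    // Role of the grouping comparator: looks at the natural key only.
    static final Comparator<CompositeKey> GROUP =
        Comparator.comparing((CompositeKey k) -> k.naturalKey);

    // Sorts keys the way the shuffle would, then splits them into reduce groups.
    static List<List<CompositeKey>> sortAndGroup(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<CompositeKey>> groups = new ArrayList<>();
        for (CompositeKey k : sorted) {
            List<CompositeKey> last = groups.isEmpty() ? null : groups.get(groups.size() - 1);
            if (last != null && GROUP.compare(last.get(0), k) == 0) {
                last.add(k); // same natural key: same reduce() call
            } else {
                List<CompositeKey> group = new ArrayList<>();
                group.add(k);
                groups.add(group);
            }
        }
        return groups;
    }
}
```

In a plain Hadoop job using the old mapred API, these two roles are normally registered with JobConf.setOutputKeyComparatorClass and JobConf.setOutputValueGroupingComparator, together with a partitioner that hashes only the natural key.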

So I created a custom key class (CustomKeyObject, i.e. my PartitionKey class), which contains 3 Long values and 4 String values. This key class implements WritableComparable, and I also have my KeyComparator and KeyGroupComparator implementations ready.
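A minimal sketch of a key with the shape described above (3 longs, 4 strings). The field layout (the longs and strings arrays) is hypothetical, since the message does not name the fields. To stay dependency-free, the sketch only mirrors the Hadoop WritableComparable contract (write(DataOutput), readFields(DataInput), compareTo) using java.io; it does not actually implement the Hadoop interface.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for PartitionKey: 3 longs + 4 Strings.
// A real Hadoop key would declare
//   implements org.apache.hadoop.io.WritableComparable<PartitionKey>;
// here only java.io is used so the sketch compiles without Hadoop.
// Assumes all strings are non-null (writeUTF and compareTo reject null).
class PartitionKey implements Comparable<PartitionKey> {
    final long[] longs = new long[3];
    final String[] strings = new String[4];

    PartitionKey() {} // Hadoop instantiates keys via a no-arg constructor

    PartitionKey(long[] ls, String[] ss) {
        System.arraycopy(ls, 0, longs, 0, longs.length);
        System.arraycopy(ss, 0, strings, 0, strings.length);
    }

    // Same shape as Writable.write(DataOutput)
    public void write(DataOutput out) throws IOException {
        for (long l : longs) out.writeLong(l);
        for (String s : strings) out.writeUTF(s);
    }

    // Same shape as Writable.readFields(DataInput); overwrites every field in place
    public void readFields(DataInput in) throws IOException {
        for (int i = 0; i < longs.length; i++) longs[i] = in.readLong();
        for (int i = 0; i < strings.length; i++) strings[i] = in.readUTF();
    }

    // Full sort order: the longs first, then the strings, field by field
    @Override
    public int compareTo(PartitionKey o) {
        for (int i = 0; i < longs.length; i++) {
            int c = Long.compare(longs[i], o.longs[i]);
            if (c != 0) return c;
        }
        for (int i = 0; i < strings.length; i++) {
            int c = strings[i].compareTo(o.strings[i]);
            if (c != 0) return c;
        }
        return 0;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PartitionKey && compareTo((PartitionKey) o) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * Arrays.hashCode(longs) + Arrays.hashCode(strings);
    }
}
```

This only covers Hadoop's Writable side of the key; it does not by itself give Avro a schema for the key.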

So in this case, I want to change my mapper to the new format:

public class MyMapper extends MapReduceBase implements Mapper<LongWritable, Text, AvroKey<CustomKeyObject>,
        AvroValue<OneAvroSpecificRecordObject>>

Here is the problem: I don't know what schema to use for this key in my driver class.
Originally, the driver had the following line:
AvroJob.setMapOutputSchema(conf, Pair.getPairSchema(Schema.create(Schema.Type.STRING), OneAvroSpecificRecordObject.SCHEMA$));

So my question is: what schema should I use above to replace Schema.Type.STRING?

Here are the things I tried, and the errors I got:

1) I tried a union schema with 3 long types and 4 string types. It does NOT work, because a union cannot contain duplicate types.
2) Then I figured an anonymous record schema should work for my case. Here is what I did:
   First, I added the schema definition in the code:
      String keySchema = "type........."; // a record schema with 3 long fields and 4 string fields
   Then, I parsed the schema at runtime:
      AvroJob.setMapOutputSchema(conf, Pair.getPairSchema(new Schema.Parser().parse(keySchema), OneAvroSpecificRecordObject.SCHEMA$));
   This works fine for the whole mapper stage, but the reducer fails with the following error:
      java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to PartitionKey
   My reducer looks like this:
      public class MyReducer extends MapReduceBase implements Reducer<AvroKey<PartitionKey>, AvroValue<OneAvroSpecificRecordObject>, NullWritable, NullWritable>
   It looks like when I use an anonymous record schema, Avro materializes the key as a GenericData$Record, which cannot be cast to the PartitionKey class I want.
3) Then I wondered: do I have to generate a specific PartitionKey class from a new .avsc file? I can do that, but the class Avro generates won't implement WritableComparable, so I cannot use it as the mapper key.

If I want to use a custom key that implements WritableComparable as my mapper output key, what schema should I use in Avro? I searched the Avro source code and didn't find any examples demonstrating this, and there are very few examples on the web. Yet in many cases we want our own custom key class to be used in an MR job. Does anyone know how to define the schema for this kind of class? Are any examples available?

Thanks

Yong