avro-user mailing list archives

From Raihan Jamal <jamalrai...@gmail.com>
Subject Re: Deserialize the attributes data using another schema give me wrong results
Date Thu, 26 Sep 2013 07:33:01 GMT
@Eric/Doug/Mika: any thoughts on my previous question?
Thanks for the help.




*Raihan Jamal*
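
On the second question quoted below (finding out which schema wrote the bytes without already having that schema): one common workaround, which nobody in this thread confirms, is to keep the ID outside the Avro payload as a fixed-size prefix, so it can be read before any Avro decoding happens. The sketch below assumes a 4-byte big-endian prefix; the class `SchemaIdFraming` and its methods are hypothetical names, and the payload bytes are placeholders rather than real Avro output.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class SchemaIdFraming {
    // Prepends a 4-byte schema ID (an assumed framing, not part of Avro itself)
    // to an already-serialized Avro payload.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(4 + avroPayload.length)
                .putInt(schemaId)
                .put(avroPayload)
                .array();
    }

    // Reads the schema ID back without touching the Avro bytes.
    static int schemaIdOf(byte[] framed) {
        return ByteBuffer.wrap(framed).getInt();
    }

    // Returns the Avro payload that follows the 4-byte prefix.
    static byte[] payloadOf(byte[] framed) {
        return Arrays.copyOfRange(framed, 4, framed.length);
    }

    public static void main(String[] args) {
        byte[] payload = {24, 49, 54}; // placeholder bytes standing in for Avro output
        byte[] framed = frame(20001, payload);
        System.out.println(schemaIdOf(framed));                        // 20001
        System.out.println(Arrays.equals(payload, payloadOf(framed))); // true
    }
}
```

Once the ID is known, the matching writer's schema can be looked up and handed to the two-argument `GenericDatumReader(Schema writer, Schema reader)` constructor that Eric describes in the quoted reply.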


On Wed, Sep 25, 2013 at 5:42 PM, Raihan Jamal <jamalraihan@gmail.com> wrote:

> Thanks Eric. Now I have a couple of questions on this:
>
> 1) So does that mean we cannot deserialize attribute data using only some
> other schema? Do we always need to pass the schema that was used for
> writing, along with whatever schema I want to use for reading?
> 2) Is there any way to deserialize attribute data using another schema
> without passing the actual schema that was used to serialize it?
>
> In my example, as you can see, I am already storing a schemaId field in the
> Avro schema, and it maps to an actual schema name. So when serializing
> attribute data, we also store the schemaId inside the Avro binary-encoded
> value, and that schemaId identifies the schema we used to serialize it.
> When deserializing, we would first grab the schemaId (by deserializing with
> another schema), look up which schema was actually used to serialize the
> data, and then deserialize it again using that actual schema...
>
>
>
>
>
>
> *Raihan Jamal*
>
>
> On Wed, Sep 25, 2013 at 5:30 PM, Eric Wasserman <ewasserman@247-inc.com> wrote:
>
>>  Short answer: use this constructor instead:
>>
>>  /** Construct given writer's and reader's schema. */
>>
>>   public GenericDatumReader(Schema writer, Schema reader) {
>>
>>  Longer answer:
>>
>>  You have to give the GenericDatumReader the EXACT schema that wrote the
>> bytes that you are trying to parse ("writer's schema").
>> You can *also* give it another schema you'd like to use ("reader's
>> schema") that can be different.
>>
>>
>>  Try changing this line of your code:
>>
>>  GenericDatumReader<GenericRecord> r1 = new
>> GenericDatumReader<GenericRecord>(schema1);
>>
>>  To this:
>>
>>  GenericDatumReader<GenericRecord> r1 = new
>> GenericDatumReader<GenericRecord>(schema2, schema1); // writer's schema is
>> "schema2", reader's schema is "schema1"
>>
>>
>>  ------------------------------
>> *From:* Raihan Jamal <jamalraihan@gmail.com>
>> *Sent:* Wednesday, September 25, 2013 5:10 PM
>> *To:* user@avro.apache.org
>> *Subject:* Deserialize the attributes data using another schema give me
>> wrong results
>>
>>   I am trying to serialize one of our attribute data values using an
>> Apache Avro schema. Here the attribute name is `e7`, and the schema that I
>> am using to serialize it is `schema2.avsc`, shown below.
>>
>>      {
>>        "namespace": "com.avro.test.AvroExperiment",
>>        "type": "record",
>>        "name": "DEMOGRAPHIC",
>>        "doc": "DEMOGRAPHIC data",
>>        "fields": [
>>          {"name": "dob", "type": "string"},
>>          {"name": "gndr", "type": "string"},
>>          {"name": "occupation", "type": "string"},
>>          {"name": "mrtlStatus", "type": "string"},
>>          {"name": "numChldrn", "type": "int"},
>>          {"name": "estInc", "type": "string"},
>>          {"name": "schemaId", "type": "int"},
>>          {"name": "lmd", "type": "long"}
>>        ]
>>      }
>>
>>  Below is the code that I am using to serialize the attribute `e7` with
>> the above Avro schema, `schema2.avsc`. Serialization works fine.
>>  Schema schema = new
>> Parser().parse((AvroExperiment.class.getResourceAsStream("/schema2.avsc")));
>> GenericRecord record = new GenericData.Record(schema);
>> record.put("dob", "161913600000");
>> record.put("gndr", "f");
>> record.put("occupation", "doctor");
>> record.put("mrtlStatus", "single");
>> record.put("numChldrn", 3);
>> record.put("estInc", "50000");
>> record.put("schemaId", 20001);
>> record.put("lmd", 1379814280254L);
>>
>>  GenericDatumWriter<GenericRecord> writer = new
>> GenericDatumWriter<GenericRecord>(schema);
>> ByteArrayOutputStream os = new ByteArrayOutputStream();
>>
>>  Encoder e = EncoderFactory.get().binaryEncoder(os, null);
>>
>>  writer.write(record, e);
>> e.flush();
>> byte[] byteData = os.toByteArray();
>> os.close();
>>
>>  Now, I tried deserializing the same `e7` attributes data using the same
>> above avro schema definition `schema2.avsc` and it also works fine, and I
>> am able to deserialize it properly.
>>  GenericDatumReader<GenericRecord> r = new
>> GenericDatumReader<GenericRecord>(schema);
>> BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(byteData,
>> null);
>> GenericRecord result = r.read(null, decoder);
>>
>>  System.out.println(result);
>> System.out.println(result.get("schemaId"));
>> System.out.println(result.get("lmd"));
>>
>>
>>  Next, I thought I would deserialize the same attribute data using another
>> Avro schema that I have, `schema1.avsc`, and extract only `schemaId` and
>> `lmd` from it. Below is that schema:
>>
>>      {
>>        "namespace": "com.avro.test.AvroExperiment",
>>        "type": "record",
>>        "name": "DEMOGRAPHIC",
>>        "doc": "DEMOGRAPHIC data",
>>        "fields": [
>>          {"name": "schemaId", "type": "int"},
>>          {"name": "lmd", "type": "long"}
>>        ]
>>      }
>>  /**
>> * Deserialize the same byte data using another Avro Schema
>> */
>>
>>  Schema schema1 = new
>> Parser().parse((AvroExperiment.class.getResourceAsStream("/schema1.avsc")));
>>
>>  GenericDatumReader<GenericRecord> r1 = new
>> GenericDatumReader<GenericRecord>(schema1);
>> BinaryDecoder decoder1 = DecoderFactory.get().binaryDecoder(byteData,
>> null);
>> GenericRecord result1 = r1.read(null, decoder1);
>>
>>  System.out.println(result1);
>> System.out.println(result1.get("schemaId"));
>> System.out.println(result1.get("lmd"));
>>  But the above code prints the following, which is wrong, and I am not
>> sure what I did wrong:
>>
>>     {"schemaId": 12, "lmd": -25}
>>     12
>>     -25
>>  It should print this instead:
>>
>>     {"schemaId": 20001, "lmd": 1379814280254}
>>     20001
>>     1379814280254
>>
>>  Can anyone help me figure out what I did wrong?
>>
>
>
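
A note on where the 12 and -25 in the original question come from: Avro's binary encoding carries no field names or tags, so a reader given only `schema1` decodes the first bytes of the `schema2` record as an int followed by a long. Those bytes are really the length prefix and first character of the `dob` string. The sketch below reproduces this with the standard library only. It is an illustration, not Avro's own code: the class `AvroMisreadDemo` and its helpers are hypothetical names that hand-roll the zigzag varint and length-prefixed string encodings described in the Avro specification.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class AvroMisreadDemo {
    // Avro writes int and long values as zigzag-encoded variable-length integers.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63); // zigzag: small magnitudes become small codes
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro writes a string as its UTF-8 byte length (a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Reads one zigzag varint starting at pos[0] and advances pos[0] past it.
    static long readLong(byte[] buf, int[] pos) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos[0]++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1); // undo zigzag
    }

    public static void main(String[] args) {
        // Write the first two fields of the schema2 record: "dob" and "gndr".
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, "161913600000");
        writeString(out, "f");
        byte[] bytes = out.toByteArray();

        // Read them the way a schema1-only reader would: an int, then a long.
        int[] pos = {0};
        System.out.println(readLong(bytes, pos)); // 12: the length of the dob string
        System.out.println(readLong(bytes, pos)); // -25: the character '1' (0x31) read as a varint
    }
}
```

This is why the reader needs the writer's schema: only `schema2` says that six other fields precede `schemaId` in the byte stream.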
