avro-user mailing list archives

From Scott Carey <scottca...@apache.org>
Subject Re: Java programmatic generate record for Type.Union
Date Mon, 08 Aug 2011 22:15:28 GMT
The Avro type for ClientIP below is a nullable string, so the value to return for
it is most likely a String (or null). In recent versions, Avro returns a
CharSequence, which may be null, when reading that field, and accepts null or any
CharSequence (such as String or Utf8) when writing it.
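
A minimal sketch of both halves of that answer, assuming a recent Avro release on the classpath (Schema.Parser, EncoderFactory, DecoderFactory). The class name NullableClientIp and the sample field values are made up for illustration. It puts a String, a Utf8 and null into ClientIP, writes the record with GenericDatumWriter, reads it back with GenericDatumReader, and returns the ClientIP value that was read.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.util.Utf8;

public class NullableClientIp {
    // The new accessLog schema from the thread: ClientIP is ["string", "null"].
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\": \"record\", \"name\": \"accessLog\","
        + " \"namespace\": \"avro_access_log\", \"fields\": ["
        + " {\"name\": \"SquidIP\", \"type\": \"string\"},"
        + " {\"name\": \"Timestamp\", \"type\": \"long\"},"
        + " {\"name\": \"Hostname\", \"type\": \"string\"},"
        + " {\"name\": \"ClientIP\", \"type\": [\"string\", \"null\"]}]}");

    // Writes a record whose ClientIP is clientIp (any CharSequence, or null),
    // reads it back with the same schema, and returns the ClientIP read back.
    static Object roundTrip(CharSequence clientIp) {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("SquidIP", "10.0.0.1");        // sample values, made up
        record.put("Timestamp", 1312841728000L);  // must be a Long for "long"
        record.put("Hostname", "example.org");
        record.put("ClientIP", clientIp);         // String, Utf8 or null

        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
            encoder.flush();

            BinaryDecoder decoder =
                DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            GenericRecord read =
                new GenericDatumReader<GenericRecord>(SCHEMA).read(null, decoder);
            return read.get("ClientIP");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CharSequence[] inputs = {"192.168.1.10", new Utf8("192.168.1.10"), null};
        for (CharSequence ip : inputs) {
            Object back = roundTrip(ip);
            // The value read back is a CharSequence, not necessarily a String,
            // so print its runtime class alongside its text.
            System.out.println(back == null
                ? "null"
                : back.getClass().getSimpleName() + " " + back);
        }
    }
}
```

With default reader settings the non-null value typically comes back as org.apache.avro.util.Utf8 rather than java.lang.String, so code that reads ClientIP should compare it via toString() instead of casting to String.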

On 7/25/11 12:11 PM, "felix gao" <gre1600@gmail.com> wrote:

> Ignore the previous one; I accidentally hit send before completing the message.
> 
>> I am trying to produce some Avro files from a TSV file. We had an
>> original schema defined like this:
>> {   "type": "record",
>>     "name": "accessLog",
>>     "namespace": "avro_access_log",
>>     "fields": [
>>       {"name": "SquidIP" , "type": "string" },
>>       {"name": "Timestamp" , "type": "long"  },
>>       {"name": "Hostname", "type": "string" }
>>     ]
>> }
>> 
>> Now that we have added more fields, I would like to change the schema
>> to
>> 
>> {   "type": "record",
>>     "name": "accessLog",
>>     "namespace": "avro_access_log",
>>     "fields": [
>>       {"name": "SquidIP" , "type": "string" },
>>       {"name": "Timestamp" , "type": "long"  },
>>       {"name": "Hostname", "type": "string" },
>>       {"name": "ClientIP", "type": ["string", "null"] }
>>     ]
>> }
>> 
>> If I understand correctly, the last field should be of type Union. Below is
>> my code that generates the record. What I would like to know is how to
>> return the correct Union type when I call
>> Object value = ConvertFieldToType(getColumnType(col), v, col);
>> ConvertFieldToType simply converts a string to a long if the type is long.
>> What should getColumnType return for the ClientIP field in my example?
>> 
>> 
>> 
>>     // Builds a GenericRecord from one tab-separated line, or returns null if
>>     // the line cannot be converted. getColumnName, getColumnType,
>>     // ConvertFieldToType and the alertedAIOOBE counter are defined elsewhere.
>>     public static Object generateDatumBasedOnSchema(Schema schema, String line,
>>             Map<String, Integer> badConversions) {
>>         GenericRecord record = new GenericData.Record(schema);
>>         int fieldLength = schema.getFields().size();
>>         int col = 0;
>>         String[] fields = line.trim().split("\t");
>>         while (col < fieldLength) {
>>             try {
>>                 String name = getColumnName(col);
>>                 String v = "-";
>>                 try {
>>                     v = fields[col];
>>                 } catch (ArrayIndexOutOfBoundsException e) {
>>                     // The line has fewer columns than the schema has fields.
>>                     if (alertedAIOOBE < 5) {
>>                         System.err.println("index " + col + " is not in fields");
>>                     }
>>                     alertedAIOOBE++;
>>                     return null;
>>                 }
>>                 Object value = ConvertFieldToType(getColumnType(col), v, col);
>>                 record.put(name, value);
>>                 col++;
>>             } catch (NullPointerException npe) {
>>                 // Thrown when there is no matching name for the column, which
>>                 // indicates our schema is older than the data.
>>                 System.err.println("Schema: " + schema.toString()
>>                         + " does not match line " + line);
>>                 return null;
>>             } catch (RuntimeException re) {
>>                 System.err.println("Unknown option at " + col);
>>                 return null;
>>             } catch (Exception e) {
>>                 e.printStackTrace();
>>                 return null;
>>             }
>>         }
>>         return record;
>>     }
> 
> Thanks,
> 
> Felix
> 
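
For the getColumnType question, one approach that avoids special-casing ClientIP is to look past the union: convert the cell according to the union's non-null branch ("string" here), and put null when the cell has no value. The sketch below assumes a recent Avro release; TsvRecordBuilder, valueType, isNullable, convertCell and fromLine are hypothetical names, and treating "-" or an empty cell as "no value" is an assumption about the log format, not something stated in the thread. It handles only the long and string types that the schema uses.

```java
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class TsvRecordBuilder {
    // Returns the first non-null branch of a union, or the schema itself when
    // it is not a union. This is the type a converter should target.
    static Schema valueType(Schema fieldSchema) {
        if (fieldSchema.getType() != Schema.Type.UNION) {
            return fieldSchema;
        }
        for (Schema branch : fieldSchema.getTypes()) {
            if (branch.getType() != Schema.Type.NULL) {
                return branch;
            }
        }
        throw new IllegalArgumentException("union has no non-null branch: " + fieldSchema);
    }

    // True if the field accepts null: a "null" schema or a union containing it.
    static boolean isNullable(Schema fieldSchema) {
        if (fieldSchema.getType() != Schema.Type.UNION) {
            return fieldSchema.getType() == Schema.Type.NULL;
        }
        for (Schema branch : fieldSchema.getTypes()) {
            if (branch.getType() == Schema.Type.NULL) {
                return true;
            }
        }
        return false;
    }

    // Converts one TSV cell. Assumption: "-" or an empty cell means "no value".
    static Object convertCell(Schema fieldSchema, String raw) {
        if ((raw.isEmpty() || raw.equals("-")) && isNullable(fieldSchema)) {
            return null; // written under the union's "null" branch
        }
        Schema target = valueType(fieldSchema);
        switch (target.getType()) {
            case LONG:
                return Long.parseLong(raw);
            case STRING:
                return raw;
            default:
                throw new IllegalArgumentException("unsupported type: " + target);
        }
    }

    // Builds a record from one line, or returns null when the number of
    // columns does not match the number of fields in the schema.
    static GenericRecord fromLine(Schema schema, String line) {
        // A limit of -1 keeps trailing empty cells, such as an empty ClientIP.
        String[] cells = line.split("\t", -1);
        List<Schema.Field> fields = schema.getFields();
        if (cells.length != fields.size()) {
            return null;
        }
        GenericRecord record = new GenericData.Record(schema);
        for (Schema.Field f : fields) {
            record.put(f.name(), convertCell(f.schema(), cells[f.pos()]));
        }
        return record;
    }
}
```

The split limit is the design point worth noting: line.trim().split("\t") drops trailing empty columns, so a line whose last column (ClientIP) is empty would take the ArrayIndexOutOfBoundsException path instead of producing a record with a null ClientIP.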


