avro-user mailing list archives

From felix gao <gre1...@gmail.com>
Subject Re: Java programmatic generate record for Type.Union
Date Mon, 25 Jul 2011 19:11:24 GMT
Please ignore the previous message; I accidentally hit send before
completing it.

I am trying to produce an Avro file from a TSV file.  We had an
> original schema, defined as:
> {   "type": "record",
>     "name": "accessLog",
>     "namespace": "avro_access_log",
>     "fields": [
>       {"name": "SquidIP" , "type": "string" },
>       {"name": "Timestamp" , "type": "long"  },
>       {"name": "Hostname", "type": "string" }
>     ]
> }
>
> Now that we have added additional fields, I would like to change the
> schema to:
>
> {   "type": "record",
>     "name": "accessLog",
>     "namespace": "avro_access_log",
>     "fields": [
>       {"name": "SquidIP" , "type": "string" },
>       {"name": "Timestamp" , "type": "long"  },
>       {"name": "Hostname", "type": "string" },
>       {"name": "ClientIP", "type": ["string", "null"] }
>     ]
> }
>
> If I understand correctly, the last field should be of type union.  Below
> is the code that generates the record.  What I would like to know is how to
> return the correct union type when I call Object value =
> ConvertFieldToType(getColumnType(col), v, col);  ConvertFieldToType
> simply converts a string to a long if the type is long.  What should
> getColumnType return for the ClientIP field in my example?
>
>
>
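For context on the question above: in Avro's generic representation a union field takes no wrapper object. The datum is a plain value that matches one of the union's branches, so for ["string", "null"] that is either a string or null. The sketch below illustrates only the conversion step, using the standard library alone. The Kind enum, the convertForUnion helper, and the choice of "-" as the missing-value marker are all assumptions made for illustration; with a real schema the list of branches would come from the field's schema (Schema.getTypes() on a union schema) rather than from a hand-built list.

```java
import java.util.Arrays;
import java.util.List;

public class UnionValueSketch {
    // Hypothetical stand-in for the branch types of a union field; not an Avro class.
    public enum Kind { STRING, LONG, NULL }

    // Assumed marker for a missing TSV value (mirrors the "-" default in the code below).
    static final String MISSING = "-";

    /** Returns a plain Java value that matches one branch of the union. */
    public static Object convertForUnion(List<Kind> branches, String raw) {
        boolean missing = raw == null || raw.isEmpty() || raw.equals(MISSING);
        if (missing && branches.contains(Kind.NULL)) {
            return null; // matches the "null" branch
        }
        if (branches.contains(Kind.LONG)) {
            try {
                return Long.valueOf(raw); // matches the "long" branch
            } catch (NumberFormatException e) {
                // not a number: fall through to the string branch, if there is one
            }
        }
        if (branches.contains(Kind.STRING)) {
            return raw; // matches the "string" branch
        }
        throw new IllegalArgumentException("No union branch accepts value: " + raw);
    }

    public static void main(String[] args) {
        List<Kind> clientIp = Arrays.asList(Kind.STRING, Kind.NULL);
        System.out.println(convertForUnion(clientIp, "10.0.0.1")); // prints 10.0.0.1
        System.out.println(convertForUnion(clientIp, "-"));        // prints null
    }
}
```

The returned object would then be passed to record.put(name, value) unchanged, exactly as for a non-union field.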
>     public static Object generateDatumBasedOnSchema(Schema schema, String line, Map<String, Integer> badConversions){
>         GenericRecord record = new GenericData.Record(schema);
>         int fieldLength = schema.getFields().size();
>         int col =0;
>         String[] fields = line.trim().split("\t");
>         while(col < fieldLength){
>             try{
>                 String name = getColumnName(col);
>                 String v = "-";
>                 try{
>                     v = fields[col];
>                 }catch(ArrayIndexOutOfBoundsException e){
>                     if (alertedAIOOBE < 5){
>                         System.err.println("index "+col+" is not in fields");
>                     }
>                     alertedAIOOBE++;
>                     return null;
>                 }
>                 Object value = ConvertFieldToType(getColumnType(col), v, col);
>                 record.put(name, value);
>                 col++;
>             }catch(NullPointerException npe){
>                 // Thrown when there is no matching name for the column,
>                 // which indicates our schema is older than the data.
>                 System.err.println("Schema: "+schema.toString()+" does not match line "+line);
>                 return null;
>             }
>             catch(RuntimeException re){
>                 System.err.println("Unknown option at "+col);
>                 return null;
>             }
>             catch(Exception e){
>                 e.printStackTrace();
>                 return null;
>             }
>
>         }
>         return record;
>     }
>
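A side note on the splitting step in the method above, since the new ClientIP column is the last one: line.trim() removes trailing tabs, and String.split with the default limit discards trailing empty strings, so a row whose last column is empty yields fewer fields than the schema has. That is one possible way to end up in the "index ... is not in fields" branch. Below is a minimal sketch of an alternative that keeps empty columns; the class and method names are hypothetical, and it assumes that only the line terminator should be stripped.

```java
public class TsvSplitSketch {
    /** Splits a TSV line while keeping empty columns, including trailing ones. */
    public static String[] splitKeepingEmpty(String line) {
        // Strip only the line terminator; trim() would also remove leading and trailing tabs.
        String stripped = line.replaceAll("[\\r\\n]+$", "");
        // A negative limit tells split() to keep trailing empty strings.
        return stripped.split("\t", -1);
    }

    public static void main(String[] args) {
        String line = "10.1.1.1\t1311620000\thost.example\t\n"; // last column is empty
        System.out.println(line.trim().split("\t").length);   // prints 3
        System.out.println(splitKeepingEmpty(line).length);   // prints 4
    }
}
```

With this approach an empty last column arrives as an empty string, which the conversion step can then map to null for a nullable union field.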

Thanks,

Felix
