avro-user mailing list archives

From Saptarshi Guha <sg...@mozilla.com>
Subject Pointers towards improving the code
Date Wed, 11 Jul 2012 18:27:24 GMT
Hello,

I'm hoping to find out whether this is the best way to write this. The Avro schema is at the bottom.
Note that it can have recursive entries (i.e. the 'list' branch of field 'c').
To serialize a vector of integers, I call serialize_using_avro.
Note that the 'wrtr' argument passed into serialize_using_avro holds a pre-created rvalue corresponding to the
schema.
_serialize_using_avro fills that rvalue from the R object, and serialize_using_avro then writes it out as bytes.

Serializing 1 million vectors of length 10 (e.g. [1,2,...,10]) takes about 10 s on my Air, i.e. roughly 10 µs per vector.

Is there scope for improvement?

I don't think I can pre-create field_{a,c,v} and branch (in _serialize_using_avro), because
when TYPEOF(robj) is a list,
I'll be calling _serialize_using_avro on a freshly minted rvalue.

Thanks in advance
Saptarshi


C Source
--------
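#include <avro.h>
#include <R.h>
#include <Rinternals.h>

/* avro_robj is defined elsewhere; it holds a pre-created avro_value_t rvalue
   (matching the robject schema) and an avro_writer_t writer. */

/* Fill rvalue (an robject record) from the R object robj. */
void _serialize_using_avro(avro_value_t *rvalue, SEXP robj){
  avro_value_t field_a;
  avro_value_t field_c;
  avro_value_t field_v;
  avro_value_t branch;
  avro_value_reset(rvalue);

  // Field 0 ("a", attributes): always the null branch for now
  avro_value_get_by_index(rvalue, 0, &field_a, NULL);
  avro_value_set_branch(&field_a, 0, &branch);
  avro_value_set_null(&branch);

  // Field 1 ("c", content): choose the union branch from the R type
  avro_value_get_by_index(rvalue, 1, &field_c, NULL);
  switch(TYPEOF(robj)){
  case NILSXP:
    {
      avro_value_set_branch(&field_c, 0, &branch);   // "null"
      avro_value_set_null(&branch);
      break;
    }
  case INTSXP:
    {
      // Hoist the length and data pointer out of the loop
      const int *data = INTEGER(robj);
      int n = LENGTH(robj);
      avro_value_set_branch(&field_c, 1, &branch);   // record "int"
      avro_value_get_by_index(&branch, 0, &field_v, NULL);
      for(int i = 0; i < n; i++){
        avro_value_t element;
        size_t new_index;
        avro_value_append(&field_v, &element, &new_index);
        avro_value_set_int(&element, data[i]);
      }
      break;
    }
  default:
    // REALSXP, RAWSXP and VECSXP branches are not implemented yet;
    // without this, field c would be left unset
    Rf_error("_serialize_using_avro: unsupported R type %d", TYPEOF(robj));
  }
}

SEXP serialize_using_avro(SEXP wrtr, SEXP robj){
  avro_robj *avror = (avro_robj*) R_ExternalPtrAddr(wrtr);
  size_t size;
  SEXP r;

  _serialize_using_avro(&(avror->rvalue), robj);

  // First pass: compute the encoded size so the raw vector is exact
  if (avro_value_sizeof(&(avror->rvalue), &size) != 0)
    Rf_error("avro_value_sizeof failed: %s", avro_strerror());
  PROTECT(r = Rf_allocVector(RAWSXP, size));

  // Second pass: encode straight into the raw vector's memory
  avro_writer_memory_set_dest(avror->writer, (const char *) RAW(r), LENGTH(r));
  if (avro_value_write(avror->writer, &(avror->rvalue)) != 0)
    Rf_error("avro_value_write failed: %s", avro_strerror());
  UNPROTECT(1);
  return r;
}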



Avro Schema
-----------
{
    "namespace": "robjects.avro",
    "type": "record",
    "name": "robject",
    "doc": "Encoding of some of the R data types",
    "fields": [
        {"name": "a", "type": ["null", {"type": "map", "values": "robject"}], "comments": "Attributes"},
        {"name": "c", "type": [
            "null",
            {"name": "int",  "type": "record", "fields": [{"name": "v", "type": {"type": "array", "items": "int"}}]},
            {"name": "real", "type": "record", "fields": [{"name": "v", "type": {"type": "array", "items": "double"}}]},
            {"name": "raw",  "type": "record", "fields": [{"name": "v", "type": {"type": "array", "items": "bytes"}}]},
            {"name": "list", "type": "record", "fields": [{"name": "v", "type": {"type": "array", "items": "robject"}}]}
         ], "comments": "Content"}
    ]
}
