incubator-crunch-user mailing list archives

From Jonathan Dinu <jond...@gmail.com>
Subject Re: PTypes and parallel do
Date Wed, 01 Aug 2012 04:29:58 GMT
Thanks! This helps tremendously, and now I have a much better idea of how to implement some more
custom types and collections :)
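
For reference, the byte-shifting loop in the quoted OUT function below can probably also be
written with ByteBuffer's own float view. This is only a minimal sketch, assuming Java 8+ for
Float.BYTES; the class name FloatArrayCodec and the method names encode/decode are hypothetical
and not part of Crunch. ByteBuffer is big-endian by default, so the byte layout should match
the manual shifts in the quoted code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Crunch or Avro.
public class FloatArrayCodec {

    // Packs each float into 4 bytes using ByteBuffer's default (big-endian) order.
    public static ByteBuffer encode(float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Float.BYTES);
        // The float view keeps its own position, so buf stays at position 0
        // with limit == capacity after the bulk put.
        buf.asFloatBuffer().put(values);
        return buf;
    }

    // Reads the remaining bytes of buf as floats without moving buf's own position.
    public static float[] decode(ByteBuffer buf) {
        float[] values = new float[buf.remaining() / Float.BYTES];
        buf.asFloatBuffer().get(values);
        return values;
    }

    public static void main(String[] args) {
        float[] data = { 17.29f, 2.0f, 3.14f };
        float[] roundTrip = decode(encode(data));
        // Compares the arrays element by element after the round trip.
        System.out.println(Arrays.equals(data, roundTrip));
    }
}
```

The bodies of encode and decode could be dropped into the OUT and IN MapFns respectively, with
the derived PType definition left unchanged.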
 
On Jul 31, 2012, at 6:17 PM, Josh Wills wrote:

> Hey Jonathan,
> 
> With respect to handling arrays (as opposed to Collection<T>s), there
> are lots of different serialization approaches you could take
> depending on what you are planning to do with the output, and so my
> general inclination has been to not impose a particular approach on
> people within the core PTypeFamily stuff. That may or may not be the
> right solution, and it's certainly something we need feedback on.
> 
> My advice is to think about how you want to consume or output the
> array and then use the _derived_ method on the PTypeFamily of your
> choice to convert from some base format (like a String or a byte
> array) to the float array you want to work with. Here's some code that
> I just spun up that uses Avro's bytes() type as the base for working
> with a float array:
> 
>  private static final MapFn<ByteBuffer, float[]> IN = new
> MapFn<ByteBuffer, float[]>() {
>    @Override
>    public float[] map(ByteBuffer input) {
>      float[] f = new float[(input.limit() - input.position()) / 4];
>      input.asFloatBuffer().get(f);
>      return f;
>    }
>  };
>  private static final MapFn<float[], ByteBuffer> OUT = new
> MapFn<float[], ByteBuffer>() {
>    @Override
>    public ByteBuffer map(float[] input) {
>      byte[] bb = new byte[input.length * 4];
>      for (int i = 0; i < input.length; i++) {
>        int f = Float.floatToRawIntBits(input[i]);
>        bb[4*i] = (byte)((f >> 24) & 0xff);
>        bb[4*i + 1] = (byte)((f >> 16) & 0xff);
>        bb[4*i + 2] = (byte)((f >> 8) & 0xff);
>        bb[4*i + 3] = (byte)((f >> 0) & 0xff);
>      }
>      return ByteBuffer.wrap(bb);
>    }
>  };
> 
>  public static final PType<float[]> FLOAT_ARRAYS =
> Avros.derived(float[].class, IN, OUT, Avros.bytes());
> 
>  @Test
>  public void testInAndOut() throws Exception {
>    float[] data = new float[] { 17.29f, 2.0f, 3.14f };
>    float[] out = IN.map(OUT.map(data));
>    assertEquals(data.length, out.length);
>    for (int i = 0; i < out.length; i++) {
>      assertEquals(data[i], out[i], 0.0001f);
>    }
>  }
> 
> Best,
> Josh
> 
> On Tue, Jul 31, 2012 at 5:34 PM, Jonathan Dinu <jondinu@gmail.com> wrote:
>> Hi,
>> 
>> I am pretty new to Crunch and I am having a difficult time working with PTypes,
>> specifically as the third argument to parallelDo when serializing the collection.
>>
>> I have been using some of the PTypeFamily methods for primitives like Strings, but
>> I am trying to create a PCollection output that contains arrays of Floats. What I want
>> out is PCollection<Float[]>, but I cannot seem to coerce the right PType. I am not sure
>> whether this is the best approach or whether I should be using a different type to store
>> my records in the collection, or whether there is a way to register custom types and
>> define how they should be serialized to disk.
>> 
>> Any help is greatly appreciated.
>> 
>> Thanks,
>> Jonathan

