hadoop-common-user mailing list archives

From "Jim the Standing Bear" <standingb...@gmail.com>
Subject Re: How to make a lucene Document hadoop Writable?
Date Wed, 28 May 2008 03:02:56 GMT
Thanks for the quick response, Dennis.  However, your code snippet was
about how to serialize/deserialize using
ObjectInputStream/ObjectOutputStream.  Maybe it was my fault for not
making the question clear enough - I was wondering if and how I can
serialize/deserialize using only DataInput and DataOutput.

This is because the Writable interface defined by Hadoop has the
following two methods:

void readFields(DataInput in)
    Deserialize the fields of this object from in.
void write(DataOutput out)
    Serialize the fields of this object to out.

so I must start with DataInput and DataOutput, and work my way to
ObjectInputStream and ObjectOutputStream.  Yet I have not found a way
to go from DataInput to ObjectInputStream.  Any ideas?

-- Jim
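
One possible way to bridge the two APIs, offered as a sketch rather than a
confirmed answer from this thread: serialize the object into a byte array
with ObjectOutputStream, write a length prefix and the bytes to the
DataOutput, and reverse the steps on read.  The class name
SerializableWrapper and its get() accessor are hypothetical.  The class
only mirrors the two Writable method signatures so that it compiles without
Hadoop on the classpath; in a real job it would presumably also declare
"implements org.apache.hadoop.io.Writable".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/**
 * Hypothetical wrapper around any Serializable value.
 * Assumption: in Hadoop this would implement org.apache.hadoop.io.Writable.
 */
public class SerializableWrapper {

  private Serializable value;

  // No-argument constructor so an empty instance can be filled by readFields.
  public SerializableWrapper() {}

  public SerializableWrapper(Serializable value) {
    this.value = value;
  }

  public Serializable get() {
    return value;
  }

  // Same signature as Writable.write(DataOutput).
  public void write(DataOutput out) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
      oos.writeObject(value);
    }
    byte[] bytes = baos.toByteArray();
    // Length prefix tells the reader exactly how many bytes belong to us.
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  // Same signature as Writable.readFields(DataInput).
  public void readFields(DataInput in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    try (ObjectInputStream ois =
             new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      value = (Serializable) ois.readObject();
    } catch (ClassNotFoundException e) {
      throw new IOException("Class of serialized value not found", e);
    }
  }
}
```

The length prefix matters because a DataInput may hold many records back to
back; reading exactly that many bytes keeps the wrapper from consuming data
that belongs to the next record.  Dennis's caveat below still applies:
Lucene fields that are not stored would be lost in the round trip.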




On Tue, May 27, 2008 at 10:50 PM, Dennis Kubes <kubes@apache.org> wrote:
> You can use something like the code below to convert between Serializable
> objects and byte arrays.  The problem with Lucene documents is that fields
> that are not stored will be lost during the serialization / deserialization
> process.
>
> Dennis
>
> // Assumes the java.io classes used below are imported.
> public static Object toObject(byte[] bytes, int start)
>  throws IOException, ClassNotFoundException {
>
>  if (bytes == null || bytes.length == 0 || start >= bytes.length) {
>    return null;
>  }
>
>  // Skip to the offset where the serialized object begins.
>  ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
>  bais.skip(start);
>  ObjectInputStream ois = new ObjectInputStream(bais);
>
>  try {
>    return ois.readObject();
>  } finally {
>    // Closing the outer stream also closes the wrapped one.
>    ois.close();
>  }
> }
>
> public static byte[] fromObject(Serializable toBytes)
>  throws IOException {
>
>  ByteArrayOutputStream baos = new ByteArrayOutputStream();
>  ObjectOutputStream oos = new ObjectOutputStream(baos);
>
>  try {
>    oos.writeObject(toBytes);
>    oos.flush();  // push buffered data into baos before copying it out
>  } finally {
>    oos.close();  // also closes baos
>  }
>
>  return baos.toByteArray();
> }
>
>
> Jim the Standing Bear wrote:
>>
>> Hello,
>>
>> I am not sure if this is a genuine Hadoop question or more of a
>> core Java question.  I am hoping to create a wrapper over Lucene
>> Document, so that this wrapper can be used for the value field of a
>> Hadoop SequenceFile, and therefore, this wrapper must also implement
>> the Writable interface.
>>
>> Lucene's Document is already made serializable, which is quite nice.
>> However, the Writable interface definition gives only DataInput and
>> DataOutput, and I am having a hard time trying to figure out how to
>> serialize/deserialize a Lucene Document object using
>> DataInput/DataOutput.  In other words, how do I go from DataInput to
>> ObjectInputStream, or from DataOutput to ObjectOutputStream?  Thanks.
>>
>> -- Jim
>



-- 
--------------------------------------
Standing Bear Has Spoken
--------------------------------------
