From: "Jim the Standing Bear" <standingbear@gmail.com>
To: core-user@hadoop.apache.org
Date: Tue, 27 May 2008 23:02:56 -0400
Subject: Re: How to make a lucene Document hadoop Writable?

Thanks for the quick response, Dennis. However, your code snippet was about how to serialize/deserialize using ObjectInputStream/ObjectOutputStream. Maybe it was my fault for not making the question clear enough - I was wondering if and how I can serialize/deserialize using only DataInput and DataOutput. This is because the Writable interface defined by Hadoop has the following two methods:

  void readFields(DataInput in)    Deserialize the fields of this object from in.
  void write(DataOutput out)       Serialize the fields of this object to out.

so I must start with DataInput and DataOutput, and work my way to ObjectInputStream and ObjectOutputStream.
Yet I have not found a way to go from DataInput to ObjectInputStream. Any ideas?

-- Jim

On Tue, May 27, 2008 at 10:50 PM, Dennis Kubes wrote:

> You can use something like the code below to go back and forth from
> serializables. The problem with lucene documents is that fields which
> are not stored will be lost during the serialization / deserialization
> process.
>
> Dennis
>
> public static Object toObject(byte[] bytes, int start)
>     throws IOException, ClassNotFoundException {
>
>   if (bytes == null || bytes.length == 0 || start >= bytes.length) {
>     return null;
>   }
>
>   ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
>   bais.skip(start);
>   ObjectInputStream ois = new ObjectInputStream(bais);
>
>   Object bObject = ois.readObject();
>
>   bais.close();
>   ois.close();
>
>   return bObject;
> }
>
> public static byte[] fromObject(Serializable toBytes)
>     throws IOException {
>
>   ByteArrayOutputStream baos = new ByteArrayOutputStream();
>   ObjectOutputStream oos = new ObjectOutputStream(baos);
>
>   oos.writeObject(toBytes);
>   oos.flush();
>
>   byte[] objBytes = baos.toByteArray();
>
>   baos.close();
>   oos.close();
>
>   return objBytes;
> }
>
> Jim the Standing Bear wrote:
>>
>> Hello,
>>
>> I am not sure if this is a genuine hadoop question or more towards a
>> core-java question. I am hoping to create a wrapper over Lucene
>> Document, so that this wrapper can be used for the value field of a
>> Hadoop SequenceFile, and therefore, this wrapper must also implement
>> the Writable interface.
>>
>> Lucene's Document is already made serializable, which is quite nice.
>> However, the Writable interface definition gives only DataInput and
>> DataOutput, and I am having a hard time trying to figure out how to
>> serialize/deserialize a lucene Document object using
>> DataInput/DataOutput. In other words, how do I go from DataInput to
>> ObjectInputStream, or from DataOutput to ObjectOutputStream? Thanks.
>>
>> -- Jim
>

-- 
--------------------------------------
Standing Bear Has Spoken
--------------------------------------
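
[Archive note] One common way to bridge the gap Jim asks about is to avoid
wrapping DataInput in an ObjectInputStream at all: serialize the object into
an in-memory byte array (as in Dennis's fromObject above), then move that
array across DataOutput/DataInput with a length prefix so readFields knows
exactly how many bytes to read back. Below is a minimal sketch of such a
wrapper, not code from this thread: the class name DocumentWritable is
hypothetical, and it assumes the wrapped Lucene Document is Serializable, as
Jim notes it was in the Lucene of that era.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

import org.apache.hadoop.io.Writable;
import org.apache.lucene.document.Document;

// Hypothetical wrapper (not from the thread): bridges DataInput/DataOutput
// to object streams via a length-prefixed in-memory byte array.
public class DocumentWritable implements Writable {

  private Document doc;

  public DocumentWritable() {}  // Writables need a no-arg constructor

  public DocumentWritable(Document doc) {
    this.doc = doc;
  }

  public Document get() {
    return doc;
  }

  public void write(DataOutput out) throws IOException {
    // Serialize the Document into an in-memory buffer first...
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(baos);
    oos.writeObject(doc);
    oos.flush();
    byte[] bytes = baos.toByteArray();
    oos.close();
    // ...then length-prefix the bytes so readFields knows where to stop.
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  public void readFields(DataInput in) throws IOException {
    // Read the length prefix, then exactly that many bytes...
    int length = in.readInt();
    byte[] bytes = new byte[length];
    in.readFully(bytes);
    // ...and deserialize from a byte-array-backed ObjectInputStream.
    ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
    try {
      doc = (Document) ois.readObject();
    } catch (ClassNotFoundException e) {
      throw new IOException("Cannot deserialize Document: " + e);
    } finally {
      ois.close();
    }
  }
}

The length prefix matters because a SequenceFile hands the Writable a stream
that contains many records back to back, so readFields must consume exactly
the bytes its own write produced. Dennis's caveat still applies: any Document
field that is not stored will be lost on the round trip, and Java
serialization adds noticeable space overhead compared with writing the stored
field names and values out manually.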