From: Bertrand Dechoux <dechouxb@gmail.com>
To: user@hadoop.apache.org
Date: Sun, 12 May 2013 22:40:55 +0200
Subject: Re: Wrapping around BitSet with the Writable interface

You can disregard my links, as they are only valid for Java 1.7+.
The JavaSerialization might clean up your code but shouldn't bring a
significant boost in performance.
The EWAH implementation has, at least, the methods you are looking for:
serialize / deserialize.

Regards

Bertrand

Note to myself: I have to remember this one.

On Sun, May 12, 2013 at 10:27 PM, Ted Dunning <tdunning@maprtech.com> wrote:

> Another interesting alternative is the EWAH implementation of Java bitsets,
> which allows efficient compressed bitsets with very fast OR operations.
>
> https://github.com/lemire/javaewah
>
> See also https://code.google.com/p/sparsebitmap/ by the same authors.
>
> On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
>
>> In order to make the code more readable, you could start by using the
>> methods toByteArray() and valueOf(bytes)
>>
>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>>
>> Regards
>>
>> Bertrand
>>
>> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <jim.twensky@gmail.com> wrote:
>>
>>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>>> MapReduce job. I decided to wrap each object in the Writable interface.
>>> Right now I convert each BitSet to a byte array and serialize the byte
>>> array to disk.
>>>
>>> Converting them to byte arrays is a bit inefficient, but I could not find
>>> a workaround to write them directly to the DataOutput. Is there a way to
>>> skip this and serialize the object directly? Here is what my current
>>> implementation looks like:
>>>
>>> import java.io.ByteArrayInputStream;
>>> import java.io.ByteArrayOutputStream;
>>> import java.io.DataInput;
>>> import java.io.DataOutput;
>>> import java.io.IOException;
>>> import java.io.ObjectInputStream;
>>> import java.io.ObjectOutputStream;
>>> import java.util.BitSet;
>>>
>>> import org.apache.hadoop.io.Writable;
>>>
>>> public class BitSetWritable implements Writable {
>>>
>>>   private BitSet bs;
>>>
>>>   public BitSetWritable() {
>>>     this.bs = new BitSet();
>>>   }
>>>
>>>   @Override
>>>   public void write(DataOutput out) throws IOException {
>>>     // Java-serialize the BitSet into an in-memory buffer.
>>>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>>>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>>>     oos.writeObject(bs);
>>>     // Close (and flush) the object stream before reading the buffer.
>>>     oos.close();
>>>     byte[] bytes = bos.toByteArray();
>>>     // Length-prefix the payload so readFields knows how much to read.
>>>     out.writeInt(bytes.length);
>>>     out.write(bytes);
>>>   }
>>>
>>>   @Override
>>>   public void readFields(DataInput in) throws IOException {
>>>     int len = in.readInt();
>>>     byte[] bytes = new byte[len];
>>>     in.readFully(bytes);
>>>
>>>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>>>     ObjectInputStream ois = new ObjectInputStream(bis);
>>>     try {
>>>       bs = (BitSet) ois.readObject();
>>>     } catch (ClassNotFoundException e) {
>>>       throw new IOException(e);
>>>     } finally {
>>>       ois.close();
>>>     }
>>>   }
>>>
>>> }
>>
>> --
>> Bertrand Dechoux

--
Bertrand Dechoux
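
For readers on Java 7 or later, the toByteArray() / valueOf(bytes) suggestion from the thread could look roughly like the sketch below. It is not code from any poster. The class name BitSetWritableSketch and the get/set accessors are made up for illustration, and the `implements Writable` clause is deliberately left out so the example compiles without Hadoop on the classpath (in a real job the class would implement org.apache.hadoop.io.Writable, whose two methods have these signatures).

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical sketch: a real job would declare
// "implements org.apache.hadoop.io.Writable" on this class.
class BitSetWritableSketch {

    private BitSet bs = new BitSet();

    public BitSet get() {
        return bs;
    }

    public void set(BitSet bs) {
        this.bs = bs;
    }

    public void write(DataOutput out) throws IOException {
        // BitSet.toByteArray() (Java 7+) packs the bits little-endian and
        // stops at the highest set bit, so trailing zero bytes are not written.
        byte[] bytes = bs.toByteArray();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    public void readFields(DataInput in) throws IOException {
        int len = in.readInt();
        byte[] bytes = new byte[len];
        in.readFully(bytes);
        // BitSet.valueOf(byte[]) (Java 7+) is the inverse of toByteArray().
        bs = BitSet.valueOf(bytes);
    }
}
```

This still goes through a byte array, so it mainly removes the ObjectOutputStream layer and its class-descriptor overhead. Whether that matters for performance would have to be measured on the actual data.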
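
Since the thread notes that toByteArray() and valueOf(bytes) need Java 1.7+, here is one possible way to write a BitSet straight to a DataOutput using only methods that also exist in Java 6. It is a sketch under assumptions, not something proposed in the thread. The class and method names (SparseBitSetIo, writeBits, readBits) are hypothetical, and the format (a count followed by the index of every set bit) only pays off when the bitsets are sparse; for dense bitsets it would be much larger than a packed representation.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical helper: writes the indices of the set bits, with no
// intermediate byte array and no Java 7 API.
class SparseBitSetIo {

    static void writeBits(BitSet bs, DataOutput out) throws IOException {
        // Number of set bits first, so the reader knows how many ints follow.
        out.writeInt(bs.cardinality());
        for (int i = bs.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) {
            out.writeInt(i);
            if (i == Integer.MAX_VALUE) {
                break; // avoid overflow in i + 1
            }
        }
    }

    static BitSet readBits(DataInput in) throws IOException {
        int count = in.readInt();
        BitSet bs = new BitSet();
        for (int k = 0; k < count; k++) {
            bs.set(in.readInt());
        }
        return bs;
    }
}
```

On the reduce side, the bitwise-OR the original question asks about would then be a matter of calling or(...) on an accumulator BitSet for each value read back.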