From: Ted Dunning
Date: Sun, 12 May 2013 13:27:43 -0700
Subject: Re: Wrapping around BitSet with the Writable interface
To: "common-user@hadoop.apache.org"

Another interesting alternative is the EWAH implementation of Java bitsets,
which allows efficient compressed bitsets with very fast OR operations.

https://github.com/lemire/javaewah

See also https://code.google.com/p/sparsebitmap/ by the same authors.


On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux wrote:

> In order to make the code more readable, you could start by using the
> methods toByteArray() and valueOf(bytes)
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>
> Regards
>
> Bertrand
>
>
> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky wrote:
>
>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>> MapReduce job. I decided to wrap around each object using the Writable
>> interface. Right now I convert each BitSet to a byte array and serialize
>> the byte array on disk.
>>
>> Converting them to byte arrays is a bit inefficient but I could not find
>> a workaround to write them directly to the DataOutput. Is there a way to
>> skip this and serialize the object directly? Here is what my current
>> implementation looks like:
>>
>> public class BitSetWritable implements Writable {
>>
>>   private BitSet bs;
>>
>>   public BitSetWritable() {
>>     this.bs = new BitSet();
>>   }
>>
>>   @Override
>>   public void write(DataOutput out) throws IOException {
>>
>>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>>     oos.writeObject(bs);
>>     byte[] bytes = bos.toByteArray();
>>     oos.close();
>>     out.writeInt(bytes.length);
>>     out.write(bytes);
>>
>>   }
>>
>>   @Override
>>   public void readFields(DataInput in) throws IOException {
>>
>>     int len = in.readInt();
>>     byte[] bytes = new byte[len];
>>     in.readFully(bytes);
>>
>>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>>     ObjectInputStream ois = new ObjectInputStream(bis);
>>     try {
>>       bs = (BitSet) ois.readObject();
>>     } catch (ClassNotFoundException e) {
>>       throw new IOException(e);
>>     }
>>
>>     ois.close();
>>   }
>>
>> }
>>
>
> --
> Bertrand Dechoux
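
Bertrand's suggestion of toByteArray() and valueOf(bytes) can be sketched roughly as follows. This is only a sketch under stated assumptions, not code from the thread: BitSetIO and its write/read methods are hypothetical names standing in for the bodies of write(DataOutput) and readFields(DataInput) in BitSetWritable, so that the example compiles with the JDK alone. It assumes Java 7 or later, where both BitSet methods are available. The main method round-trips two bitsets and ORs them, roughly as a reducer might.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical helper class (not from the thread). In a Hadoop Writable,
// write(...) and read(...) would become write(DataOutput) and
// readFields(DataInput) operating on the wrapped BitSet field.
public class BitSetIO {

    // Writes a 4-byte length prefix followed by the BitSet's bytes.
    public static void write(BitSet bs, DataOutput out) throws IOException {
        byte[] bytes = bs.toByteArray();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads the length prefix, then exactly that many bytes, and rebuilds the BitSet.
    public static BitSet read(DataInput in) throws IOException {
        int len = in.readInt();
        byte[] bytes = new byte[len];
        in.readFully(bytes);
        return BitSet.valueOf(bytes);
    }

    public static void main(String[] args) throws IOException {
        BitSet a = new BitSet();
        a.set(3);
        a.set(1000);
        BitSet b = new BitSet();
        b.set(64);

        // Serialize both bitsets back to back into one in-memory buffer.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        write(a, out);
        write(b, out);
        out.flush();

        // Deserialize them in the same order and OR them together.
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        BitSet merged = read(in);
        merged.or(read(in));
        System.out.println(merged); // prints {3, 64, 1000}
    }
}
```

Note that toByteArray() still allocates a temporary array, so this does not remove the copy the original question asks about. As far as I know, java.util.BitSet has no public method for streaming its internal words without a copy. What the sketch mainly does is replace Java object serialization with a simpler and more compact encoding.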