hadoop-common-user mailing list archives

From Aaron Baff <Aaron.B...@telescope.tv>
Subject Custom Key class not working correctly
Date Fri, 10 Sep 2010 18:19:58 GMT
So I'm pretty new to Hadoop, just learning it for work, and I've started playing with some of our
data on a VM cluster to see it work and to make sure it can do what we need. By and large it's
very cool, and I think I'm getting the hang of it, but when I try to make a custom composite
key class, it doesn't seem to group the data correctly.

The data is a bunch of phone numbers with various transactional data (timestamp, phone type,
other call data). My Mapper pretty much just takes the data and splits it out into a custom
key (or a Text with just the phone number) and a custom value that holds the rest of the
data.

In my reducer, among other things, I'm counting the number of unique phone numbers using a
Reporter counter. Using my key class (code below), I get a total of 56,404 unique numbers,
which is way too low. When I use just the phone number (as a Text) as the key, it gives me
1,159,558, which is correct. In my custom class's hashCode() method I'm just returning
String.hashCode() of the String holding the phone number.
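Some background that may explain these numbers, based on Hadoop's standard behaviour: the key's hashCode() is used by the default HashPartitioner only to choose which reducer a key is sent to. Inside a reducer, sorted keys are handed to the same reduce() call for as long as the grouping comparator reports them as equal, and unless one is configured, that comparator is the key's sort comparator, which for a WritableComparable falls back to compareTo(); equals() plays no part. The Java sketch below simulates that grouping step without Hadoop, assuming a single reducer. GroupingDemo, countGroups, and the sample keys are hypothetical and made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GroupingDemo {

    // Hypothetical stand-in for the key: phone number plus timestamp.
    static final class Key {
        final String mdn;
        final long timestamp;

        Key(String mdn, long timestamp) {
            this.mdn = mdn;
            this.timestamp = timestamp;
        }
    }

    // Timestamp only, like the compareTo() in the key class.
    static final Comparator<Key> BY_TIME =
            (a, b) -> Long.compare(a.timestamp, b.timestamp);

    // Phone number only: the grouping the post is after.
    static final Comparator<Key> BY_MDN =
            (a, b) -> a.mdn.compareTo(b.mdn);

    // Phone number first, then timestamp.
    static final Comparator<Key> BY_MDN_THEN_TIME = BY_MDN.thenComparing(BY_TIME);

    // Sorts with sortOrder, then counts runs of consecutive keys that the
    // grouping comparator treats as equal; each run is one reduce() call.
    static int countGroups(List<Key> keys, Comparator<Key> sortOrder,
                           Comparator<Key> grouping) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(sortOrder);
        int groups = 0;
        Key previous = null;
        for (Key k : sorted) {
            if (previous == null || grouping.compare(previous, k) != 0) {
                groups++;
            }
            previous = k;
        }
        return groups;
    }

    // Made-up sample: 4 distinct phone numbers, 2 distinct timestamps.
    static List<Key> sample() {
        List<Key> keys = new ArrayList<>();
        keys.add(new Key("5551230001", 100L));
        keys.add(new Key("5551230001", 200L));
        keys.add(new Key("5551230002", 100L));
        keys.add(new Key("5551230003", 100L));
        keys.add(new Key("5551230003", 200L));
        keys.add(new Key("5551230004", 200L));
        return keys;
    }

    public static void main(String[] args) {
        List<Key> keys = sample();
        System.out.println(countGroups(keys, BY_TIME, BY_TIME));                   // 2
        System.out.println(countGroups(keys, BY_MDN_THEN_TIME, BY_MDN_THEN_TIME)); // 6
        System.out.println(countGroups(keys, BY_MDN_THEN_TIME, BY_MDN));           // 4
    }
}
```

With these sample keys, a timestamp-only comparison yields too few groups (2 for 4 phone numbers), a phone-number-then-timestamp comparison yields one group per record (6), and sorting by phone number then timestamp while grouping on phone number alone yields 4. In a real job using the old mapred API, that last setup would roughly correspond to a compareTo() that orders by phone number and then timestamp, plus a phone-number-only comparator registered with JobConf.setOutputValueGroupingComparator() (Job.setGroupingComparatorClass() in the newer API).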

That seemed reasonable to me, since I wanted it to group the values by phone number and
then order them by timestamp, which is what I'm doing in the compareTo() method.
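The ordering described here, phone number first and timestamp second, can be sketched as a plain Java Comparable. MdnTimeOrder and subtractAndCast are hypothetical names, and the class deliberately leaves out the Hadoop Writable parts so it runs on its own. It uses Long.compare rather than casting a difference to int: if the timestamps are milliseconds (an assumption, since the post doesn't say), (int)(a - b) overflows once two timestamps are more than Integer.MAX_VALUE ms (about 24.8 days) apart, and can then return the wrong sign.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, Hadoop-free key used only to illustrate the ordering:
// phone number (mdn) first, then timestamp.
public class MdnTimeOrder implements Comparable<MdnTimeOrder> {
    final String mdn;
    final long timestamp;

    MdnTimeOrder(String mdn, long timestamp) {
        this.mdn = mdn;
        this.timestamp = timestamp;
    }

    @Override
    public int compareTo(MdnTimeOrder other) {
        int byMdn = mdn.compareTo(other.mdn);
        if (byMdn != 0) {
            return byMdn;
        }
        // Long.compare cannot overflow, unlike (int)(a - b).
        return Long.compare(timestamp, other.timestamp);
    }

    // The subtract-and-cast idiom, kept only for comparison.
    static int subtractAndCast(long a, long b) {
        return (int) (a - b);
    }

    @Override
    public String toString() {
        return mdn + " " + timestamp;
    }

    public static void main(String[] args) {
        List<MdnTimeOrder> keys = new ArrayList<>();
        keys.add(new MdnTimeOrder("5551230002", 100L));
        keys.add(new MdnTimeOrder("5551230001", 300L));
        keys.add(new MdnTimeOrder("5551230001", 200L));
        Collections.sort(keys);
        System.out.println(keys); // [5551230001 200, 5551230001 300, 5551230002 100]

        // Two timestamps about 35 days apart, assuming milliseconds.
        long earlier = 0L;
        long later = 3_000_000_000L;
        System.out.println(subtractAndCast(earlier, later)); // 1294967296 (wrong sign)
        System.out.println(Long.compare(earlier, later));    // -1
    }
}
```

Sorting this way keeps each phone number's records together in timestamp order; whether they also arrive in a single reduce() call depends on the job's grouping comparator.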


============================================================================================

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class AIMdnTimeKey implements WritableComparable {
    String mdn = "";
    long timestamp = -1L;
    private byte oli = 0;

    public AIMdnTimeKey() {
    }

    public AIMdnTimeKey( String initMdn, long initTimestamp) {
        mdn = initMdn;
        timestamp = initTimestamp;
    }

    public void setMdn( String newMdn ) {
        mdn = newMdn;
    }

    public String getMdn() {
        return mdn;
    }

    public void setTimestamp( long newTimestamp ) {
        timestamp = newTimestamp;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(mdn);
        out.writeByte(oli);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        mdn = in.readUTF();
        oli = in.readByte();
        timestamp = in.readLong();
    }

    public int compareTo(Object obj) throws ClassCastException {
        if (obj == null) {
            throw new ClassCastException("Object is NULL and so cannot be compared!");
        }
        if (getClass() != obj.getClass()) {
            throw new ClassCastException("Object is of type " + obj.getClass().getName() +
" which cannot be compared to this class of type " + getClass().getName());
        }
        final AIMdnTimeKey other = (AIMdnTimeKey) obj;

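        // Orders by timestamp only; mdn is not compared. Written without
        // subtraction so that large gaps between timestamps cannot overflow an int.
        if (this.timestamp < other.timestamp) {
            return -1;
        }
        return (this.timestamp == other.timestamp) ? 0 : 1;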
    }

    @Override
    public int hashCode() {

        return mdn.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final AIMdnTimeKey other = (AIMdnTimeKey) obj;
        if ((this.mdn == null) ? (other.mdn != null) : !this.mdn.equals(other.mdn)) {
            return false;
        }
        return true;
    }

    @Override
    public String toString() {
        return mdn + " " + timestamp;
    }

    /**
     * @return the oli
     */
    public byte getOli() {
        return oli;
    }

    /**
     * @param oli the oli to set
     */
    public void setOli(byte oli) {
        this.oli = oli;
    }
}

============================================================================================
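The write()/readFields() pair above only works if the fields are read back in exactly the order they were written (here mdn, oli, timestamp). A minimal, Hadoop-free round-trip check with the standard java.io streams might look like the following; RoundTrip and its sample values are hypothetical, and DataOutputStream/DataInputStream stand in for the DataOutput/DataInput objects Hadoop passes in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical check that mirrors the key's write()/readFields() field order.
public class RoundTrip {

    // Same field order as write(): mdn, oli, timestamp.
    static byte[] write(String mdn, byte oli, long timestamp) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(mdn);       // 2-byte length prefix + modified UTF-8
        out.writeByte(oli);      // 1 byte
        out.writeLong(timestamp); // 8 bytes
        out.flush();
        return bytes.toByteArray();
    }

    // Same field order as readFields(); returns "mdn oli timestamp".
    static String read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        String mdn = in.readUTF();
        byte oli = in.readByte();
        long timestamp = in.readLong();
        return mdn + " " + oli + " " + timestamp;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write("5551230001", (byte) 1, 1284142798000L);
        System.out.println(data.length); // 21 (2 + 10 + 1 + 8)
        System.out.println(read(data));  // 5551230001 1 1284142798000
    }
}
```

Swapping any two reads in read() would silently produce garbage values rather than an error, which is why keeping the two methods in lockstep matters.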



Aaron Baff | Developer | Telescope, Inc.

email: aaron.baff@telescope.tv | office: 424 270 2913 | www.telescope.tv


