hive-user mailing list archives

From Fabian Alenius
Subject Partitioning strings for bucketed table
Date Sat, 11 Aug 2012 17:16:22 GMT

I'm trying to create an external bucketed table, but I'm having trouble
recreating the behavior of the Hive partitioner that is used to create
internal bucketed tables.

My bucket key is a String s. In my partitioner I'm currently using the
following code, which is based on what I found in the Hive codebase:

  (s.hashCode() & Integer.MAX_VALUE) % numPartitions;
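For testing outside a MapReduce job, the expression above can be wrapped in a small self-contained method. The class and method names here are hypothetical. Only the expression itself comes from the message.

```java
// Hypothetical wrapper class; only the expression in bucketFor is from the message.
public class StringBucketer {

    // Maps a string key to a bucket in [0, numBuckets) using String.hashCode().
    // Masking with Integer.MAX_VALUE clears the sign bit, so the remainder is
    // never negative (Math.abs would not work for Integer.MIN_VALUE).
    static int bucketFor(String s, int numBuckets) {
        return (s.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // "abc".hashCode() is 96354, and 96354 % 32 == 2
        System.out.println(bucketFor("abc", 32)); // prints 2
    }
}
```

The mask, rather than Math.abs, is what keeps the result in range for every possible hash value.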

Unfortunately, when I run a SELECT count(*) with TABLESAMPLE, about 1%
of the rows that came into the mapper are missing.

I suspect that I might need to wrap my String in a Writable before
calling hashCode(). Does anyone know exactly how to partition the data
so that it is compatible with Hive bucketing?
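The following is a possible explanation for the roughly 1% mismatch. Treat it as an assumption and verify it against the Hive version in use.

Hive's bucketing appears to hash a string column over its UTF-8 bytes, computing h = 31 * h + b from h = 0 with each byte treated as signed. String.hashCode() instead hashes the UTF-16 chars. The two results agree for ASCII-only keys and differ for keys containing non-ASCII characters. That would fit a small fraction of rows landing in the "wrong" bucket.

Wrapping the key in Hadoop's Text and calling hashCode() is assumed not to be a drop-in fix either, because Text.hashCode() seeds its byte hash with 1 rather than 0.

The sketch below compares the two approaches. All names in it are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class. utf8Hash is an ASSUMED reimplementation of Hive's
// string hashing, not code taken from Hive or confirmed in this thread.
public class HiveStyleBucketer {

    // Assumed Hive-style string hash: h = 31 * h + b over the UTF-8 bytes,
    // starting from 0. For pure-ASCII strings this equals String.hashCode().
    static int utf8Hash(String s) {
        int h = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // b is a signed byte, so bytes >= 0x80 contribute negative values
            h = 31 * h + b;
        }
        return h;
    }

    // Bucket based on the assumed Hive-style hash.
    static int bucketFor(String s, int numBuckets) {
        return (utf8Hash(s) & Integer.MAX_VALUE) % numBuckets;
    }

    // The approach from the message, for comparison.
    static int stringHashBucket(String s, int numBuckets) {
        return (s.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // ASCII key: both functions agree
        System.out.println(bucketFor("abc", 32) + " " + stringHashBucket("abc", 32)); // 2 2
        // U+00E9 encodes to UTF-8 bytes 0xC3 0xA9, so the hashes diverge
        System.out.println(bucketFor("\u00e9", 32) + " " + stringHashBucket("\u00e9", 32)); // 6 9
    }
}
```

If this assumption holds, the partitioner's getPartition would return bucketFor(key, numPartitions), with numPartitions equal to the table's declared bucket count. It should hash the same bytes that end up in the table's files.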


