hbase-user mailing list archives

From Vincent Barat <vincent.barat@ubikod.com>
Subject Re: LZO vs GZIP vs NO COMPRESSION: why is GZIP the winner ???
Date Thu, 25 Feb 2010 10:49:25 GMT
Unfortunately, I can only post some snippets.

I have no region splits (I insert just 100,000 rows, so there is no 
split, except when I don't use compression).

I use HBase 0.20.2, and to insert I use HTable.put(List<Put>).

The only difference between my 3 tests is the way I create the test 
table:

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression.Algorithm;
import org.apache.hadoop.hbase.util.Bytes;

HBaseAdmin admin = new HBaseAdmin(config);

HTableDescriptor desc = new HTableDescriptor(name);

HColumnDescriptor colDesc;

colDesc = new HColumnDescriptor(Bytes.toBytes("meta:"));
colDesc.setMaxVersions(1);
colDesc.setCompressionType(Algorithm.GZ); // Algorithm.LZO or Algorithm.NONE in the other two tests
desc.addFamily(colDesc);

colDesc = new HColumnDescriptor(Bytes.toBytes("data:"));
colDesc.setMaxVersions(1);
colDesc.setCompressionType(Algorithm.GZ); // Algorithm.LZO or Algorithm.NONE in the other two tests
desc.addFamily(colDesc);

admin.createTable(desc);
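
The insertion itself follows this pattern (a minimal sketch, not the 
actual test code: makeRowKey() and the example values are hypothetical 
placeholders):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

HTable table = new HTable(config, name);
List<Put> puts = new ArrayList<Put>(1000);
for (int i = 0; i < 100000; i++) {
    Put put = new Put(Bytes.toBytes(makeRowKey(i))); // hypothetical key builder
    put.add(Bytes.toBytes("meta"), Bytes.toBytes("imei"), Bytes.toBytes("...")); // placeholder value
    put.add(Bytes.toBytes("data"), Bytes.toBytes("lat"), Bytes.toBytes("48.65869706"));
    // ... the remaining short columns of the row ...
    puts.add(put);
    if (puts.size() == 1000) { // flush in batches of 1000
        table.put(puts);
        puts.clear();
    }
}
if (!puts.isEmpty()) {
    table.put(puts);
}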

A typical inserted row is made of 13 columns with short content, as 
shown here:

Row key: 1264761195240/6ffc3fe659023a3c9cfed0a50a9f199ed42f2730

column=data:accuracy,    timestamp=1267006115356, value=1317
column=data:alt,         timestamp=1267006115356, value=0
column=data:country,     timestamp=1267006115356, value=France
column=data:countrycode, timestamp=1267006115356, value=FR
column=data:lat,         timestamp=1267006115356, value=48.65869706
column=data:locality,    timestamp=1267006115356, value=Morsang-sur-Orge
column=data:lon,         timestamp=1267006115356, value=2.36138182
column=data:postalcode,  timestamp=1267006115356, value=91390
column=data:region,      timestamp=1267006115356, value=Ile-de-France
column=meta:imei,        timestamp=1267006115356, value=6ffc3fe659023a3c9cfed0a50a9f199ed42f2730
column=meta:infoid,      timestamp=1267006115356, value=ca30781e0c375a1236afbf323cbfa40dc2c7c7af
column=meta:locid,       timestamp=1267006115356, value=5e15a0281e83cfe55ec1c362f84a39f006f18128
column=meta:timestamp,   timestamp=1267006115356, value=1264761195240
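
(A listing in this format is what the hbase shell prints for a scan; a 
command of this form, with 'mytable' standing in for the actual table 
name, shows a single row:

scan 'mytable', {LIMIT => 1}
)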


Maybe LZO works much better with fewer rows that have bigger content?
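
For the read side of the test (the sequential scan with scanner caching 
at 1024, described in the quoted messages below), the loop is roughly 
this sketch (HBase 0.20 API; not the actual benchmark code):

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

HTable table = new HTable(config, name);
Scan scan = new Scan();
scan.setCaching(1024); // fetch 1024 rows per RPC, as in the test
ResultScanner scanner = table.getScanner(scan);
try {
    int rows = 0;
    for (Result r : scanner) {
        rows++; // sequential read; reads/s is computed from this count
    }
} finally {
    scanner.close();
}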

On 24/02/10 19:10, Jean-Daniel Cryans wrote:
> Are you able to post the code used for the insertion? It could be
> something with your usage pattern or something wrong with the code
> itself.
>
> How many rows are you inserting? Do you even have some region splits?
>
> J-D
>
> On Wed, Feb 24, 2010 at 1:52 AM, Vincent Barat <vincent.barat@ubikod.com> wrote:
>> Yes of course.
>>
>> We use a 4-machine cluster (4 large instances on AWS): 8 GB RAM each,
>> dual-core CPU. One hosts the Hadoop and HBase namenode / masters, and 3
>> host the datanodes / regionservers.
>>
>> The table used for testing is first created, then I sequentially insert
>> a set of rows and count the number of rows inserted per second.
>>
>> I insert rows in batches of 1000 (using HTable.put(List<Put>)).
>>
>> When reading, I also read sequentially, using a scanner (scanner caching
>> is set to 1024 rows).
>>
>> Maybe our installation of LZO is not good?
>>
>>
>> On 23/02/10 22:15, Jean-Daniel Cryans wrote:
>>>
>>> Vincent,
>>>
>>> I don't expect that either; can you give us more info about your test
>>> environment?
>>>
>>> Thx,
>>>
>>> J-D
>>>
>>> On Tue, Feb 23, 2010 at 10:39 AM, Vincent Barat
>>> <vincent.barat@ubikod.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I did some testing to figure out which compression algo I should use for
>>>> my
>>>> HBase tables. I thought that LZO was the good candidate, but it appears
>>>> that
>>>> it is the worst one.
>>>>
>>>> I use one table with 2 families and 10 columns. Each row has a total
>>>> of 200 to 400 bytes.
>>>>
>>>> Here are my results:
>>>>
>>>> GZIP:           2600 to 3200 inserts/s   12000 to 15000 reads/s
>>>> NO COMPRESSION: 2000 to 2600 inserts/s   4900 to 5020 reads/s
>>>> LZO:            1600 to 2100 inserts/s   4020 to 4600 reads/s
>>>>
>>>> Do you have an explanation for this? I thought that LZO compression
>>>> was always faster at compression and decompression than GZIP?
>>>>
>>>>
>>>>
>>>
>>
>
