hbase-dev mailing list archives

From "Jean-Daniel Cryans" <jdcry...@apache.org>
Subject Re: Data serialization doesn't seem to respect MAX_VERSIONS
Date Mon, 15 Sep 2008 16:47:41 GMT
Adal,

For small tables that receive a lot of updates,
HBASE-871 <https://issues.apache.org/jira/browse/HBASE-871> was created
(but it is not really documented outside of the code). I think I will
blog about this.

Thx for reporting this issue.

J-D

On Mon, Sep 15, 2008 at 12:03 PM, Adal Chiriliuc <adalc@adobe.com> wrote:

> Hello,
>
> We've been inserting data into HBase and we found that the size of the
> files on local disk/HDFS is much larger than expected.
>
> So I wrote a small script that updates the same row many times over Thrift.
> The table was created with MAX_VERSIONS = 1.
>
> This is what I found:
>
> If I modify the same cell 100,000 times, the final region "data" file on
> disk contains around 50,000 of those modifications after I shut down HBase.
>
> If I modify the same cell 200,000 times, the final region "data" file on
> disk contains around 100,000 of those modifications after I shut down HBase.
>
> client = thrift_util.create_client(Hbase.Client, "localhost", 9090, 30.0)
> cd = ColumnDescriptor()
> cd.name = "test:"
> cd.maxVersions = 1
> client.createTable("bug_test", [cd])
>
> for i in range(100000):
>     mutation = Mutation()
>     mutation.column = "test:column"
>     mutation.value = "version_%d" % i
>     client.mutateRow("bug_test", "single_row", [mutation])
>     if i % 1000 == 0:
>         print i
>
> Is this expected behavior? Our use case involves multiple updates of the
> same cell using big blobs of data (25 KB).
>
> Note: when getting a cell/scanning the table, everything is ok, only the
> last inserted version of the cell is returned. The older values of the cell
> are only present in the storage files.
>
> Best regards,
> Adal
>
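
For a rough sense of what the numbers in the report above could mean for the 25 KB use case, here is a minimal back-of-the-envelope sketch. It assumes that the roughly one-half retention ratio seen in the test (about 50,000 of 100,000 modifications left in the data file) would also hold for 25 KB values, takes 1 KB as 1024 bytes, and ignores key overhead and compression. estimate_stale_bytes is a hypothetical helper for illustration, not part of HBase or its Thrift API.

```python
KB = 1024  # assumption: 1 KB = 1024 bytes


def estimate_stale_bytes(updates, retained_fraction, value_bytes):
    """Rough size of the cell versions still present in the storage files.

    updates           -- number of times the same cell was modified
    retained_fraction -- share of those modifications found on disk
                         (about 0.5 in the report; an assumption elsewhere)
    value_bytes       -- size of one cell value in bytes
    """
    retained_versions = int(updates * retained_fraction)
    return retained_versions * value_bytes


# 100,000 updates of a 25 KB blob with about half of them left on disk.
size = estimate_stale_bytes(100000, 0.5, 25 * KB)
print("%d bytes for a single cell" % size)
```

Under those assumptions a single logical cell would occupy on the order of a gigabyte on disk, even though a get or scan still returns only the latest version.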
