hbase-user mailing list archives

From Christopher Dorner <christopher.dor...@gmail.com>
Subject Re: question about writing to columns with lots of versions in map task
Date Tue, 04 Oct 2011 14:14:52 GMT
Why do you advise against setting the timestamps myself? Is it generally 
not good practice?

If I do not want to insert any more data later, then it shouldn't be a 
problem. Of course I will probably have trouble if I want to insert 
something later (e.g. from another file, where the byte offset could be 
exactly the same and again overwrite my data). I didn't think about that.
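
A minimal sketch of that overwrite, assuming a cell is addressed by (row, column, timestamp) and that a second write to the same coordinates replaces the first. This is a plain-Java toy model, not the HBase client API; the class name, the "cf:knows" column and the sample values are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: imitates how a cell is addressed, it does not talk to HBase.
class OffsetTimestampCollision {

    // Key = row|column|timestamp, value = cell value (hypothetical layout).
    private final Map<String, String> cells = new HashMap<>();

    // A write to coordinates that already exist replaces the stored value.
    void put(String row, String column, long timestamp, String value) {
        cells.put(row + "|" + column + "|" + timestamp, value);
    }

    String get(String row, String column, long timestamp) {
        return cells.get(row + "|" + column + "|" + timestamp);
    }

    int cellCount() {
        return cells.size();
    }

    public static void main(String[] args) {
        OffsetTimestampCollision table = new OffsetTimestampCollision();

        // File A: two triples for the same row and column, at byte offsets 0 and 42.
        table.put("subject1", "cf:knows", 0L, "alice");
        table.put("subject1", "cf:knows", 42L, "bob");

        // File B: byte offsets start at 0 again, so the first triple reuses timestamp 0.
        table.put("subject1", "cf:knows", 0L, "carol");

        System.out.println(table.cellCount());                      // prints 2, not 3
        System.out.println(table.get("subject1", "cf:knows", 0L));  // prints carol
    }
}
```

The byte offset is only unique within one input file, so it would need to be combined with something file-specific before it could serve as a key across several loads.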

The thing is that I do not want to lose data while inserting, and I 
need to insert all of it. Maybe I could consider a different schema.
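
One possible different schema, sketched under assumptions that are not from this thread: instead of many versions in one column, put the object of each triple into the column qualifier, so every distinct triple gets its own cell and the timestamp no longer has to be unique. The code is a plain-Java model of the cell coordinates, not the HBase client API; the class name, the separator and the sample triples are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: shows which coordinates would identify a cell, nothing is sent to HBase.
class TripleQualifierSchema {

    // Hypothetical separator between predicate and object inside the qualifier.
    static final char SEP = '\u0000';

    // Qualifier bytes = predicate + separator + object, encoded as UTF-8.
    static byte[] qualifier(String predicate, String object) {
        return (predicate + SEP + object).getBytes(StandardCharsets.UTF_8);
    }

    // Each entry stands for one cell, identified by row and qualifier (no timestamp involved).
    private final Set<String> cells = new TreeSet<>();

    void put(String subject, String predicate, String object) {
        String q = new String(qualifier(predicate, object), StandardCharsets.UTF_8);
        cells.add(subject + SEP + q);
    }

    int cellCount() {
        return cells.size();
    }

    public static void main(String[] args) {
        TripleQualifierSchema table = new TripleQualifierSchema();

        // Three objects for the same subject and predicate end up in three cells.
        table.put("subject1", "knows", "alice");
        table.put("subject1", "knows", "bob");
        table.put("subject1", "knows", "carol");

        // Loading the same triple again (e.g. from another file) changes nothing.
        table.put("subject1", "knows", "bob");

        System.out.println(table.cellCount()); // prints 3
    }
}
```

With this layout the cell value itself can stay empty, and reading all objects for a subject and predicate becomes a scan over qualifiers with a common prefix rather than a read of all versions. Whether that fits depends on how the data is queried later.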

I will try it with a reduce step, but I am pretty sure I will again 
lose some data.

Thank you,


On 03.10.2011 20:31, Jean-Daniel Cryans wrote:
> I would advise against setting the timestamps yourself and instead
> reduce in order to prune the versions you don't need to insert in
> HBase.
> J-D
> On Sat, Oct 1, 2011 at 11:05 AM, Christopher Dorner
> <christopher.dorner@gmail.com>  wrote:
>> Hi again,
>> I think I solved my issue.
>> I simply use the byte offset of the row currently read by the Mapper as the
>> timestamp for the Put. This is unique within my input file, which contains one
>> triple per row. So the timestamps are unique.
>> Regards,
>> Christopher
>> On 01.10.2011 13:19, Christopher Dorner wrote:
>>> Hello,
>>> I am reading a file containing RDF triples in a map job. The RDF triples
>>> are then stored in a table, where columns can have lots of versions.
>>> So I need to store many values for one rowKey in the same column.
>>> I observed that reading the file is very fast, so some values are put
>>> into the table with the same timestamp and therefore overwrite an
>>> existing value.
>>> How can I avoid that? The timestamps are not needed for later use.
>>> Could I simply use some sort of custom counter?
>>> How would that work in fully distributed mode? I am working in
>>> pseudo-distributed mode for testing purposes right now.
>>> Thank You and Regards,
>>> Christopher
