hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: question about writing to columns with lots of versions in map task
Date Tue, 04 Oct 2011 21:57:55 GMT
Maybe try a different schema, yeah (it's hard to help without knowing
exactly how you end up overwriting the same triples all the time, though).

Setting timestamps yourself is usually a bad idea, yes.

J-D
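
For illustration, here is a minimal sketch of why a repeated timestamp
loses data. It is a toy model of a single cell (one row plus one column),
not the HBase client API; the class and method names are hypothetical.
It assumes only what the thread describes: versions of a cell are told
apart by timestamp, so a second write with an equal timestamp replaces
the first. The same thing happens when byte offsets from two different
input files coincide.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Toy model of one HBase cell (one row + one column). NOT the HBase API.
// Versions are keyed by timestamp, so writing the same timestamp twice
// replaces the earlier value instead of adding a new version.
class VersionedCellSketch {
    // Newest timestamp first.
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.reverseOrder());

    void put(long timestamp, String value) {
        versions.put(timestamp, value); // equal timestamp -> overwrite
    }

    int versionCount() {
        return versions.size();
    }

    String valueAt(long timestamp) {
        return versions.get(timestamp);
    }

    public static void main(String[] args) {
        VersionedCellSketch cell = new VersionedCellSketch();
        // Byte offsets within one input file are unique, so both survive.
        cell.put(0L, "objectA");
        cell.put(42L, "objectB");
        // A line at offset 42 in a second file collides with the first file.
        cell.put(42L, "objectC");
        System.out.println(cell.versionCount() + " versions, ts 42 -> "
                + cell.valueAt(42L));
    }
}
```

Three writes leave only two versions here, and "objectB" is gone without
any error being raised.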

On Tue, Oct 4, 2011 at 7:14 AM, Christopher Dorner
<christopher.dorner@gmail.com> wrote:
> Why do you advise against setting timestamps oneself? Is it generally not
> good practice?
>
> If I do not want to insert any more data later, then it shouldn't be a
> problem. Of course I will probably have trouble if I want to insert
> something later (e.g. from another file, where the byte offset could be
> exactly the same and again overwrite my data). I hadn't thought about that
> yet.
>
> The thing is that I do not want to lose data while inserting, and I need to
> insert all of it. Maybe I could consider a different schema.
>
> I will try it with a reduce step, but I am pretty sure I will again have
> some loss of data.
>
> Thank you,
>
> Christopher
>
>
> On 03.10.2011 20:31, Jean-Daniel Cryans wrote:
>>
>> I would advise against setting the timestamps yourself and instead
>> reduce in order to prune the versions you don't need to insert in
>> HBase.
>>
>> J-D
>>
>> On Sat, Oct 1, 2011 at 11:05 AM, Christopher Dorner
>> <christopher.dorner@gmail.com>  wrote:
>>>
>>> Hi again,
>>>
>>> I think I solved my issue.
>>>
>>> I simply use the byte offset of the line currently read by the Mapper as
>>> the timestamp for the Put. This is unique within my input file, which
>>> contains one triple per line, so the timestamps are unique.
>>>
>>> Regards,
>>> Christopher
>>>
>>>
>>> On 01.10.2011 13:19, Christopher Dorner wrote:
>>>>
>>>> Hello,
>>>>
>>>> I am reading a file containing RDF triples in a Map job. The RDF triples
>>>> are then stored in a table where columns can have lots of versions,
>>>> so I need to store many values for one rowKey in the same column.
>>>>
>>>> I observed that reading the file is very fast, so some values are put
>>>> into the table with the same timestamp and therefore overwrite an
>>>> existing value.
>>>>
>>>> How can I avoid that? The timestamps are not needed for later use.
>>>>
>>>> Could I simply use some sort of custom counter?
>>>>
>>>> How would that work in fully distributed mode? I am working in
>>>> pseudo-distributed mode for testing purposes right now.
>>>>
>>>> Thank You and Regards,
>>>> Christopher
>>>
>>>
>
>
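
The suggestion quoted above, to reduce first and insert only the versions
that are actually needed, could look roughly like the following. This is
a plain-Java stand-in for a reduce step, not Hadoop's Reducer API. The
"subject|predicate" grouping key and all names are assumptions made for
the example, since the thread does not show the real row key or column
layout.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy stand-in for a reduce step. NOT Hadoop's Reducer API.
class ReducePruneSketch {
    // Groups (subject, predicate, object) triples by a hypothetical
    // "subject|predicate" key and keeps each distinct object only once.
    static Map<String, Set<String>> pruneDuplicates(List<String[]> triples) {
        Map<String, Set<String>> grouped = new LinkedHashMap<>();
        for (String[] t : triples) {
            String key = t[0] + "|" + t[1];
            grouped.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(t[2]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
                new String[] {"s1", "p1", "o1"},
                new String[] {"s1", "p1", "o1"}, // exact duplicate, pruned
                new String[] {"s1", "p1", "o2"});
        // Only the surviving values would then be written to the table.
        System.out.println(pruneDuplicates(triples));
    }
}
```

Pruning only removes values that do not need to be written. How the
remaining distinct values are stored without colliding is the schema
question the thread leaves open.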
