Date: Tue, 4 Oct 2011 14:57:55 -0700
From: Jean-Daniel Cryans
To: user@hbase.apache.org
Subject: Re: question about writing to columns with lots of versions in map task

Maybe try a different schema, yeah (it's hard to help without knowing exactly how you end up overwriting the same triples all the time, though).

Setting timestamps yourself is usually bad, yes.

J-D

On Tue, Oct 4, 2011 at 7:14 AM, Christopher Dorner wrote:
> Why do you advise against setting timestamps yourself? Is it generally not
> good practice?
>
> If I do not want to insert any more data later, then it shouldn't be a
> problem. Of course I will probably have trouble if I want to insert
> something later (e.g. from another file, where the byte offset could be
> exactly the same and again overwrite my data). I hadn't thought about that
> yet.
>
> The thing is that I do not want to lose data while inserting, and I need to
> insert all of it. Maybe I could consider a different schema.
>
> I will try it with a reduce step, but I am pretty sure I will again have
> some loss of data.
>
> Thank you,
>
> Christopher
>
>
> On 03.10.2011 20:31, Jean-Daniel Cryans wrote:
>>
>> I would advise against setting the timestamps yourself and instead
>> reduce in order to prune the versions you don't need to insert in
>> HBase.
>>
>> J-D
>>
>> On Sat, Oct 1, 2011 at 11:05 AM, Christopher Dorner wrote:
>>>
>>> Hi again,
>>>
>>> I think I solved my issue.
>>>
>>> I simply use the byte offset of the line currently read by the Mapper as
>>> the timestamp for the Put. This is unique for my input file, which
>>> contains one triple per line. So the timestamps are unique.
>>>
>>> Regards,
>>> Christopher
>>>
>>>
>>> On 01.10.2011 13:19, Christopher Dorner wrote:
>>>>
>>>> Hello,
>>>>
>>>> I am reading a file containing RDF triples in a map job. The RDF triples
>>>> are then stored in a table where columns can have lots of versions,
>>>> so I need to store many values for one rowKey in the same column.
>>>>
>>>> I have observed that reading the file is very fast, so some values are
>>>> put into the table with the same timestamp and therefore overwrite an
>>>> existing value.
>>>>
>>>> How can I avoid that? The timestamps are not needed later.
>>>>
>>>> Could I simply use some sort of custom counter?
>>>>
>>>> How would that work in fully distributed mode? I am working in
>>>> pseudo-distributed mode for testing purposes right now.
>>>>
>>>> Thank you and regards,
>>>> Christopher
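The overwrite discussed in this thread, and what the byte-offset workaround changes, can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not HBase client code: `VersionedCell` is a hypothetical in-memory stand-in for one row and column, assuming the rule that a second write with the same row, column and timestamp replaces the first, and that at most `maxVersions` versions are retained (a simplification of the column family's maximum-versions setting). The offsets are made-up values of the kind a mapper reading with `TextInputFormat` receives as its `LongWritable` key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

public class VersionOverwriteDemo {

    // Hypothetical stand-in for one HBase cell (a single row and column).
    // Versions are keyed by timestamp and iterated newest first.
    static final class VersionedCell {
        private final TreeMap<Long, String> versions =
                new TreeMap<>(Collections.reverseOrder());
        private final int maxVersions;

        VersionedCell(int maxVersions) {
            this.maxVersions = maxVersions;
        }

        // Same timestamp as an existing version: the value is replaced.
        // New timestamp: a version is added, and the oldest versions beyond
        // maxVersions are discarded immediately (a simplification).
        void put(long timestamp, String value) {
            versions.put(timestamp, value);
            while (versions.size() > maxVersions) {
                versions.pollLastEntry(); // newest-first order, so last = oldest
            }
        }

        // All retained values, newest timestamp first.
        List<String> values() {
            return new ArrayList<>(versions.values());
        }
    }

    public static void main(String[] args) {
        String[] objects = {"obj1", "obj2", "obj3"};

        // Case 1: three puts that all get the same millisecond timestamp.
        VersionedCell sameMillis = new VersionedCell(100);
        long now = 1317765475000L; // arbitrary example time in milliseconds
        for (String o : objects) {
            sameMillis.put(now, o);
        }
        System.out.println("same timestamp:   " + sameMillis.values()); // [obj3]

        // Case 2: each input line's byte offset is used as the timestamp.
        VersionedCell byOffset = new VersionedCell(100);
        long[] offsets = {0L, 42L, 97L}; // made-up byte offsets
        for (int i = 0; i < objects.length; i++) {
            byOffset.put(offsets[i], objects[i]);
        }
        System.out.println("offset timestamp: " + byOffset.values()); // [obj3, obj2, obj1]

        // Case 3: a line from a second file that starts at the same offset.
        byOffset.put(42L, "fromOtherFile");
        System.out.println("after 2nd file:   " + byOffset.values()); // [obj3, fromOtherFile, obj1]
    }
}
```

The third case shows the risk raised in the thread: byte offsets are only unique within one file, so loading a second file can reuse a timestamp and silently replace a value that was already stored.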
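J-D's suggestion to "reduce in order to prune the versions you don't need" relies on the fact that a reducer sees all values for one key together, so redundant values can be dropped before anything is written to HBase. The following is a rough sketch of that idea in plain Java, with no Hadoop or HBase dependencies. It assumes a hypothetical layout where the subject is the row key, the predicate is the column and the object is the value; the input lines are simplified whitespace-separated triples, not a real N-Triples parser, and malformed lines are not validated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReducePruneDemo {

    // Stand-in for map + shuffle: each line "subject predicate object" becomes
    // the key "subject|predicate" (hypothetical row key and column) with the
    // object as its value, and values sharing a key are grouped together.
    static Map<String, List<String>> mapAndShuffle(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 3);
            String key = parts[0] + "|" + parts[1];
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(parts[2]);
        }
        return grouped;
    }

    // Stand-in for reduce: keep each distinct value once (first occurrence
    // wins), so duplicate triples never become competing puts on one cell.
    static List<String> reduce(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "ex:alice ex:knows ex:bob",
                "ex:alice ex:knows ex:carol",
                "ex:alice ex:knows ex:bob",
                "ex:bob ex:knows ex:alice");
        for (Map.Entry<String, List<String>> e : mapAndShuffle(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + reduce(e.getValue()));
        }
        // ex:alice|ex:knows -> [ex:bob, ex:carol]
        // ex:bob|ex:knows -> [ex:alice]
    }
}
```

In a real reducer the order of values is not guaranteed, and this step only removes duplicates: if several distinct objects still target the same row and column, they can still collide on timestamp, which is the data loss Christopher expects and why a different schema comes up in the thread.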