hbase-user mailing list archives

From Bill Q <bill.q....@gmail.com>
Subject Re: Storing data with long history of versions
Date Mon, 10 Nov 2014 15:45:07 GMT
Hi Ted,
Thanks a lot.

When would it break? Would you please give some details on why the size
would be a deciding factor?

I will probably have about 10 cells with daily updates, while the rest of the
cells in the column family will have only a handful of versions. So the cells
in the same column family will be very skewed in their number of versions.
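The daily-updated cells rely on the scheme from question #1 in the quoted thread: the collection date is used as the cell timestamp. HBase cell timestamps are `long` values, by convention milliseconds since the epoch. A minimal sketch, assuming a hypothetical convention of midnight UTC of the collection date, so that a given date always maps to the same timestamp no matter which time zone the writer runs in. `DailyTimestamp`, `timestampFor` and `dateFor` are illustrative names, not HBase APIs.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class DailyTimestamp {
    // Hypothetical convention: one version per day, stamped at 00:00 UTC
    // of the collection date, in epoch milliseconds.
    static long timestampFor(LocalDate collectionDate) {
        return collectionDate.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Inverse mapping, for turning a returned version's timestamp back into a date.
    static LocalDate dateFor(long timestampMillis) {
        return Instant.ofEpochMilli(timestampMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2014, 11, 10);
        long ts = timestampFor(day);
        System.out.println(day + " -> " + ts);          // 2014-11-10 -> 1415577600000
        System.out.println(ts + " -> " + dateFor(ts));  // 1415577600000 -> 2014-11-10
    }
}
```

On the skew itself: the number of retained versions (`VERSIONS`) is a column-family setting, so one cap applies to every column in the family. It has to be set high enough for the daily-updated cells; cells that never reach it simply keep the few versions they have.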

Also, if I try to read all the versions of a cell before a major compaction,
will there be any performance issue? I plan to run a batch process over
hundreds of thousands of devices that pulls out all the versions of those
few cells.
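On why size matters: the 10MB figure in the reply below counts value bytes only (500,000 × 20 bytes). Every stored version also carries its full key. In HBase's KeyValue layout (without tags) that is a 4-byte key length, 4-byte value length, 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte key type. A rough sketch of the uncompressed total, assuming a hypothetical 16-byte device row key and one-character family and qualifier names; `VersionSizeEstimate` and `bytesPerCell` are illustrative, not HBase APIs.

```java
public class VersionSizeEstimate {
    // Fixed per-cell overhead of the KeyValue layout (no tags):
    // key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1)
    static final int KEYVALUE_FIXED_BYTES = 4 + 4 + 2 + 1 + 8 + 1;

    // Uncompressed bytes for one version of one cell.
    static long bytesPerCell(int rowKeyLen, int familyLen, int qualifierLen, int valueLen) {
        return KEYVALUE_FIXED_BYTES + rowKeyLen + familyLen + qualifierLen + valueLen;
    }

    public static void main(String[] args) {
        long versions = 500_000L;   // "half a million timestamps" from the thread
        int valueLen = 20;          // ~20-byte values from the thread
        // Hypothetical schema sizes: 16-byte row key, family "d", qualifier "v".
        long perCell = bytesPerCell(16, 1, 1, valueLen);
        System.out.println("value bytes only: " + versions * valueLen);   // 10000000
        System.out.println("bytes per cell:   " + perCell);               // 58
        System.out.println("full KeyValues:   " + versions * perCell);    // 29000000
    }
}
```

With these assumed sizes the keys nearly triple the value-only figure, to about 29MB for one fully versioned cell, before any block encoding or compression that may be enabled on the family. Multiplied across a batch of many devices, the key length becomes a real cost.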

On Monday, November 10, 2014, Ted Yu <yuzhihong@gmail.com> wrote:

> Half a million timestamps at 20 bytes per cell equate to 10MB.
> That should be fine for your client.
>
> Cheers
>
> On Mon, Nov 10, 2014 at 7:23 AM, Bill Q <bill.q.hdp@gmail.com> wrote:
>
> > Hi Ted,
> > Thanks a lot for the reply.
> >
> > For #1, the value alone will be around 20 bytes for each cell,
> > and there will be hundreds of thousands of timestamps per cell, but not
> > millions. Any suggestions?
> >
> > Many thanks.
> >
> >
> > Cao
> >
> > On Monday, November 10, 2014, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > For #1, what's the expected size of the data you want to store?
> > >
> > > For #2, the new data inserted under column:value with a newer timestamp
> > > would be stored in a different HFile. Old and new data would be
> > > consolidated after major compaction.
> > >
> > > Cheers
> > >
> > > On Mon, Nov 10, 2014 at 6:21 AM, Bill Q <bill.q.hdp@gmail.com> wrote:
> > >
> > > > Hi,
> > > > I am designing a schema to store time series data for each device,
> > > > and I have a couple of questions I am not quite sure about.
> > > >
> > > > 1. *Is there any downside to storing the data in the same
> > > > columnfamily:column with a long history of custom timestamps?*
> > > >
> > > > For example, I have historical daily data for a device. I would like to
> > > > use only one column qualifier to store it, with a custom timestamp that
> > > > is the date the data was collected. Then, when I query the data, I can
> > > > easily pull all the time series data for this particular device in one
> > > > scan.
> > > >
> > > > 2. *After a storefile is finalized and becomes immutable, what happens
> > > > when someone updates the row?*
> > > >
> > > > For example, if I insert a new column:value with a newer timestamp into
> > > > the same row:columnfamily, where is this new key/value going to sit in
> > > > HDFS? Is it close to the previous K/V pairs in the storefile?
> > > >
> > > >
> > > > Many thanks.
> > > >
> > > >
> > > > Bill
> > > >
> > >
> >
> >
> > --
> > Many thanks.
> >
> >
> > Bill
> >
>
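Ted's answer to #2 matches general log-structured merge behavior: a newer version goes into memory, is flushed into a new HFile, and is physically merged with the older versions only when a compaction rewrites the files. Until then, a read of all versions has to consult every file that may hold the column, which is where un-compacted files cost read time. The toy model below is plain Java, not HBase code; `ToyStore` and its methods are hypothetical. It tracks a single column's versions, ignores deletes, TTLs and minor compactions, and caps results at a per-family version limit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of ONE column's versions in a log-structured store. Not HBase code.
public class ToyStore {
    private final int maxVersions;  // stands in for the family's VERSIONS setting
    // Timestamps sorted newest first, like versions within an HBase cell.
    private final NavigableMap<Long, String> memstore = new TreeMap<>(Comparator.reverseOrder());
    // Flushed files, oldest first; each is immutable once written.
    private final List<NavigableMap<Long, String>> storeFiles = new ArrayList<>();

    ToyStore(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Writes only touch the in-memory buffer, never an existing file.
    void put(long timestamp, String value) {
        memstore.put(timestamp, value);
    }

    // Flush: buffered writes become a NEW immutable file; older files are untouched.
    void flush() {
        if (memstore.isEmpty()) {
            return;
        }
        storeFiles.add(Collections.unmodifiableNavigableMap(new TreeMap<>(memstore)));
        memstore.clear();
    }

    // Merge every file plus the memstore; later sources win on equal timestamps.
    private NavigableMap<Long, String> mergeAll() {
        NavigableMap<Long, String> merged = new TreeMap<>(Comparator.reverseOrder());
        for (NavigableMap<Long, String> file : storeFiles) {
            merged.putAll(file);
        }
        merged.putAll(memstore);
        return merged;
    }

    // Drop the oldest versions until at most maxVersions remain.
    private void trim(NavigableMap<Long, String> versions) {
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();  // last entry = smallest timestamp = oldest
        }
    }

    // A read must visit every file until a compaction merges them.
    NavigableMap<Long, String> read() {
        NavigableMap<Long, String> merged = mergeAll();
        trim(merged);
        return merged;
    }

    // Major compaction: rewrite all files into one, discarding versions beyond the cap.
    void majorCompact() {
        flush();
        NavigableMap<Long, String> merged = mergeAll();
        trim(merged);
        storeFiles.clear();
        storeFiles.add(Collections.unmodifiableNavigableMap(merged));
    }

    int fileCount() {
        return storeFiles.size();
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore(1000);
        store.put(1L, "a");
        store.put(2L, "b");
        store.flush();   // first file
        store.put(3L, "c");
        store.flush();   // the newer version lands in a second, separate file
        System.out.println(store.fileCount() + " files, read = " + store.read());
        store.majorCompact();
        System.out.println(store.fileCount() + " file, read = " + store.read());
    }
}
```

The newer version never rewrites the older file; it only ends up next to the earlier versions once the major compaction produces a single merged file.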


-- 
Many thanks.


Bill
