accumulo-user mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: how to maintain versioning in D4M schema?
Date Mon, 30 Nov 2015 18:58:48 GMT
I can think of two options:

1. Instead of "field|value", use "field<version>|value", where version
behaves similarly to Accumulo's timestamp field, and add a custom iterator
which achieves the same effect as the VersioningIterator using this part of
the colq.

2. Instead of putting each "value" in its own field, you could combine them
into an ordered set: field|{time1:value1,time2:value2,time3:value3}. For
this to work well, you'd have to write a custom combining iterator that
kept only the most recent 3 during scans and compactions, based on time (or
whatever you use to denote version).
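
A minimal sketch of the filtering logic behind option 1, written in Python only for illustration (a real Accumulo iterator would be written in Java). The colq layout used here, `field<inverted-version>|value` with a zero-padded, inverted version so that newer entries sort first, is an assumption, as are the helper names `encode_colq`, `decode_colq`, and `keep_latest`. It also assumes the field name contains no `|`.

```python
from typing import Iterable, Iterator, Tuple

MAX_LONG = 2**63 - 1   # largest signed 64-bit value
WIDTH = 19             # number of decimal digits in MAX_LONG


def encode_colq(field: str, version: int, value: str) -> str:
    """Build a colq whose lexicographic order puts newer versions first."""
    inverted = MAX_LONG - version
    return f"{field}<{inverted:0{WIDTH}d}>|{value}"


def decode_colq(colq: str) -> Tuple[str, int, str]:
    """Split a colq back into (field, version, value)."""
    head, value = colq.split("|", 1)
    field, inverted = head[:-1].rsplit("<", 1)  # drop trailing '>'
    return field, MAX_LONG - int(inverted), value


def keep_latest(sorted_colqs: Iterable[str], max_versions: int = 3) -> Iterator[str]:
    """Yield at most max_versions entries per field from sorted input."""
    current_field = None
    seen = 0
    for colq in sorted_colqs:
        field, _, _ = decode_colq(colq)
        if field != current_field:
            # New field: restart the per-field counter.
            current_field, seen = field, 0
        seen += 1
        if seen <= max_versions:
            yield colq
```

Because the input is assumed to arrive in sorted key order, all entries for one field are contiguous and a single counter is enough; no buffering is needed.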

Of the two, I think the second is simpler and fits best within the existing
D4M schema. At the most, it just adds some structure to the value, which
can be processed with an additional combining iterator, but doesn't
fundamentally change the table structure.
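
As a rough illustration of the second option, the merge step of such a combining iterator could look like the sketch below. It is Python for brevity (an actual iterator would be Java code running inside Accumulo), and it rests on assumptions not stated in this thread: the colq holds just the field name, the cell value holds the text `{time:value,...}`, times are integers, and values contain no commas. The names `decode`, `encode`, and `reduce_cells` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

MAX_VERSIONS = 3  # "most recent 3", as in the example above

Entry = Tuple[int, str]  # (time, value)


def decode(cell: str) -> List[Entry]:
    """Parse '{t1:v1,t2:v2}' into [(t1, 'v1'), (t2, 'v2')]."""
    body = cell.strip()[1:-1]  # drop the surrounding braces
    if not body:
        return []
    entries = []
    for item in body.split(","):
        time, value = item.split(":", 1)
        entries.append((int(time), value))
    return entries


def encode(entries: Iterable[Entry]) -> str:
    """Serialize (time, value) pairs back into '{t1:v1,t2:v2}'."""
    return "{" + ",".join(f"{t}:{v}" for t, v in entries) + "}"


def reduce_cells(cells: Iterable[str], max_versions: int = MAX_VERSIONS) -> str:
    """Merge several encoded sets, keeping only the newest entries."""
    merged: Dict[int, str] = {}
    for cell in cells:
        for time, value in decode(cell):
            merged[time] = value  # on a duplicate time, the later cell wins
    newest = sorted(merged.items(), key=lambda e: e[0], reverse=True)
    return encode(newest[:max_versions])
```

For example, `reduce_cells(["{1:a,2:b}", "{3:c}", "{4:d}"])` returns `"{4:d,3:c,2:b}"`. A writer would insert a one-element set such as `{4:d}` for each new value and leave the trimming to the iterator at scan and compaction time.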

On Sun, Nov 29, 2015 at 11:10 PM shweta.agrawal <shweta.agrawal@orkash.com>
wrote:

> The example which I am working is:
>
> rowid        colf          colq          value
>    id                        field|value1      1
>    id                        field|value2      1
>    id                        field|value3      1
>    id                        field|value4      1
>    id                        field|value5      1
>    id                        field|value6      1
>
> This is my schema in D4M style. Here one field has multiple values. I
> want to keep the latest 3 values and have the other values deleted
> automatically, as the versioning iterator does.
>
> So after versioning my table should look like this:
>
> rowid        colf          colq          value
>    id                        field|value1      1
>    id                        field|value2      1
>    id                        field|value3      1
>
> Thanks
> Shweta
>
> On Friday 27 November 2015 07:15 PM, Jeremy Kepner wrote:
> > Can you provide a made up specific example?  I think that will
> > make the discussion easier.
> >
> >
> > On Fri, Nov 27, 2015 at 02:46:33PM +0530, shweta.agrawal wrote:
> >> Thanks for the answer.
> >> But I am asking about versioning in D4M style. How can I use
> >> versioning iterator in D4M style as in D4M style, in Rowid id is
> >> stored and field|value is stored in ColumnQualifier. So as value is
> >> stored in columnQualifier I cannot maintain versions through
> >> versioning iterator. So I am asking how will I maintain versioning
> >> in D4M style?
> >>
> >> Thanks
> >> Shweta
> >>
> >> On Friday 27 November 2015 12:45 PM, Dylan Hutchison wrote:
> >>> In order to store five versions of a key but return only one of
> >>> them during a scan, set the minc and majc VersioningIterator to 5
> >>> and set the scan VersioningIterator to 1.  You can set scanning
> >>> iterators on a per-scan basis if this helps.
> >>>
> >>> It is not necessary to put the timestamp in the column family if
> >>> you are going with the VersioningIterator approach.
> >>>
> >>> There are many ways to achieve versioning in Accumulo. As the
> >>> designer/programmer, you must choose one that fits your
> >>> application, of which we do not know the full details. It sounds
> >>> like you have narrowed your choice to (1) putting the timestamp in
> >>> the column family, or (2) not putting the timestamp anywhere else
> >>> but instead changing the VersioningIterator such that Accumulo
> >>> stores more versions than the latest version of a
> >>> (row,colfam,colqual,colvis) key.
> >>>
> >>>
> >>>
> >>> On Thu, Nov 26, 2015 at 8:45 PM, mohit.kaushik
> >>> <mohit.kaushik@orkash.com <mailto:mohit.kaushik@orkash.com>>
> >>> wrote:
> >>>
> >>>     David,
> >>>
> >>>     But this is the case when we store versions based on timestamp
> >>>     field. The point is, in D4M schema we can not achieve it by doing
> >>>     this. In this case we are considering CF to store timestamp in
> >>>     reverse order as described by Dylan. Then how can we configure
> >>>     Accumulo to return only latest version and store only 5 versions?
> >>>
> >>>     Thanks
> >>>     Mohit Kaushik
> >>>
> >>>     On 11/27/2015 09:54 AM, David Medinets wrote:
> >>>>      From the user manual:
> >>>>
> >>>>     user@myinstance mytable> config -t mytable -s table.iterator.scan.vers.opt.maxVersions=5
> >>>>     user@myinstance mytable> config -t mytable -s table.iterator.minc.vers.opt.maxVersions=5
> >>>>     user@myinstance mytable> config -t mytable -s table.iterator.majc.vers.opt.maxVersions=5
> >>>>
> >>>>     On Thu, Nov 26, 2015 at 11:10 PM, shweta.agrawal
> >>>>     <shweta.agrawal@orkash.com> wrote:
> >>>>
> >>>>         I want to maintain 5 versions only and user can enter any
> >>>>         number of versions but I want to keep only 5 latest version.
> >>>>
> >>>>
> >>>>         On Friday 27 November 2015 09:38 AM, David Medinets wrote:
> >>>>>         Do you want five versions of every entry or will the number
> >>>>>         of versions vary?
> >>>>>
> >>>>>         On Thu, Nov 26, 2015 at 10:53 PM, shweta.agrawal
> >>>>>         <shweta.agrawal@orkash.com
> >>>>>         <mailto:shweta.agrawal@orkash.com>> wrote:
> >>>>>
> >>>>>             Thanks Dylan and David.
> >>>>>             I can store version information in column family. But my
> >>>>>             problem is when I have many versions of the same key how
> >>>>>             will I manage that. In Accumulo versioning I can specify
> >>>>>             that how many versions I want to manage.
> >>>>>
> >>>>>             Suppose I have 10 versions and I only want 5 versions to
> >>>>>             store, how to manage this in a big table?
> >>>>>
> >>>>>             Thanks
> >>>>>             Shweta
> >>>>>
> >>>>>             On Thursday 26 November 2015 10:22 PM, David Medinets wrote:
> >>>>>>             What are the query patterns? If you are versioning for
> >>>>>>             auditing then changing the VersioningIterator seems the
> >>>>>>             easiest approach. You could also store
> >>>>>>             application-specific version information in the column
> >>>>>>             family. One of the reasons that D4M does not use it is
> >>>>>>             to allow application-specific uses. Using the CF means
> >>>>>>             that any applications that understand D4M would not
> >>>>>>             need to change their queries to adjust for the version
> >>>>>>             information.
> >>>>>>
> >>>>>>             On Thu, Nov 26, 2015 at 4:26 AM, shweta.agrawal
> >>>>>>             <shweta.agrawal@orkash.com
> >>>>>>             <mailto:shweta.agrawal@orkash.com>> wrote:
> >>>>>>
> >>>>>>                 Hi,
> >>>>>>
> >>>>>>                 I have my data stored in D4M style. I also want to
> >>>>>>                 maintain versions of different value on the basis
> >>>>>>                 of time.  As in D4M style  data is only in rowid
> >>>>>>                 and colQualifier only.
> >>>>>>
> >>>>>>                 Is there any way to achieve versioning in D4M schema?
> >>>>>>
> >>>>>>                 Thanks
> >>>>>>                 Shweta
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>     --
> >>>
> >>>     *Mohit Kaushik*
> >>>     Software Engineer
> >>>
> >>>
>
>
