hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: DISCUSS : HFile V3 proposal for tags in 0.96
Date Fri, 19 Jul 2013 05:00:16 GMT
bq. By default code will go with V2.

Good.

Looking forward to the patch.

On Thu, Jul 18, 2013 at 9:57 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> >>Any consideration that the tags are serialized before the memstoreTS
> instead of after?
> The argument is simple: memstoreTS is optional and exists only in the
> HFile, not in the KV. In the current design the tags come after the Value
> in the KV structure, so it is better to apply the same order in HFiles too.
> >>When would PrefixTree be able to handle tags?
> Maybe my statement confused you. Please see the point on PrefixTreeEncoders
> in the previous mail. I meant that in the current design PrefixKey, DiffKey
> and FastDiff extend BufferedDataBlockEncoder, so making
> BufferedDataBlockEncoder tag aware covers all of them.
>
> PrefixTreeCodec has been handled separately to make it work with tags.
> >> Put another way, after this feature goes in, would
> HFile V3 always be written?
> By default code will go with V2. A user who needs V3 has to set
> hfile.format.version to 3, which ensures that the system writes V3.
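The opt-in described above, where V2 stays the default and V3 is selected by setting hfile.format.version to 3, can be sketched as follows. This is a minimal illustration that assumes the setting is read from a plain java.util.Properties object; HFileVersionChooser and its methods are hypothetical names, not HBase APIs.

```java
import java.util.Properties;

/**
 * Hypothetical sketch (not an HBase class): resolve the HFile format version
 * from configuration, falling back to V2 when nothing is set.
 */
final class HFileVersionChooser {
    static final String FORMAT_VERSION_KEY = "hfile.format.version";
    static final int DEFAULT_VERSION = 2; // "By default code will go with V2."

    private HFileVersionChooser() {}

    static int formatVersion(Properties conf) {
        String raw = conf.getProperty(FORMAT_VERSION_KEY);
        int version = (raw == null) ? DEFAULT_VERSION : Integer.parseInt(raw.trim());
        // Assumption: only the two versions discussed in this thread are accepted.
        if (version != 2 && version != 3) {
            throw new IllegalArgumentException("Unsupported HFile format version: " + version);
        }
        return version;
    }

    /** Tag bytes are written only when the user has opted in to V3. */
    static boolean tagsEnabled(Properties conf) {
        return formatVersion(conf) == 3;
    }
}
```

With an empty configuration the sketch resolves to V2 with tags disabled; setting hfile.format.version to "3" switches both.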
>
> Thanks Ted.
>
> Regards
> Ram
>
>
> On Fri, Jul 19, 2013 at 10:10 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > bq. V3 would now also serialize the tags after the Value part, before the
> > memstoreTS
> >
> > Any consideration that the tags are serialized before the memstoreTS
> > instead of after?
> >
> > bq. The BufferedDataBlockEncoder, being the base class for all encoders
> > other than PrefixTree, would now be tag aware.
> >
> > When would PrefixTree be able to handle tags?
> >
> > When a new HFile is opened, would the user be able to specify that there
> > is no tagging involved? Put another way, after this feature goes in,
> > would HFile V3 always be written?
> >
> > Thanks
> >
> > On Thu, Jul 18, 2013 at 9:29 PM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > > The changes/differences we would be introducing in the V3 format are
> > > listed below, one subcategory at a time.
> > >
> > > To reduce code duplication we would subclass ReaderV3 and WriterV3 from
> > > ReaderV2 and WriterV2 respectively.
> > > *HFileBlockFormat*
> > > *=============*
> > > No change between V2 and V3.
> > >
> > > *KV serialization*
> > > *============*
> > > V2: no change.
> > > V3 would now also serialize the tags after the Value part, before the
> > > memstoreTS.
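A minimal sketch of the per-cell ordering described above: key and value lengths, key, value, then (V3 only) the tags, and finally the optional memstoreTS. The field widths (a 2-byte tags length, a fixed 8-byte memstoreTS) and the CellLayoutSketch name are assumptions for illustration, not the actual on-disk HFile V3 encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical layout sketch (field widths are assumptions):
 *   keyLength(int) | valueLength(int) | key | value
 *   [includeTags]        tagsLength(unsigned short) | tags
 *   [memstoreTS != null] memstoreTS(long)
 */
final class CellLayoutSketch {
    private CellLayoutSketch() {}

    static byte[] serialize(byte[] key, byte[] value, byte[] tags,
                            boolean includeTags, Long memstoreTS) {
        byte[] tagBytes = (tags == null) ? new byte[0] : tags;
        if (tagBytes.length > 0xFFFF) {
            throw new IllegalArgumentException("tags too long for a 2-byte length");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(key.length);
            out.writeInt(value.length);
            out.write(key);
            out.write(value);
            if (includeTags) {
                // Tags sit right after the value, the same place they occupy inside the KV.
                out.writeShort(tagBytes.length);
                out.write(tagBytes);
            }
            if (memstoreTS != null) {
                // memstoreTS is HFile-only and optional, so it stays last.
                out.writeLong(memstoreTS);
            }
        } catch (IOException e) {
            // ByteArrayOutputStream does not fail, but DataOutputStream declares IOException.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For a 1-byte key, a 1-byte value and 2 bytes of tags, the V2 form is 10 bytes; the V3 form adds 4 (length plus tags), and a memstoreTS adds 8 more at the end.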
> > >
> > > *FixedFileTrailer*
> > > *===========*
> > > Introduces new information into the trailer which V3 can use to make
> > > tags optional. Take the case where the user selects V3 but one CF has no
> > > tags. We would still write the tag bytes while flushing, but during
> > > compaction, using this trailer info, we would simply avoid writing tags
> > > in the compacted files. This means no impact on read performance once
> > > the compaction has completed.
> > > V2 code also tries to read this trailer info, but since it is null there
> > > is no impact on any of the existing code.
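The trailer-driven compaction decision can be sketched as below, assuming the trailer carries a nullable "file has tags" flag that a V2 trailer simply lacks. TrailerInfoSketch, TagTracker and CompactionTagPolicy are hypothetical names; the sketch only shows the decision logic, not how HBase actually stores the field.

```java
import java.util.List;

/** Hypothetical trailer view: V2 trailers have no tags field, modelled as null. */
final class TrailerInfoSketch {
    final int majorVersion;
    final Boolean fileHasTags;

    TrailerInfoSketch(int majorVersion, Boolean fileHasTags) {
        this.majorVersion = majorVersion;
        this.fileHasTags = fileHasTags;
    }

    /** A missing field (V2) is treated exactly like "no tags". */
    boolean hasTags() {
        return Boolean.TRUE.equals(fileHasTags);
    }
}

/** Hypothetical flush-side tracker: tag bytes are always written, but we record whether any were non-empty. */
final class TagTracker {
    private boolean sawTags;

    void observe(int tagsLength) {
        if (tagsLength > 0) {
            sawTags = true;
        }
    }

    TrailerInfoSketch toTrailer() {
        return new TrailerInfoSketch(3, sawTags);
    }
}

final class CompactionTagPolicy {
    private CompactionTagPolicy() {}

    /** The compacted file needs tag bytes only if some input file actually carried tags. */
    static boolean writeTagsInOutput(List<TrailerInfoSketch> inputs) {
        for (TrailerInfoSketch trailer : inputs) {
            if (trailer.hasTags()) {
                return true;
            }
        }
        return false;
    }
}
```

Under this sketch a CF that never uses tags pays for the empty tag sections only until its files are compacted.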
> > >
> > > *WriterV3 and ReaderV3*
> > > *=================*
> > > Tries to handle the tags based on the metadata from the trailer info.
> > > All the APIs like seekTo(), next() and getKeyValue() are now able to
> > > handle tags based on the flag passed during the construction of the
> > > readers and writers. We can be sure that for any V2 instance the
> > > includeTags flag would always be false.
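The reader hierarchy mentioned at the start of this message (ReaderV3 subclassing ReaderV2) combined with the includeTags flag could look roughly like this. SketchReaderV2, SketchReaderV3 and SketchReaders are hypothetical stand-ins for the real reader classes, and the 2-byte tags length they skip over is an assumed encoding.

```java
import java.nio.ByteBuffer;

/** Hypothetical V2 reader: constructed without tags, so includeTags is always false. */
class SketchReaderV2 {
    protected final boolean includeTags;

    SketchReaderV2() {
        this(false);
    }

    protected SketchReaderV2(boolean includeTags) {
        this.includeTags = includeTags;
    }

    int majorVersion() {
        return 2;
    }

    /**
     * Called after the value has been read (e.g. from next() or seekTo()).
     * Skips the tag section when present and returns the number of bytes consumed.
     */
    int skipTags(ByteBuffer block) {
        if (!includeTags) {
            return 0;
        }
        int tagsLength = block.getShort() & 0xFFFF; // assumed unsigned 2-byte length
        block.position(block.position() + tagsLength);
        return 2 + tagsLength;
    }
}

/** Hypothetical V3 reader: inherits the V2 logic and only turns tag handling on. */
class SketchReaderV3 extends SketchReaderV2 {
    SketchReaderV3(boolean trailerSaysTags) {
        super(trailerSaysTags);
    }

    @Override
    int majorVersion() {
        return 3;
    }
}

final class SketchReaders {
    private SketchReaders() {}

    static SketchReaderV2 open(int majorVersion, boolean trailerSaysTags) {
        switch (majorVersion) {
            case 2:
                return new SketchReaderV2(); // trailer flag ignored: V2 never has tags
            case 3:
                return new SketchReaderV3(trailerSaysTags);
            default:
                throw new IllegalArgumentException("Unsupported HFile version: " + majorVersion);
        }
    }
}
```

Keeping tag handling behind a single constructor flag means the V2 code path is literally the same code with the flag off, which is the "no impact on V2" property the proposal relies on.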
> > >
> > > *DataBlockEncoders*
> > > *==============*
> > > Additional arguments are added to the APIs in the interfaces related to
> > > HFileDataBlockEncoders, BufferedDataBlockEncoders,
> > > HFileDataBlockEncodingContext etc. Again, for V2 the new APIs would still
> > > behave the same way and there would be no impact on V2-based use cases.
> > > The BufferedDataBlockEncoder, being the base class for all encoders other
> > > than PrefixTree, would now be tag aware.
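To illustrate why making the shared base class tag aware is enough for the encoders derived from it, here is a toy sketch in which the base class appends the tag section after whatever key/value encoding a subclass produces. SketchBufferedEncoder and SketchPrefixEncoder are hypothetical, much simplified stand-ins (one-byte lengths, no timestamp or type handling) for BufferedDataBlockEncoder and a prefix encoder.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical base class: owns tag handling so every subclass inherits it. */
abstract class SketchBufferedEncoder {
    private final boolean includeTags;

    SketchBufferedEncoder(boolean includeTags) {
        this.includeTags = includeTags;
    }

    /** Subclass-specific key/value encoding, e.g. prefix compression against the previous key. */
    protected abstract void encodeKeyValue(byte[] prevKey, byte[] key, byte[] value,
                                           ByteArrayOutputStream out);

    /** Template method: subclass encoding first, then the tag section when tags are enabled. */
    final void encode(byte[] prevKey, byte[] key, byte[] value, byte[] tags,
                      ByteArrayOutputStream out) {
        encodeKeyValue(prevKey, key, value, out);
        if (includeTags) {
            byte[] tagBytes = (tags == null) ? new byte[0] : tags;
            writeLength(out, tagBytes.length);
            out.write(tagBytes, 0, tagBytes.length);
        }
    }

    /** Toy one-byte length field; real encoders use wider or variable-length fields. */
    protected static void writeLength(ByteArrayOutputStream out, int length) {
        if (length < 0 || length > 255) {
            throw new IllegalArgumentException("length does not fit in one byte: " + length);
        }
        out.write(length);
    }
}

/** Toy prefix encoder with no tag code of its own: shared-prefix length, suffix, value. */
final class SketchPrefixEncoder extends SketchBufferedEncoder {
    SketchPrefixEncoder(boolean includeTags) {
        super(includeTags);
    }

    @Override
    protected void encodeKeyValue(byte[] prevKey, byte[] key, byte[] value,
                                  ByteArrayOutputStream out) {
        int common = 0;
        int limit = (prevKey == null) ? 0 : Math.min(prevKey.length, key.length);
        while (common < limit && prevKey[common] == key[common]) {
            common++;
        }
        writeLength(out, common);
        writeLength(out, key.length - common);
        out.write(key, common, key.length - common);
        writeLength(out, value.length);
        out.write(value, 0, value.length);
    }
}
```

Encoding key "row2" after "row1" yields a shared-prefix length of 3 and a one-byte suffix; with tags enabled the base class appends the tag length and tag bytes without the subclass changing.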
> > >
> > > *PrefixTreeEncoders*
> > > *==============*
> > > We are trying to keep changes minimal here, but will ensure that there
> > > are no behavioural changes when using PrefixTree with V2.
> > >
> > > *KeyValue class*
> > > *===========*
> > > Will include changes to have a Tag class inside it. APIs to identify the
> > > tags in a KV will be needed, and some utility methods will change too.
> > >
> > > For the V2-based read/write flow the existing code path applies with no
> > > or minimal changes.
> > >
> > > Many test cases have to be changed to accommodate the API changes
> > > happening to the internal interfaces.
> > > I have listed the changes at a high level; once you see a patch it
> > > should give more clarity. Let me know if further information is needed.
> > >
> > > Regards
> > > Ram
> > >
> > >
> > > On Thu, Jul 18, 2013 at 11:25 PM, Jimmy Xiang <jxiang@cloudera.com> wrote:
> > >
> > > > Can you share some more details about it? A graph/chart/table showing
> > > > the specific difference will be helpful.
> > > >
> > > > Thanks,
> > > > Jimmy
> > > >
> > > >
> > > > On Thu, Jul 18, 2013 at 10:23 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > > I have been following comments on HBASE-8496.
> > > > >
> > > > > I think introducing cell tagging through HFile v3 is acceptable.
> > > > >
> > > > > Looking forward to seeing your implementation.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
> > > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > >
> > > > > > For the past couple of months, we have been working through various
> > > > > > prototypes for supporting inline storage of tags in cells as persisted
> > > > > > on disk. Our goals are to support optional use of tags with minimal
> > > > > > changes to core code while also avoiding performance impacts to users
> > > > > > who do not use tags.
> > > > > >
> > > > > > For background, refer to the comments in
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
> > > > > >
> > > > > > and
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > > > > >
> > > > > > We have iterated on a couple of prototypes that implement tag awareness,
> > > > > > first in DataBlockEncoders and later as a new type of Codec for Cells.
> > > > > > This point is discussed in the HBASE-8496 comments referenced above.
> > > > > >
> > > > > > We think that tag awareness in Cell Codecs is the right way, but there
> > > > > > are some shortcomings in the current interfaces internal to HFile that
> > > > > > need to be addressed to avoid any performance impact for those who do
> > > > > > not want to use inline tags, and addressing them may involve a large
> > > > > > amount of code change.
> > > > > >
> > > > > > By introducing this in a new V3 version of HFile, we can avoid several
> > > > > > problems with HFile V2 internals as well as backwards-compatibility
> > > > > > concerns. This allows working tag support with no performance impact
> > > > > > and low risk for all HBase users who do not want tags, while still
> > > > > > shipping inline tag capabilities in a released version of HBase.
> > > > > >
> > > > > > The new V3 version of HFile differs from earlier versions by supporting
> > > > > > inline tag storage. It does not change the HFileBlock format; it only
> > > > > > serializes and deserializes the Tag information persisted in the HFile.
> > > > > > Having HFile V3 also keeps tags optional, so existing cases where there
> > > > > > are no tags are entirely unaffected. We also keep the changes outside
> > > > > > the V3 reader and writer minimal. Compatibility with future versions
> > > > > > will not be a problem when we move to Cell Codecs: the codecs used for
> > > > > > writing a file will be persisted in the HFile header. For files that
> > > > > > are either V2 or V3 we will instantiate two default codecs that know how
> > > > > > to deal with serializations with and without tags.
> > > > > >
> > > > > > There have been earlier thoughts on an HFile V3, e.g.:
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > > > > >
> > > > > > We have been working on this and will have a clean patch with a good
> > > > > > amount of testing in time for 0.96.
> > > > > >
> > > > > > Although our focus is on performance-neutral persistence of inline cell
> > > > > > tags in 0.96 to enable a couple of security coprocessor users,
> > > > > > introducing an HFile V3 also provides design freedom for other features
> > > > > > and problems that can be developed through the 0.96 cycle into 0.98.
> > > > > >
> > > > > > Please voice your opinion on this so that we can make this clear and
> > > > > > perhaps define the scope of the patch. Also feel free to share your
> > > > > > thoughts and ideas on HBASE-8496.
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Ram
> > > > > >
> > > > >
> > > >
> > >
> >
>
