hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: DISCUSS : HFile V3 proposal for tags in 0.96
Date Fri, 19 Jul 2013 04:40:12 GMT
bq. V3 would now serialize the tags also after the Value part before the
memstoreTS

Is there a particular reason the tags are serialized before the memstoreTS
rather than after it?
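
To make the layout in question concrete, here is a minimal sketch of a cell
serialized as keyLen, valueLen, key, value, then an optional tags section,
then the memstoreTS. The class name, method names, the 2-byte tags length and
the fixed 8-byte memstoreTS are assumptions made for illustration only; they
are not taken from the patch, and the real format may encode these fields
differently.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the HBase implementation.
class CellLayoutSketch {

    // Layout: keyLen(4) valueLen(4) key value [tagsLen(2) tags] memstoreTS(8)
    static byte[] write(byte[] key, byte[] value, byte[] tags,
                        long memstoreTS, boolean includeTags) {
        int size = 4 + 4 + key.length + value.length
                 + (includeTags ? 2 + tags.length : 0) + 8;
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(key.length).putInt(value.length).put(key).put(value);
        if (includeTags) {
            // Tags sit after the value and before the memstoreTS.
            out.putShort((short) tags.length).put(tags);
        }
        out.putLong(memstoreTS);
        return out.array();
    }

    // Returns the tag bytes, or an empty array when the file carries no tags.
    static byte[] readTags(byte[] cell, boolean includeTags) {
        ByteBuffer in = ByteBuffer.wrap(cell);
        int keyLen = in.getInt();
        int valueLen = in.getInt();
        in.position(in.position() + keyLen + valueLen); // skip key and value
        if (!includeTags) {
            return new byte[0];
        }
        int tagsLen = in.getShort() & 0xFFFF; // read length as unsigned
        byte[] tags = new byte[tagsLen];
        in.get(tags);
        return tags;
    }

    // Reads the trailing memstoreTS, which is always the last 8 bytes here.
    static long readMemstoreTS(byte[] cell) {
        return ByteBuffer.wrap(cell).getLong(cell.length - 8);
    }
}
```

In this sketch a reader constructed with includeTags=false never touches the
tags section, which is the property the proposal relies on for V2 files.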

bq. The BufferedDataBlockEncoder, being the base class for all encoders other
than PrefixTree would now be tag aware.

When would PrefixTree be able to handle tags?

When a new HFile is opened, would the user be able to specify that there is no
tagging involved? Put another way, after this feature goes in, would
HFile V3 always be written?

Thanks

On Thu, Jul 18, 2013 at 9:29 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> The changes/differences that we would be introducing in the V3 format
> are as follows (I will describe them in words under each subcategory):
>
> To reduce code duplication we would subclass ReaderV3 and WriterV3 from
> ReaderV2 and WriterV2 respectively.
> *HFileBlockFormat*
> *=============*
> No change in V2 and V3.
>
> *KV serialization*
> *============*
> V2 no change
> V3 would now serialize the tags also after the Value part before the
> memstoreTS
>
> *FixedFileTrailer*
> *===========*
> Introduces new information into the trailer which can be used in V3 to
> make tags optional.  Suppose the user selects V3 but one CF has no
> tags.  Then we would write the tag bytes while flushing, but during
> compaction, using this trailer info, we would just avoid writing tags
> in the compacted files.  This would mean no impact on read performance
> after the compaction has been completed.
> The V2 code also tries to get this trailer info, but since it is null there
> is no impact on any of the existing code.
>
> *WriterV3 and ReaderV3*
> *=================*
> Tries to handle the tags based on the meta data from the trailer info.  All
> the apis like seekTo, next(), getKeyValue() are now able to handle tags
> based on the flag passed during the construction of the Readers and
> Writers.  We can be sure that for any instances of V2 the includeTags flag
> would always be false.
>
> *DataBlockEncoders*
> *==============*
> Additional arguments added to the apis in the interfaces related to
> HFileDataBlockEncoders, BufferedDataBlockEncoders,
> HFileDataBlockEncodingContext etc.  Again for V2 the new apis would still
> behave the same way and there would be no impact for V2 based usecases.
> The BufferedDataBlockEncoder, being the base class for all encoders other than
> PrefixTree, would now be tag aware.
>
> *PrefixTreeEncoders*
> *==============*
> Trying to keep changes minimal here but would ensure that there are no
> behavioural changes while using PrefixTree with V2.
>
> *KeyValue class*
> *===========*
> Will include changes to have a Tag class inside this.  Apis to identify tags
> in a KV would be needed.  Util method changes also would be there.
>
> For V2 based read/write flow the existing code path applies with no/minimal
> changes.
>
> Many testcases have to be changed to accommodate the api changes happening to
> the internal interfaces.
> I have listed the changes at a high level; maybe once you see a
> patch it will give more clarity. Let me know if further information
> would be needed.
>
> Regards
> Ram
>
>
> On Thu, Jul 18, 2013 at 11:25 PM, Jimmy Xiang <jxiang@cloudera.com> wrote:
>
> > Can you share some more details about it?  A graph/chart/table showing
> the
> > specific difference will be helpful.
> >
> > Thanks,
> > Jimmy
> >
> >
> > On Thu, Jul 18, 2013 at 10:23 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > I have been following comments on HBASE-8496.
> > >
> > > I think introducing cell tagging through HFile v3 is acceptable.
> > >
> > > Looking forward to seeing your implementation.
> > >
> > > Cheers
> > >
> > > On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > > > For the past couple of months, we have been working through various
> > > > prototypes for supporting inline storage of tags in cells as
> persisted
> > on
> > > > disk. Our goals are to support optional use of tags with minimal
> > changes
> > > to
> > > > core code while also avoiding performance impacts to users who do not
> > use
> > > > tags.
> > > >
> > > >  For background, refer to the comments in
> > > >
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
> > > >
> > > > and
> > > >
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > > >
> > > >  We have iterated on a couple of prototypes that implement tag
> > awareness
> > > in
> > > > DataBlockEncoders, later as a new type of Codec for Cells. This point
> > is
> > > > discussed in the above comments in HBASE-8496.
> > > >
> > > > We think that tag awareness in Cell Codecs is the right way, but
> there
> > > are
> > > > some shortcomings with the current interfaces internal to HFile that
> > need
> > > > > to be addressed in order to avoid any performance impacts for those who
> do
> > > not
> > > > want to use inline tags, and that may involve a drastic amount of
> code
> > > > change.
> > > >
> > > >  We can avoid several problems with HFile V2 internals, and backwards
> > > > compatibility concerns, and allow for working tags support with no
> > > > performance impact and low risk to all HBase users who do not want
> tag
> > > > support, while still allowing for inline tags capabilities in a
> > shipping
> > > > version of HBase, by introducing this in a new V3 version for HFile.
> > > >
> > > >  The new V3 version for HFile differs from earlier versions by
> > supporting
> > > > inline tag storage.  This version does not change the HFileBlock
> format
> > > > whereas it just serializes and deserializes the Tag information that
> > > would
> > > > be persisted in the HFile. Having HFile V3 would also help to keep
> Tags
> > > > optional such that the existing cases where there are no tags are
> > totally
> > > > unaffected.  Also we ensure that we keep the changes outside of the
> V3
> > > > reader and writer minimal.  Compatibility would not be a problem with
> > > > > future versions when we go with Cell Codecs.  The Codecs used for
> > > writing
> > > > the file will be persisted in the HFile header.  Now for files that
> are
> > > > either V2 or V3 we will instantiate two default codecs that know to
> > deal
> > > > with serializations with and without tags.
> > > >
> > > >  There have been thoughts on an HFile V3 prior, e.g.:
> > > >
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > > >
> > > >  We have been working on this and will have a clean patch with good
> > > amount
> > > > of testing in time for 0.96.
> > > >
> > > > Although our focus is on performance-neutral persistence of inline
> cell
> > > > tags in 0.96 to enable a couple of security coprocessor users,
> > > introducing
> > > > an HFile V3 provides design freedom for some other features and
> > problems
> > > > too that can be developed through the 0.96 cycle into 0.98.
> > > >
> > > > > Please voice your opinion on this so that we can make this clear and
> > > > > maybe define the scope of the patch.  Also feel free to comment on
> HBASE-8496
> > > on
> > > > your thoughts and ideas.
> > > >
> > > > Regards
> > > >
> > > > Ram
> > > >
> > >
> >
>
