hbase-dev mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: DISCUSS : HFile V3 proposal for tags in 0.96
Date Fri, 19 Jul 2013 04:57:46 GMT
>>Any consideration that the tags are serialized before the memstoreTS
instead of after ?
The argument is simple: memstoreTS is optional and appears only in the
HFile, not in the KV.  In the current design the tags come after the Value
in the KV structure, so it is better to apply the same order in HFiles
as well.
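
To make the ordering concrete, here is a minimal sketch of a cell written as key, value, optional tags, then memstoreTS. It is only an illustration under assumptions: the class CellLayoutSketch, its method names, and the fixed-width length fields are hypothetical and are not the actual KeyValue or HFile encoding. memstoreTS is optional in practice; the sketch always writes it to stay short.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified layout (NOT the real HBase on-disk format):
// keyLen(int) valueLen(int) key value [tagsLen(short) tags] memstoreTS(long)
final class CellLayoutSketch {

    // Writes one cell; tags are placed after the value and before memstoreTS.
    static byte[] write(byte[] key, byte[] value, byte[] tags,
                        boolean includeTags, long memstoreTs) {
        if (includeTags && tags.length > 0xFFFF) {
            throw new IllegalArgumentException("tags too long for a 2-byte length");
        }
        int size = 4 + 4 + key.length + value.length + 8;
        if (includeTags) {
            size += 2 + tags.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(key.length).putInt(value.length).put(key).put(value);
        if (includeTags) {
            buf.putShort((short) tags.length).put(tags);
        }
        buf.putLong(memstoreTs);
        return buf.array();
    }

    // Reads the tag bytes from a cell that was written with includeTags = true.
    static byte[] readTags(byte[] cell) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        int keyLen = buf.getInt();
        int valueLen = buf.getInt();
        buf.position(buf.position() + keyLen + valueLen);
        int tagsLen = buf.getShort() & 0xFFFF;
        byte[] tags = new byte[tagsLen];
        buf.get(tags);
        return tags;
    }

    // Reads memstoreTS; the reader must know whether tags were written.
    static long readMemstoreTs(byte[] cell, boolean includeTags) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        int keyLen = buf.getInt();
        int valueLen = buf.getInt();
        buf.position(buf.position() + keyLen + valueLen);
        if (includeTags) {
            int tagsLen = buf.getShort() & 0xFFFF;
            buf.position(buf.position() + tagsLen);
        }
        return buf.getLong();
    }
}
```

With includeTags set to false the bytes are the same as they would be without the feature, which is the point of keeping tags between the Value and the memstoreTS.
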
>>When would PrefixTree be able to handle tags ?
Maybe my statement confused you.  Please see the point on PrefixTreeEncoders
in the previous mail.  I meant that, as per the current design, PrefixKey,
DiffKey and FastDiff extend BufferedDataBlockEncoder, and hence
BufferedDataBlockEncoder is made tag aware.

The PrefixTree codec has been handled separately to make it work with tags.
>> Put in another way, after this feature goes in, would
HFile V3 always be written ?
By default the code will go with V2.  A user who needs V3 would have to
set hfile.format.version to 3.  This ensures that the system uses V3.
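
A minimal sketch of that selection logic, assuming a plain Map as a stand-in for the real configuration object. The class FormatVersionSketch and its methods are hypothetical; only the property name hfile.format.version and the default of 2 come from this thread.

```java
import java.util.Map;

// Hypothetical helper, not HBase code: resolves the HFile format version
// from a configuration map, defaulting to V2 when the key is absent.
final class FormatVersionSketch {
    static final String FORMAT_VERSION_KEY = "hfile.format.version";
    static final int DEFAULT_VERSION = 2;

    static int resolveVersion(Map<String, String> conf) {
        String raw = conf.get(FORMAT_VERSION_KEY);
        int version = (raw == null) ? DEFAULT_VERSION : Integer.parseInt(raw.trim());
        // Assumption of this sketch: only V2 and V3 are accepted.
        if (version != 2 && version != 3) {
            throw new IllegalArgumentException("unsupported HFile version: " + version);
        }
        return version;
    }

    // Tags are only written when the user has opted in to V3.
    static boolean writesTags(int version) {
        return version >= 3;
    }
}
```

So an untouched configuration keeps writing V2 files, and only an explicit setting of 3 turns on the tag-aware path.
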

Thanks Ted.

Regards
Ram


On Fri, Jul 19, 2013 at 10:10 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> bq. V3 would now serialize the tags also after the Value part before the
> memstoreTS
>
> Any consideration that the tags are serialized before the memstoreTS
> instead of after ?
>
> bq. The BufferedDataBlockEncoder, being the base class for all encoders
> other than PrefixTree would now be tag aware.
>
> When would PrefixTree be able to handle tags ?
>
> When a new HFile is opened, would user be able to specify that there is no
> tagging involved ? Put in another way, after this feature goes in, would
> HFile V3 always be written ?
>
> Thanks
>
> On Thu, Jul 18, 2013 at 9:29 PM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > The changes/differences that we would be introducing in the V3 format
> > are as follows (I will put them down in words under subcategories).
> >
> > To reduce code duplication we would subclass ReaderV3 and WriterV3 from
> > ReaderV2 and WriterV2 respectively.
> > *HFileBlockFormat*
> > *=============*
> > No change in V2 and V3.
> >
> > *KV serialization*
> > *============*
> > V2 no change
> > V3 would now serialize the tags also after the Value part before the
> > memstoreTS
> >
> > *FixedFileTrailer*
> > *===========*
> > Introduces new information into the trailer which can be used in V3 to
> > make tags optional.  Take the case where the user selects V3 but one CF
> > has no tags.  We would then write the tag bytes while flushing, but
> > during compaction, using this info, we would just avoid writing tags
> > in the compacted files.  This would mean no impact on read performance
> > after the compaction has been completed.
> > The V2 code also tries to get this trailer info, but since it is null
> > there is no impact on any of the existing code.
> >
> > *WriterV3 and ReaderV3*
> > *=================*
> > Tries to handle the tags based on the metadata from the trailer info.  All
> > the APIs like seekTo(), next(), getKeyValue() are now able to handle tags
> > based on the flag passed during the construction of the Readers and
> > Writers.  We can be sure that for any instance of V2 the includeTags flag
> > would always be false.
> >
> > *DataBlockEncoders*
> > *==============*
> > Additional arguments are added to the APIs in the interfaces related to
> > HFileDataBlockEncoders, BufferedDataBlockEncoders,
> > HFileDataBlockEncodingContext etc.  Again, for V2 the new APIs would still
> > behave the same way and there would be no impact on V2-based use cases.
> > The BufferedDataBlockEncoder, being the base class for all encoders other
> > than PrefixTree, would now be tag aware.
> >
> > *PrefixTreeEncoders*
> > *==============*
> > Trying to keep changes minimal here but would ensure that there are no
> > behavioural changes while using PrefixTree with V2.
> >
> > *KeyValue class*
> > *===========*
> > Will include changes to have a Tag class inside this.  APIs to identify
> > tags in a KV would be needed.  Util method changes would also be there.
> >
> > For the V2-based read/write flow the existing code path applies with
> > no/minimal changes.
> >
> > Many test cases have to be changed to accommodate the API changes
> > happening to the internal interfaces.
> > I have listed the changes at a high level; maybe once you see a patch
> > it would give more clarity. Let me know if further information
> > would be needed.
> >
> > Regards
> > Ram
> >
> >
> > > On Thu, Jul 18, 2013 at 11:25 PM, Jimmy Xiang <jxiang@cloudera.com> wrote:
> > >
> > > > Can you share some more details about it?  A graph/chart/table showing
> > > > the specific difference will be helpful.
> > >
> > > Thanks,
> > > Jimmy
> > >
> > >
> > > On Thu, Jul 18, 2013 at 10:23 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > I have been following comments on HBASE-8496.
> > > >
> > > > I think introducing cell tagging through HFile v3 is acceptable.
> > > >
> > > > Looking forward to seeing your implementation.
> > > >
> > > > Cheers
> > > >
> > > > On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
> > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > >
> > > > > > For the past couple of months, we have been working through various
> > > > > > prototypes for supporting inline storage of tags in cells as
> > > > > > persisted on disk. Our goals are to support optional use of tags
> > > > > > with minimal changes to core code while also avoiding performance
> > > > > > impacts to users who do not use tags.
> > > > >
> > > > >  For background, refer to the comments in
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
> > > > >
> > > > > and
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > > > >
> > > > >  We have iterated on a couple of prototypes that implement tag
> > > awareness
> > > > in
> > > > > DataBlockEncoders, later as a new type of Codec for Cells. This
> point
> > > is
> > > > > discussed in the above comments in HBASE-8496.
> > > > >
> > > > > > We think that tag awareness in Cell Codecs is the right way, but
> > > > > > there are some shortcomings with the current interfaces internal to
> > > > > > HFile that need to be addressed in order to avoid any performance
> > > > > > impacts for those who do not want to use inline tags, and that may
> > > > > > involve a drastic amount of code change.
> > > > >
> > > > >  We can avoid several problems with HFile V2 internals, and
> backwards
> > > > > compatibility concerns, and allow for working tags support with no
> > > > > performance impact and low risk to all HBase users who do not want
> > tag
> > > > > support, while still allowing for inline tags capabilities in a
> > > shipping
> > > > > version of HBase, by introducing this in a new V3 version for
> HFile.
> > > > >
> > > > >  The new V3 version for HFile differs from earlier versions by
> > > supporting
> > > > > inline tag storage.  This version does not change the HFileBlock
> > format
> > > > > whereas it just serializes and deserializes the Tag information
> that
> > > > would
> > > > > be persisted in the HFile. Having HFile V3 would also help to keep
> > Tags
> > > > > optional such that the existing cases where there are no tags are
> > > totally
> > > > > unaffected.  Also we ensure that we keep the changes outside of the
> > V3
> > > > > reader and writer minimal.  Compatibility would not be a problem
> with
> > > > > future versions when we go with Cell Codecs.  What Codecs used for
> > > > writing
> > > > > the file will be persisted in the HFile header.  Now for files that
> > are
> > > > > either V2 or V3 we will instantiate two default codecs that know
to
> > > deal
> > > > > with serializations with and without tags.
> > > > >
> > > > >  There have been thoughts on an HFile V3 prior, e.g.:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > > > >
> > > > >  We have been working on this and will have a clean patch with good
> > > > amount
> > > > > of testing in time for 0.96.
> > > > >
> > > > > > Although our focus is on performance-neutral persistence of inline
> > > > > > cell tags in 0.96 to enable a couple of security coprocessor users,
> > > > > > introducing an HFile V3 provides design freedom for some other
> > > > > > features and problems too that can be developed through the 0.96
> > > > > > cycle into 0.98.
> > > > >
> > > > > > Please voice your opinion on this so that we can make this clear and
> > > > > > maybe define the scope of the patch.  Also feel free to comment on
> > > > > > HBASE-8496 with your thoughts and ideas.
> > > > >
> > > > > Regards
> > > > >
> > > > > Ram
> > > > >
> > > >
> > >
> >
>
