hbase-dev mailing list archives

From Sergey Shelukhin <ser...@hortonworks.com>
Subject Re: reason to do major compaction after split
Date Fri, 08 Mar 2013 19:32:28 GMT
+1.
That gives us a lot of freedom to do stuff in many scenarios.

On Thu, Mar 7, 2013 at 5:42 PM, Andrew Purtell <apurtell@apache.org> wrote:

> > also, if instead of files you think about handling blocks directly you
> > can end up doing more stuff, like a proper compaction that requires less
> > I/O if N blocks are not changed, some crazy deduplication on tables with
> > the same content & similar...
>
> Sounds like a step toward using a block pool directly and avoiding the
> filesystem layer (Hadoop 2+).
>
>
> On Fri, Mar 8, 2013 at 7:36 AM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
>
> > Sure, having hardlink support
> > (HDFS-3370 <https://issues.apache.org/jira/browse/HDFS-3370>)
> > solves the HFileLink hack,
> > but you still need to add extra metadata for splits (reference files).
> >
> > Also, if instead of files you think about handling blocks directly,
> > you can end up doing more stuff, like a proper compaction that
> > requires less I/O if N blocks are not changed, some crazy deduplication
> > on tables with the same content & similar...
> >
> > On Thu, Mar 7, 2013 at 11:22 PM, Sergey Shelukhin <sergey@hortonworks.com> wrote:
> >
> > > Hmm... ranges sound good, but for files, it would be nice if there were
> > > a hardlink mechanism.
> > > It should be trivial to do in HDFS if blocks could belong to several
> > > files.
> > > Then we don't have to have private cleanup code.
> > >
> > > On Thu, Mar 7, 2013 at 2:28 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
> > >
> > > > This seems to be going in a super messy direction.
> > > > With HBASE-7806 the idea was to clean up all this crazy stuff
> > > > (HFileLink, References, ...).
> > > >
> > > > Unfortunately, the initial decision to tie together the fs layout
> > > > and the tables/regions/families is leading to all these workarounds
> > > > to have something cool.
> > > >
> > > > If you put the files in one place, and the association in another,
> > > > you can avoid all this complexity.
> > > >
> > > > /hbase/data/[file 1, file 2, file 3, file N]
> > > >
> > > > table 1/region 1: [file 2]
> > > > table 1/region 2: [file 1 (from 0 to 50)]
> > > > table 1/region 3: [file 1 (from 50 to 100)]
> > > > table 2/region 1: [file 1, file 2]
> > > >
> > > > On Thu, Mar 7, 2013 at 10:13 PM, Stack <stack@duboce.net> wrote:
> > > >
> > > > > Yes.  That is a few trips to the NN listing directory contents and
> > > > > then some edits/reading of .META.  We would have to introduce a
> > > > > QuarterHFile to go with our HalfHFile (or rename HalfHFile as
> > > > > PieceO'HFile).
> > > > >
> > > > >
> > > > > St.Ack
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
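
To make the layout Matteo sketches in the quoted message a bit more concrete, here is a minimal Python sketch of "files in one place, association in another". It is an illustration only, not HBase code: FileRef, narrow, is_empty and Catalog are hypothetical names, keys are plain integers, and the catalog is an in-memory dict rather than .META. or anything in HDFS. Under those assumptions a split only rewrites range references, and cleanup reduces to finding files that no region points at.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class FileRef:
    """One region's view of a shared store file, optionally limited to a key range."""
    file: str
    start: Optional[int] = None  # inclusive lower bound; None means "from the first key"
    end: Optional[int] = None    # exclusive upper bound; None means "to the last key"


def narrow(ref: FileRef, lo: Optional[int] = None, hi: Optional[int] = None) -> FileRef:
    """Intersect ref's range with [lo, hi); a None bound leaves that side unchanged."""
    start = ref.start if lo is None else (lo if ref.start is None else max(ref.start, lo))
    end = ref.end if hi is None else (hi if ref.end is None else min(ref.end, hi))
    return FileRef(ref.file, start, end)


def is_empty(ref: FileRef) -> bool:
    """True when both bounds are known and the range contains no keys."""
    return ref.start is not None and ref.end is not None and ref.start >= ref.end


class Catalog:
    """Hypothetical association table: (table, region) -> list of FileRef."""

    def __init__(self) -> None:
        self.regions: Dict[Tuple[str, str], List[FileRef]] = {}

    def split(self, table: str, parent: str, key: int, left: str, right: str) -> None:
        """Replace parent with two daughters that share its files by range; no data is copied."""
        refs = self.regions.pop((table, parent))
        lower = [narrow(r, hi=key) for r in refs]
        upper = [narrow(r, lo=key) for r in refs]
        self.regions[(table, left)] = [r for r in lower if not is_empty(r)]
        self.regions[(table, right)] = [r for r in upper if not is_empty(r)]

    def unreferenced(self, all_files: Set[str]) -> Set[str]:
        """Files that no region references; in this model only these are safe to delete."""
        live = {ref.file for refs in self.regions.values() for ref in refs}
        return all_files - live


# Rebuild the example from the quoted message, starting from an unsplit parent region.
catalog = Catalog()
catalog.regions[("table 1", "region 1")] = [FileRef("file 2")]
catalog.regions[("table 1", "parent")] = [FileRef("file 1", 0, 100)]
catalog.regions[("table 2", "region 1")] = [FileRef("file 1"), FileRef("file 2")]
catalog.split("table 1", "parent", 50, "region 2", "region 3")
```

After the split, region 2 and region 3 both point at file 1 with the ranges 0 to 50 and 50 to 100, and table 2 keeps its own whole-file reference to the same file. In this toy model that sharing plays the role a hardlink or HFileLink would otherwise play, and a file such as file 3 that nothing references is the only candidate for deletion.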
