hbase-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: Disk space usage of HFilev1 vs HFilev2
Date Mon, 27 Aug 2012 20:19:04 GMT
Hi Guys,

I was digging through the hbase-default.xml file and I found this property
related to HFile handling:
  <property>
    <name>hfile.format.version</name>
    <value>2</value>
    <description>
      The HFile format version to use for new files. Set this to 1 to test
      backwards-compatibility. The default value of this option should be
      consistent with FixedFileTrailer.MAX_VERSION.
    </description>
  </property>

I believe setting this to 1 will let me carry out my test. Now we know how
to store data in HFileV1 in HBase 0.92 :). I'll post the results once I try
this out.
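
For anyone following along, a minimal sketch of the override, placed in
hbase-site.xml (which takes precedence over hbase-default.xml). Note this
only affects newly written files and needs a region server restart to take
effect; existing files are rewritten on the next major compaction.

  <!-- hbase-site.xml: write new store files in the HFileV1 format -->
  <property>
    <name>hfile.format.version</name>
    <value>1</value>
  </property>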

Thanks,
Anil


On Wed, Aug 15, 2012 at 5:09 AM, J Mohamed Zahoor <jmozah@gmail.com> wrote:

> Cool. Now we have something on the record :-)
>
> ./Zahoor@iPad
>
> On 15-Aug-2012, at 3:12 AM, Harsh J <harsh@cloudera.com> wrote:
>
> > Not wanting this thread to end up as yet another mystery result on the
> > web, I did some tests. I loaded 10k rows (of 100 KB random chars each)
> > into test tables on both 0.90 and 0.92, flushed them, major_compact'ed
> > them (waited for completion and for the drop in IO write activity), and
> > then measured them to find this:
> >
> > 0.92 takes a total of 1049661190 bytes under its /hbase/test directory.
> > 0.90 takes a total of 1049467570 bytes under its /hbase/test directory.
> >
> > So… not much of a difference. It is still your data that counts. I
> > suspect what Anil saw was merely additional, un-compacted store files?
> >
> > P.S. Note that my 'test' tables were all defaults. That is, merely
> > "create 'test', 'col1'", nothing else, so a block index entry was
> > probably created for every row: the block size is 64 KB by default,
> > while my rows are 100 KB each.
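
(For reference, the procedure Harsh describes above maps to these shell
commands; a sketch using his table and column family names:)

  # HBase shell: a table with a single column family, all defaults
  create 'test', 'col1'
  # ... load the 10k rows of ~100 KB each, then persist and compact:
  flush 'test'
  major_compact 'test'

  # From the OS shell, once compaction I/O settles, measure the table's
  # footprint (summarized file bytes, before HDFS replication):
  hadoop fs -dus /hbase/test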
> >
> > On Wed, Aug 15, 2012 at 2:25 AM, anil gupta <anilgupta84@gmail.com>
> > wrote:
> >> Hi Kevin,
> >>
> >> If it's not possible to store a table in HFileV1 in HBase 0.92, then
> >> my last option will be to store the data on a pseudo-distributed or
> >> standalone cluster for the comparison.
> >> The advantage of the current installation is that it's a fully
> >> distributed cluster with around 33 million records in a table, so it
> >> would give me a better estimate.
> >>
> >> Thanks,
> >> Anil Gupta
> >>
> >> On Tue, Aug 14, 2012 at 1:48 PM, Kevin O'dell
> >> <kevin.odell@cloudera.com> wrote:
> >>
> >>> Do you not have a pseudo cluster for testing anywhere?
> >>>
> >>> On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <anilgupta84@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Jerry,
> >>>>
> >>>> I am willing to do that, but the problem is that I wiped out the
> >>>> HBase 0.90 cluster. Is there a way to store a table in HFileV1 in
> >>>> HBase 0.92? If I can store a file in HFileV1 in 0.92, then I can do
> >>>> the comparison.
> >>>>
> >>>> Thanks,
> >>>> Anil Gupta
> >>>>
> >>>> On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <chilinglam@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi Anil:
> >>>>>
> >>>>> Maybe you can try comparing the two HFile implementations directly?
> >>>>> Let's say you write 1000 rows into HFile v1 format and then into
> >>>>> HFile v2 format. You can then compare the sizes of the two directly?
> >>>>>
> >>>>> HTH,
> >>>>>
> >>>>> Jerry
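
(Jerry's direct comparison can also be approximated from the shell, without
writing against the version-specific writer APIs: write the same rows once
under each hfile.format.version setting, then locate and compare the store
files. A sketch; the region and file names are hypothetical placeholders:)

  # List the store files under the table's column family:
  hadoop fs -ls /hbase/test/<region>/col1

  # Size of an individual HFile:
  hadoop fs -du /hbase/test/<region>/col1/<storefile>

  # Inspect the file's metadata, including trailer details, with the
  # bundled HFile tool:
  hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f \
      /hbase/test/<region>/col1/<storefile>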
> >>>>>
> >>>>> On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <anilgupta84@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Zahoor,
> >>>>>>
> >>>>>> Then it seems like I might have missed something when doing the
> >>>>>> HDFS usage estimation of HBase. I usually run hadoop fs -dus
> >>>>>> /hbase/$TABLE_NAME to get the HDFS usage of a table. Is this the
> >>>>>> right way? Since I wiped out the HBase 0.90 cluster, I can no
> >>>>>> longer look into its HDFS usage. Is it possible to store a table in
> >>>>>> HFileV1 instead of HFileV2 in HBase 0.92? That way I could do a
> >>>>>> fair comparison.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Anil Gupta
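
(On the -dus question, a quick sketch of the two forms, assuming the
default layout where each table lives under /hbase/$TABLE_NAME:)

  # Per-entry breakdown, one line per region directory:
  hadoop fs -du /hbase/$TABLE_NAME

  # Single summarized total for the whole table:
  hadoop fs -dus /hbase/$TABLE_NAME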
> >>>>>>
> >>>>>> On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jmozah@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Anil,
> >>>>>>>
> >>>>>>> I really doubt that there is a 50% drop in file sizes... As far
> >>>>>>> as I know, there is no drastic space-conserving feature in V2.
> >>>>>>> Just as an afterthought: do a major compaction and check the
> >>>>>>> sizes.
> >>>>>>>
> >>>>>>> ./Zahoor
> >>>>>>> http://blog.zahoor.in
> >>>>>>>
> >>>>>>>
> >>>>>>> On 15-Aug-2012, at 12:31 AM, anil gupta <anilgupta84@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> l
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Thanks & Regards,
> >>>>>> Anil Gupta
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Thanks & Regards,
> >>>> Anil Gupta
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Kevin O'Dell
> >>> Customer Operations Engineer, Cloudera
> >>>
> >>
> >>
> >>
> >> --
> >> Thanks & Regards,
> >> Anil Gupta
> >
> >
> >
> > --
> > Harsh J
>



-- 
Thanks & Regards,
Anil Gupta
