hbase-user mailing list archives

From Manjeet Singh <manjeet.chand...@gmail.com>
Subject Re: Storing XML file in Hbase
Date Mon, 16 Jan 2017 07:27:51 GMT
I have the same question, but it relates to storing video files larger than 10 MB.

Questions are:
(1) How can I increase the 10 MB MOB size limit?
(2) What is the performance impact? Does anybody have stats?
(3) What is the recommended MOB size?
(4) What are the equivalent setter methods in HBase client 1.2.1 for the
    following?

     import org.apache.hadoop.hbase.HColumnDescriptor;

     HColumnDescriptor hc = new HColumnDescriptor("f");
     hc.setMobEnabled(true);
     hc.setMobThreshold(102400L);  // 100 KB


Thanks
Manjeet

On Tue, Nov 29, 2016 at 4:52 AM, Richard Startin <richardstartin@outlook.com> wrote:

> In my experience it's better to keep the number of column families low.
> When flushes occur, they affect all column families in a table, so when the
> memstore fills you'll create an HFile per family. I haven't seen any
> performance impact in having two column families though.
>
>
> As for the number of columns, there are two extremes: 1) "narrow", where you
> store the XML as a blob in a single cell; 2) "wide", where you break it out
> into columns, of which you can have thousands.
>
>
>   1.  In the case where you store XML as a blob you always need to
> retrieve the entire document, and must deserialise it to perform
> operations. You save space by not repeating the row key, the column family
> name and the column qualifier for every value.
>   2.  When you break the XML out into columns you can retrieve data at a
> per attribute level, which might save IO by filtering unnecessary content,
> and you don't need to break open the XML to perform operations. You incur a
> cost in repeating the row key per tuple (this can add up and will affect
> read performance by limiting the number of rows that can fit into the block
> cache), as well as the extra cost of column families. There is a practical
> limit to the number of columns because a row cannot be split across regions.
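
To put rough numbers on the row-key repetition described above, here is a minimal Java sketch. It assumes the classic KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type) and ignores tags, block encoding and compression, so the results are estimates and not measurements. The class name, method names and the sample sizes in main are hypothetical and chosen only for illustration.

```java
// Rough per-cell size estimate for "narrow" vs "wide" layouts.
// Assumption: uncompressed KeyValue layout without tags or data block encoding.
class CellOverhead {
    // Fixed bytes per cell: key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1).
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    // Approximate serialized size of one cell.
    static long cellSize(int rowKeyLen, int familyLen, int qualifierLen, long valueLen) {
        return FIXED_OVERHEAD + rowKeyLen + familyLen + qualifierLen + valueLen;
    }

    // "Narrow": the whole document is stored in a single cell.
    static long narrowRowSize(int rowKeyLen, int familyLen, int qualifierLen, long docBytes) {
        return cellSize(rowKeyLen, familyLen, qualifierLen, docBytes);
    }

    // "Wide": one cell per extracted attribute; the row key and family name
    // are repeated in every cell.
    static long wideRowSize(int rowKeyLen, int familyLen, int qualifierLen,
                            int columns, long valueLenEach) {
        return (long) columns * cellSize(rowKeyLen, familyLen, qualifierLen, valueLenEach);
    }

    public static void main(String[] args) {
        // Hypothetical document: 16-byte row key, 1-byte family name.
        long narrow = narrowRowSize(16, 1, 4, 2000);   // one 2000-byte XML blob
        long wide = wideRowSize(16, 1, 12, 50, 40);    // 50 attributes of 40 bytes each
        System.out.println("narrow row bytes: " + narrow);
        System.out.println("wide row bytes:   " + wide);
    }
}
```

With short values and long row keys the per-cell overhead dominates quickly, which is one reason to measure with your own key and value sizes before settling on a layout.
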
>
> You may find optimal performance for your use case somewhere between the
> two extremes and it's best to prototype and measure early.
>
> Cheers,
> Richard
>
>
> https://richardstartin.com/
>
>
> ________________________________
> From: Mich Talebzadeh <mich.talebzadeh@gmail.com>
> Sent: 28 November 2016 21:57
> To: user@hbase.apache.org
> Subject: Re: Storing XML file in Hbase
>
> Thanks Richard.
>
> How would one decide on the number of column families and columns?
>
> Is there a ballpark approach?
>
> Cheers
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 28 November 2016 at 16:04, Richard Startin <richardstartin@outlook.com>
> wrote:
>
> > Hi Mich,
> >
> > If you want to store the file whole, you'll need to enforce a 10MB limit
> > to the file size, otherwise you will flush too often (each time the
> > memstore fills up), which will slow down writes.
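
As a back-of-the-envelope illustration of the flush concern above, the sketch below estimates how many values of a given size fit in a memstore before a size-triggered flush. It assumes the default hbase.hregion.memstore.flush.size of 128 MB and ignores per-cell heap overhead, other column families and global memstore pressure, so real counts will be lower. The class and method names are hypothetical.

```java
// Estimate how many writes of a given size fit in one memstore before a flush.
// Assumption: flush is triggered purely by the per-region memstore size limit.
class FlushEstimate {
    // Default value of hbase.hregion.memstore.flush.size (128 MB).
    static final long DEFAULT_FLUSH_SIZE = 128L * 1024 * 1024;

    // Number of values buffered before the memstore reaches the flush size.
    // A value larger than the flush size still counts as one write per flush.
    static long writesPerFlush(long flushSizeBytes, long valueBytes) {
        if (flushSizeBytes <= 0 || valueBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return Math.max(1L, flushSizeBytes / valueBytes);
    }

    public static void main(String[] args) {
        long tenMb = 10L * 1024 * 1024;
        long hundredKb = 100L * 1024;
        System.out.println("10 MB values per flush:  "
                + writesPerFlush(DEFAULT_FLUSH_SIZE, tenMb));
        System.out.println("100 KB values per flush: "
                + writesPerFlush(DEFAULT_FLUSH_SIZE, hundredKb));
    }
}
```

The fewer values fit per flush, the more small HFiles are written per unit of data, which is the write slowdown being described.
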
> >
> > Maybe you could deconstruct the XML by extracting columns from it
> > using XPath?
> >
> > If the files are small there might be a tangible performance benefit by
> > limiting the number of columns.
> >
> > Cheers,
> > Richard
> >
> > Sent from my iPhone
> >
> > > On 28 Nov 2016, at 15:53, Dima Spivak <dimaspivak@apache.org> wrote:
> > >
> > > Hi Mich,
> > >
> > > How many files are you looking to store? How often do you need to read
> > > them? What's the total size of all the files you need to serve?
> > >
> > > Cheers,
> > > Dima
> > >
> > > On Mon, Nov 28, 2016 at 7:04 AM Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> Regarding storing XML files in Big Data: are there any strategies for
> > >> creating multiple column families versus just one column family, and
> > >> in that case how many columns would be optimal?
> > >>
> > >> thanks
> > >>
> > >> Dr Mich Talebzadeh
> > >>
> >
>



-- 
luv all
