hbase-user mailing list archives

From Ayache Khettar <ayache.khet...@googlemail.com>
Subject Re: Storing Json format in Hbase
Date Sun, 04 Jan 2015 15:19:14 GMT
Hi

HBase doesn't support querying XML with XPath. For that you will have to
consider an XML database such as eXist
(http://exist-db.org/exist/apps/homepage/index.html) or MarkLogic (which
requires a commercial licence). If you still want to use HBase, consider
storing metadata alongside the XML payload under the same row ID. You will
have to think about your queries first before deciding what metadata you
want to store. In one of the projects I was involved in, we stored the
metadata in Apache Solr using the HBase Indexer (part of the Cloudera
product suite, see
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/Cloudera-Search-User-Guide/csug_use_hbase_indexer_service.html),
which provides near-real-time updates. So the XML payload ends up in HBase
and the metadata goes into Apache Solr, and you query against Solr as
opposed to HBase.
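A minimal sketch of the pattern described above (raw XML plus extracted metadata under the same row key, with queries going to an index rather than to HBase), in Python using only the standard library. Both the HBase table and the Solr index are modeled as in-memory dictionaries, an assumption made so the example runs standalone; it does not use the HBase or Solr client APIs. The element paths, field names, and the `payload`/`meta` column-family names are hypothetical.

```python
import xml.etree.ElementTree as ET

# In-memory stand-in for an HBase table (assumption, not the HBase client API):
# row key -> {"family:qualifier": value}. "payload" and "meta" are hypothetical
# column-family names.
table = {}

# In-memory stand-in for a search index such as Solr (assumption):
# field name -> field value -> set of row keys.
index = {}

# Hypothetical metadata fields, chosen up front from the queries you expect to run.
# Paths use ElementTree's limited XPath subset, relative to the root element.
METADATA_PATHS = {"customer": "./header/customer", "status": "./header/status"}


def store(row_key, xml_text):
    """Store the raw XML and its extracted metadata under the same row key."""
    root = ET.fromstring(xml_text)
    row = {"payload:xml": xml_text}
    for field, path in METADATA_PATHS.items():
        value = root.findtext(path)
        if value is not None:
            row["meta:" + field] = value
            # Index the metadata so queries never have to parse the payload.
            index.setdefault(field, {}).setdefault(value, set()).add(row_key)
    table[row_key] = row


def query(field, value):
    """Find matching row keys in the index, then fetch each payload by row key."""
    row_keys = sorted(index.get(field, {}).get(value, set()))
    return [table[key]["payload:xml"] for key in row_keys]


store("order-0001", "<order><header><customer>acme</customer>"
                    "<status>open</status></header></order>")
store("order-0002", "<order><header><customer>acme</customer>"
                    "<status>closed</status></header></order>")
store("order-0003", "<order><header><customer>zenith</customer>"
                    "<status>open</status></header></order>")

print(len(query("customer", "acme")))  # prints 2
```

In a real deployment the index lookup would be a Solr query and the fetch an HBase read by row key. The design choice being illustrated is that the XML is parsed once, at write time, so only the fields you decided on in advance are queryable.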

There are various ways to achieve what you want; it all comes down to the
choice of technology and your architecture drivers.

All the best

Ayache




On 4 January 2015 at 11:47, Shashidhar Rao <raoshashidhar123@gmail.com>
wrote:

> Ayache and Chandrashekhar,
>
> You are correct; I too am reluctant to go for the JSON transformation.
> Storing XML in HBase without transformation to JSON would be a lot easier
> at the storing stage.
>
> But my concern is querying this XML data from HBase. Queries include
> aggregations, counts and joins, just to name a few. Can you please shed
> some light on how to query XML data from HBase? Is it possible to use
> XQuery or XPath?
>
> JSON transformation was considered because of MongoDB, as it supports
> JSON natively and seems to be good at analytics. Analytics would come at
> a later stage.
>
> Can you please share some insights into querying XML from HBase? Any
> links or examples would be helpful; I have been unable to find any.
>
> Thanks in advance
>
> Shashi
>
> On Sun, Jan 4, 2015 at 4:18 PM, Ayache Khettar <ayache.khettar@googlemail.com> wrote:
>
> > Hi
> >
> > You could perfectly well store XML in HBase without any issue. It all
> > depends on what you do with the XML. To query the XML back, you will
> > have to store its metadata with it under the same row ID; that way you
> > can query back the XML. I would go for the JSON transformation only if
> > the downstream flow needs the payload in JSON format.
> >
> > Ayache
> > On 4 January 2015 at 10:07, Chandrashekhar Kotekar <shekhar.kotekar@gmail.com> wrote:
> >
> > > You can convert the XML to JSON using a MapReduce program and then
> > > store the JSON in HBase, but you need to decide what your row key
> > > should be.
> > >
> > > Another point to take into account is whether or not you want to
> > > search inside the JSON. If you do, then HBase won't be the best option
> > > for you; you could probably switch to MongoDB or some other document
> > > store.
> > >
> > > Hope it helps...
> > >
> > > Regards,
> > > Chandrashekhar
> > > On 04-Jan-2015 3:32 PM, "Shashidhar Rao" <raoshashidhar123@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Can someone tell me whether the solution I am proposing is a
> > > > feasible option or not?
> > > >
> > > > 1. Large XML data is delivered by an external system.
> > > > 2. Convert it into JSON format.
> > > > 3. Store it in HBase. There will be hardly any updates, only
> > > >    retrieval. I looked at Hive but decided against it, as retrieval
> > > >    would be slow.
> > > > 4. Need to use a Hadoop NoSQL store, as the other components all use
> > > >    the Hadoop ecosystem.
> > > >
> > > > Second question: can XML data be stored directly in HBase without
> > > > any transformation?
> > > >
> > > > Any suggestions on storing XML data in NoSQL? (Open source only, no
> > > > commercial NoSQL.)
> > > >
> > > > Thanks in advance
> > > >
> > > > Shashi
> > > >
> > >
> >
>
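Chandrashekhar's suggestion in the quoted thread (convert each XML document to JSON, then decide on a row key) can be sketched per document in Python with only the standard library. In the thread the conversion would run inside a MapReduce job; this sketch covers only the per-document mapping. XML has no single canonical JSON form, so the attribute/list/text convention below is an assumption chosen for illustration, as is using the root element's `id` attribute as the row key.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element using one simple convention (an assumption): attributes
    become "@name" keys, repeated child tags become lists, and text becomes "#text"
    (or the whole value, for a leaf with no attributes). Tail text of mixed content
    is dropped."""
    node = {"@" + name: value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # Second or later occurrence of a tag: promote it to a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text
        node["#text"] = text
    return node


def to_row(xml_text):
    """Return (row_key, json_value) for one XML document.

    Hypothetical row-key choice: the root element's "id" attribute. A real key
    should be designed around your read patterns."""
    root = ET.fromstring(xml_text)
    row_key = root.get("id")
    if row_key is None:
        raise ValueError("document has no 'id' attribute to use as a row key")
    return row_key, json.dumps({root.tag: element_to_dict(root)}, sort_keys=True)


key, value = to_row('<order id="42"><item sku="a1">2</item>'
                    '<item sku="b7">1</item><note>rush</note></order>')
print(key)  # prints 42
print(value)
```

The JSON string would be the cell value written under `key`. Converting to JSON does not by itself make the content searchable inside HBase, which is the limitation Chandrashekhar points out.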
