incubator-blur-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Contrast of Blur to ElasticSearch, Solr
Date Mon, 09 Dec 2013 03:32:59 GMT
Thanks for the info about other distributed FSs being an option.  I'd guess
relying on the distributed FS is nice for any very large deployment, but I
wonder if that requirement is a hindrance for any small to medium sized
deployment that needs more than 1 shard server, but doesn't quite want the
whole dist FS machinery.

What's your experience?

Distributed trace sounds nice and useful!  Is it exposed via JMX or some
other API?  I'd want us to capture that with SPM once we add support for
Blur monitoring to SPM.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Sun, Dec 8, 2013 at 10:10 AM, Aaron McCurry <amccurry@gmail.com> wrote:

> On Sun, Dec 8, 2013 at 9:57 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:
>
> > Thanks Aaron for this info.  This sounds very similar to both Solr/ES...
> > from this description I can't really see any significant difference.
> > Perhaps the main difference is that with Solr/ES, Hadoop/HDFS/MapReduce is
> > something that's optional and that most people do not (need to) use, while
> > Hadoop/HDFS/MapReduce are an integral part of Blur's offering and you can't
> > have Blur without them.
> >
>
> While I haven't ever run Blur without HDFS, technically you could run any
> distributed file system with Blur, but a distributed FS is required if you
> want to go beyond 1 shard server.
>
> MapReduce is not required, only a distributed FS and ZooKeeper.
>
>
> >
> > What is distributed tracing?  I can't map that to anything in Solr/ES.
> >
>
> It allows the client to start a trace of the request(s) they make.  It
> propagates through the entire stack gathering timing around all the
> traceable sections of code.  It also traverses threads and network calls.
>  It helps to explain where the time goes for a given request.  There is
> also a display for the trace built into the status pages of Blur.
>
> Aaron
>
>
> >
> > Thanks,
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> >
> > On Sun, Dec 8, 2013 at 9:26 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> >
> > > Hi James,
> > >
> > > Thanks for your interest and questions, I will attempt to answer your
> > > questions below.
> > >
> > >
> > > On Sat, Dec 7, 2013 at 8:47 AM, James Kebinger <jkebinger@gmail.com>
> > > wrote:
> > >
> > > > Hi Aaron, I'm wondering if you can talk a little about how you see Blur
> > > > differentiating itself from ElasticSearch and Solr. It seems like both of
> > > > them, in particular Solr after picking up some Blur code, are gaining more
> > > > abilities to interact with Hadoop and HDFS.
> > > >
> > >
> > > Unfortunately I'm not an expert in Solr or ElasticSearch.  I can tell you
> > > about Blur's high-level features when it comes to how it interacts with
> > > Hadoop.
> > >
> > > - Index storage (The obvious one)
> > > - Bulk offline indexing, with incremental updates.
> > > This one gives you the ability to perform indexing on a dedicated MapReduce
> > > cluster and simply move the index updates to the running Blur cluster for
> > > importing.
> > > - WAL (write ahead log) is written to use HDFS
> > > - Also we are currently moving most of the meta data from ZooKeeper storage
> > > to HDFS storage.  This makes interacting with the meta data of a table easy
> > > to do from within MapReduce jobs
> > >
> > >
> > >
> > > > How does a blur install differ from a solr setup reading off hdfs?
> > > >
> > >
> > > Again I'm not an expert in Solr.  Blur's setup runs a cluster of shard
> > > servers that serve shards (indexes) of the table within that shard cluster.
> > > The indexes are stored once in HDFS (not counting the HDFS replication
> > > here) and evenly distributed across whatever shard servers are online.
> > > Blur utilizes a BlockCache (think file system cache) that is an off-heap
> > > based system.  The first version of this was originally picked up by
> > > Cloudera and modified (I'm assuming) and committed back into the
> > > Lucene/Solr code base.  The second version of this block cache (Blur 0.2.2
> > > stable) is now the default in Blur.  It has several advantages over the
> > > first version:
> > >
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201310.mbox/%3CCAB6tTr0Nr2aDLc4kkHoeqiO-utwzBAhb=Ru==GMhQry4aXPjug@mail.gmail.com%3E
> > >
> > > One interesting feature of Blur is the ability to run a cluster of
> > > controllers (controllers are used to make the shard cluster look like a
> > > single service) in front of multiple shard clusters.  This can help to deal
> > > with reindexes of data, meaning that you can reindex all your data to a
> > > new cluster and not affect performance of the cluster that your users may
> > > be interacting with.
> > >
> > >
> > > Some of the overall features of Blur are:
> > > - NRT updates of data
> > > - Offline bulk indexing
> > > - Block cache for fast query performance
> > > - Index warmup (pulls parts of the index up into block cache when a
> > > segment is brought online)
> > > - Performance metrics gathering
> > > - Distributed tracing
> > > - Custom index types
> > > - Custom server side logic can be implemented (basic)
> > >
> > > I'm sure there are many more.
> > >
> > > Hope this helps.
> > >
> > > Aaron
> > >
> > >
> > >
> > > >
> > > > thanks
> > > >
> > > > James
> > > >
> > >
> >
>
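
Aaron's description of distributed tracing above (the client starts a trace, it follows the request across threads and network calls, and it gathers timing around traceable sections) can be illustrated with a small sketch. This is not Blur's tracing API, which the thread does not show. The names `Trace`, `span`, `search_shard` and `handle_query` are hypothetical, threads stand in for network calls to shard servers, and the sleep stands in for real search work.

```python
import threading
import time
from contextlib import contextmanager


class Trace:
    """Collects timed spans for one client request (hypothetical, not Blur's API)."""

    def __init__(self, trace_id):
        self.trace_id = trace_id
        self.spans = []  # (section name, thread name, elapsed seconds)
        self._lock = threading.Lock()

    @contextmanager
    def span(self, name):
        # Time one traceable section and record which thread ran it.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            with self._lock:
                self.spans.append((name, threading.current_thread().name, elapsed))


def search_shard(trace, shard):
    # Stand-in for a shard server doing its part of the request.
    with trace.span(f"search {shard}"):
        time.sleep(0.01)


def handle_query(trace, shards):
    # Stand-in for a controller fanning the query out; the trace object is
    # handed to every worker so the spans end up in the same trace.
    with trace.span("controller fan-out"):
        workers = [
            threading.Thread(target=search_shard, args=(trace, shard), name=f"worker-{shard}")
            for shard in shards
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()


trace = Trace("request-1")
handle_query(trace, ["shard-0", "shard-1"])
for name, thread_name, seconds in trace.spans:
    print(f"{trace.trace_id} {name} [{thread_name}] {seconds * 1000:.1f} ms")
```

The point of the sketch is that the trace travels with the request, so the per-section timings can be read back together to see where the time went.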

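
The statement in Aaron's first reply that indexes are "evenly distributed across whatever shard servers are online" can be pictured with a round-robin assignment. The thread does not say how Blur actually computes its layout, so this is only an assumed strategy for illustration; `assign_shards` and the shard and server names are hypothetical.

```python
def assign_shards(shards, servers):
    """Spread shards over the online servers round-robin (assumed strategy)."""
    if not servers:
        raise ValueError("at least one shard server must be online")
    layout = {server: [] for server in servers}
    for index, shard in enumerate(sorted(shards)):
        owner = servers[index % len(servers)]
        layout[owner].append(shard)
    return layout


shards = [f"shard-{n}" for n in range(8)]

# Three servers online, then one goes away and the same shards are re-spread.
before = assign_shards(shards, ["server-a", "server-b", "server-c"])
after = assign_shards(shards, ["server-a", "server-b"])

for label, layout in (("3 servers", before), ("2 servers", after)):
    print(label, {server: len(owned) for server, owned in layout.items()})
```

Because the index files live in the distributed file system rather than on a particular server, a layout like this can be recomputed when a server joins or leaves without copying index data between servers first.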