incubator-blur-user mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Contrast of Blur to ElasticSearch, Solr
Date Sun, 08 Dec 2013 15:10:25 GMT
On Sun, Dec 8, 2013 at 9:57 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:

> Thanks Aaron for this info.  This sounds very similar to both Solr and ES;
> from this description I can't really see any significant difference.
> Perhaps the main difference is that with Solr/ES, Hadoop/HDFS/MapReduce is
> optional and something most people do not (need to) use, while
> Hadoop/HDFS/MapReduce are an integral part of Blur's offering and you can't
> have Blur without them.
>

I haven't ever run Blur without HDFS, but technically you could run Blur on
any distributed file system.  A distributed FS is required if you want to go
beyond one shard server.

MapReduce is not required, only a distributed FS and ZooKeeper.


>
> What is distributed tracing?  I can't map that to anything in Solr/ES.
>

It allows the client to start a trace of the request(s) they make.  The
trace propagates through the entire stack, gathering timings around all the
traceable sections of code.  It also follows the request across threads and
network calls, which helps to explain where the time goes for a given
request.  There is also a display for the trace built into the status pages
of Blur.
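
To make the idea concrete, here is a minimal sketch in Python.  It is not
Blur's tracing API: the names Trace, span, search_shard and handle_request
are all hypothetical, and the sleep call stands in for real index work.  It
only shows the general pattern of one trace per request, with timed sections
recorded as the request crosses threads.  Across a network call, a real
system would also have to send the trace id along with the request, which
this sketch leaves out.

```python
import threading
import time
import uuid
from contextlib import contextmanager


class Trace:
    """Collects timed spans for one client request (hypothetical, not Blur's API)."""

    def __init__(self, trace_id=None):
        self.trace_id = trace_id or uuid.uuid4().hex
        self.spans = []  # entries are (name, thread name, elapsed seconds)
        self._lock = threading.Lock()

    @contextmanager
    def span(self, name):
        # Time one traceable section of code and record it on this trace.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            with self._lock:
                self.spans.append((name, threading.current_thread().name, elapsed))


def search_shard(trace, shard):
    # Runs on a worker thread. The trace object is handed over explicitly,
    # which is how the trace crosses the thread boundary in this sketch.
    with trace.span(f"search:{shard}"):
        time.sleep(0.01)  # stand-in for real index work


def handle_request(trace, shards):
    # The outer span covers the whole request, including the fan-out.
    with trace.span("request"):
        workers = [
            threading.Thread(target=search_shard, args=(trace, shard))
            for shard in shards
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()


trace = Trace()
handle_request(trace, ["shard-0", "shard-1"])
for name, thread, elapsed in trace.spans:
    print(f"{trace.trace_id[:8]} {name:<16} {thread:<12} {elapsed * 1000:.1f} ms")
```

Each printed line carries the same trace id, so the per-shard timings can be
lined up against the overall request time to see where the time went.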

Aaron


>
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On Sun, Dec 8, 2013 at 9:26 AM, Aaron McCurry <amccurry@gmail.com> wrote:
>
> > Hi James,
> >
> > Thanks for your interest and questions.  I will attempt to answer them
> > below.
> >
> >
> > On Sat, Dec 7, 2013 at 8:47 AM, James Kebinger <jkebinger@gmail.com>
> > wrote:
> >
> > > Hi Aaron, I'm wondering if you can talk a little about how you see Blur
> > > differentiating itself from ElasticSearch and Solr. It seems like both
> > > of them, in particular Solr after picking up some Blur code, are
> > > gaining more abilities to interact with Hadoop and HDFS.
> > >
> >
> > Unfortunately I'm not an expert in Solr or ElasticSearch.  I can tell you
> > about Blur's high-level features when it comes to how it interacts with
> > Hadoop.
> >
> > - Index storage (the obvious one).
> > - Bulk offline indexing, with incremental updates.  This one gives you
> >   the ability to perform indexing on a dedicated MapReduce cluster and
> >   simply move the index updates to the running Blur cluster for
> >   importing.
> > - The WAL (write-ahead log) is written to use HDFS.
> > - Also, we are currently moving most of the metadata from ZooKeeper
> >   storage to HDFS storage.  This makes interacting with the metadata of a
> >   table easy to do from within MapReduce jobs.
> >
> >
> >
> > > How does a Blur install differ from a Solr setup reading off HDFS?
> > >
> >
> > Again I'm not an expert in Solr.  Blur's setup runs a cluster of shard
> > servers that serve shards (indexes) of the table within that shard
> > cluster.  The indexes are stored once in HDFS (not counting the HDFS
> > replication here) and evenly distributed across whatever shard servers
> > are online.  Blur utilizes a BlockCache (think file system cache) that is
> > an off-heap based system.  The first version of this was originally
> > picked up by Cloudera and modified (I'm assuming) and committed back into
> > the Lucene/Solr code base.  The second version of this block cache (Blur
> > 0.2.2 stable) is now the default in Blur.  It has several advantages over
> > the first version:
> >
> > http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201310.mbox/%3CCAB6tTr0Nr2aDLc4kkHoeqiO-utwzBAhb=Ru==GMhQry4aXPjug@mail.gmail.com%3E
> >
> > One interesting feature of Blur is the ability to run a cluster of
> > controllers (controllers are used to make the shard cluster look like a
> > single service) in front of multiple shard clusters.  This can help to
> > deal with reindexing of data, meaning that you can reindex all your data
> > into a new cluster and not affect performance of the cluster that your
> > users may be interacting with.
> >
> >
> > Some of the overall features of Blur are:
> > - NRT updates of data
> > - Offline bulk indexing
> > - Block cache for fast query performance
> > - Index warmup (pulls parts of the index up into block cache when a
> >   segment is brought online)
> > - Performance metrics gathering
> > - Distributed tracing
> > - Custom index types
> > - Custom server side logic can be implemented (basic)
> >
> > I'm sure there are many more.
> >
> > Hope this helps.
> >
> > Aaron
> >
> >
> >
> > >
> > > thanks
> > >
> > > James
> > >
> >
>
