incubator-blur-user mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Contrast of Blur to ElasticSearch, Solr
Date Mon, 09 Dec 2013 20:15:41 GMT
On Sun, Dec 8, 2013 at 10:32 PM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> Thanks for the info about other distributed FSs being an option.  I'd guess
> relying on the distributed FS is nice for any very large deployment, but I
> wonder if that requirement is a hindrance for any small to medium sized
> deployment that needs more than 1 shard server, but doesn't quite want the
> whole dist FS machinery.
>
> What's your experience?
>

I don't think running the HDFS part of Hadoop is very hard to do.  MapReduce
might be overkill for some people, though.


>
> Distributed trace sounds nice and useful!  Is it exposed via JMX or some
> other API?  I'd want us to capture that with SPM once we add support for
> Blur monitoring to SPM.
>

All the trace information is available through the standard Thrift API in
Blur.  There is also a pluggable API for how the traces are stored; the
current implementations store them in ZooKeeper or HDFS, or just log the info.
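
To make the idea of nested, timed trace sections with a pluggable store
concrete, here is a minimal sketch in Python.  It is not Blur's API: the names
TraceStore, InMemoryTraceStore, LoggingTraceStore, Tracer and section are all
hypothetical, and it only covers a single thread, whereas Blur's tracing also
follows requests across threads and network calls.

```python
import time
from contextlib import contextmanager


class TraceStore:
    """Hypothetical storage plug-in: receives each finished trace section."""

    def store(self, span):
        raise NotImplementedError


class InMemoryTraceStore(TraceStore):
    """Keeps sections in a list; a stand-in for a ZooKeeper or HDFS backed store."""

    def __init__(self):
        self.spans = []

    def store(self, span):
        self.spans.append(span)


class LoggingTraceStore(TraceStore):
    """Prints each section, analogous to just logging the info."""

    def store(self, span):
        print("trace=%(trace_id)s section=%(name)s parent=%(parent)s "
              "took=%(elapsed_ms).3fms" % span)


class Tracer:
    """Times named sections of one request and hands them to the store."""

    def __init__(self, trace_id, store):
        self.trace_id = trace_id
        self.store = store
        self._stack = []  # names of the sections currently open

    @contextmanager
    def section(self, name):
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self._stack.pop()
            self.store.store({
                "trace_id": self.trace_id,
                "name": name,
                "parent": parent,
                "elapsed_ms": elapsed_ms,
            })


# Usage: one traced request with an outer and an inner section.
store = InMemoryTraceStore()
tracer = Tracer("query-1", store)
with tracer.section("controller.query"):
    with tracer.section("shard.search"):
        time.sleep(0.01)  # stand-in for real work
```

Swapping InMemoryTraceStore for LoggingTraceStore changes where the sections
end up without touching the traced code, which is the point of keeping the
storage pluggable.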

Aaron


>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Sun, Dec 8, 2013 at 10:10 AM, Aaron McCurry <amccurry@gmail.com> wrote:
>
> > On Sun, Dec 8, 2013 at 9:57 AM, Otis Gospodnetic
> > <otis.gospodnetic@gmail.com> wrote:
> >
> > > Thanks Aaron for this info.  This sounds very similar to both Solr/ES...
> > > from this description I can't really see any significant difference.
> > > Perhaps the main difference is that with Solr/ES, Hadoop/HDFS/MapReduce
> > > is something that's optional and that most people do not (need to) use,
> > > while Hadoop/HDFS/MapReduce are an integral part of Blur's offering and
> > > you can't have Blur without them.
> > >
> >
> > While I haven't ever run Blur without HDFS, technically you could run any
> > distributed file system with Blur.  A distributed FS is required, however,
> > if you want to go beyond 1 shard server.
> >
> > MapReduce is not required, only a distributed FS and ZooKeeper.
> >
> >
> > >
> > > What is distributed tracing?  I can't map that to anything in Solr/ES.
> > >
> >
> > It allows the client to start a trace of the request(s) they make.  It
> > propagates through the entire stack gathering timing around all the
> > traceable sections of code.  It also traverses threads and network calls.
> >  It helps to explain where the time goes for a given request.  There is
> > also a display for the trace built into the status pages of Blur.
> >
> > Aaron
> >
> >
> > >
> > > Thanks,
> > > Otis
> > > --
> > > Performance Monitoring * Log Analytics * Search Analytics
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> > >
> > > On Sun, Dec 8, 2013 at 9:26 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> > >
> > > > Hi James,
> > > >
> > > > Thanks for your interest and questions.  I will attempt to answer them
> > > > below.
> > > >
> > > >
> > > > On Sat, Dec 7, 2013 at 8:47 AM, James Kebinger <jkebinger@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Aaron, I'm wondering if you can talk a little about how you see
> > > > > Blur differentiating itself from ElasticSearch and Solr. It seems like
> > > > > both of them, in particular Solr after picking up some Blur code, are
> > > > > gaining more abilities to interact with Hadoop and HDFS.
> > > > >
> > > >
> > > > Unfortunately I'm not an expert in Solr or ElasticSearch.  I can tell
> > > > you about Blur's high-level features in terms of how it interacts with
> > > > Hadoop.
> > > >
> > > > - Index storage (the obvious one)
> > > > - Bulk offline indexing, with incremental updates.
> > > >   This one gives you the ability to perform indexing on a dedicated
> > > >   MapReduce cluster and simply move the index updates to the running
> > > >   Blur cluster for importing.
> > > > - WAL (write-ahead log) is written to use HDFS
> > > > - Also, we are currently moving most of the metadata from ZooKeeper
> > > >   storage to HDFS storage.  This makes interacting with the metadata of
> > > >   a table easy to do from within MapReduce jobs.
> > > >
> > > >
> > > >
> > > > > How does a blur install differ from a solr setup reading off hdfs?
> > > > >
> > > >
> > > > Again I'm not an expert in Solr.  Blur's setup runs a cluster of shard
> > > > servers that serve shards (indexes) of the table within that shard
> > > > cluster.  The indexes are stored once in HDFS (not counting the HDFS
> > > > replication here) and evenly distributed across whatever shard servers
> > > > are online.  Blur utilizes a BlockCache (think file system cache) that
> > > > is an off-heap based system.  The first version of this was originally
> > > > picked up by Cloudera, modified (I'm assuming), and committed back into
> > > > the Lucene/Solr code base.  The second version of this block cache (Blur
> > > > 0.2.2 stable) is now the default in Blur.  It has several advantages
> > > > over the first version:
> > > >
> > > > http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201310.mbox/%3CCAB6tTr0Nr2aDLc4kkHoeqiO-utwzBAhb=Ru==GMhQry4aXPjug@mail.gmail.com%3E
> > > >
> > > > One interesting feature of Blur is the ability to run a cluster of
> > > > controllers (controllers are used to make the shard cluster look like a
> > > > single service) in front of multiple shard clusters.  This can help to
> > > > deal with reindexes of data, meaning that you can reindex all your data
> > > > into a new cluster and not affect the performance of the cluster that
> > > > your users may be interacting with.
> > > >
> > > >
> > > > Some of the overall features of Blur are:
> > > > - NRT updates of data
> > > > - Offline bulk indexing
> > > > - Block cache for fast query performance
> > > > - Index warmup (pulls parts of the index up into block cache when a
> > > >   segment is brought online)
> > > > - Performance metrics gathering
> > > > - Distributed tracing
> > > > - Custom index types
> > > > - Custom server side logic can be implemented (basic)
> > > >
> > > > I'm sure there are many more.
> > > >
> > > > Hope this helps.
> > > >
> > > > Aaron
> > > >
> > > >
> > > >
> > > > >
> > > > > thanks
> > > > >
> > > > > James
> > > > >
> > > >
> > >
> >
>
