incubator-blur-user mailing list archives

From Naresh Yadav <nyadav....@gmail.com>
Subject Re: Contrast of Blur to ElasticSearch, Solr
Date Fri, 13 Dec 2013 12:35:10 GMT
Hi,

I am fairly new to these technologies and may not be at the right stage to
answer these questions, but I have read a lot comparing them and put together
this comparison table based on my initial understanding:



Feature                                         | Blur                            | ElasticSearch
------------------------------------------------+---------------------------------+--------------
Supports Lucene over HDFS                       | Yes                             | Yes
Dynamic Columns Indexing                        | Yes                             | Yes
Internally uses MapReduce to store/update index | Yes                             | No
Index Storage many options                      | Only FileSystem/HDFS            | Many Options
In memory Indexing                              | No                              | Yes
HDFS lacks page cache so build own              | Yes (has concept of BlockCache) | No
WriteAhead Log for Indexes                      | Yes                             | No

I may be wrong in my understanding of a few of these, as I have only read about
them and have not actually used them on a real problem.
As for Solr, it has used Blur code for its integration with HDFS and does not
support MapReduce to store/update indexes.

Thanks
Naresh

On Tue, Dec 10, 2013 at 1:45 AM, Aaron McCurry <amccurry@gmail.com> wrote:

> On Sun, Dec 8, 2013 at 10:32 PM, Otis Gospodnetic <
> otis.gospodnetic@gmail.com> wrote:
>
> > Thanks for the info about other distributed FSs being an option.  I'd guess
> > relying on the distributed FS is nice for any very large deployment, but I
> > wonder if that requirement is a hindrance for any small to medium sized
> > deployment that needs more than 1 shard server, but doesn't quite want the
> > whole dist FS machinery.
> >
> > What's your experience?
> >
>
> I don't see running the HDFS part of Hadoop as very hard to do; MapReduce
> might be overkill for some people, though.
>
>
> >
> > Distributed trace sounds nice and useful!  Is it exposed via JMX or some
> > other API?  I'd want us to capture that with SPM once we add support for
> > Blur monitoring to SPM.
> >
>
> All the trace information is available through the standard Thrift API in
> Blur.  And there's a pluggable API for how the traces are stored; the current
> implementations are in ZooKeeper and HDFS, as well as just logging the info.
>
> Aaron
>
>
> >
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> > On Sun, Dec 8, 2013 at 10:10 AM, Aaron McCurry <amccurry@gmail.com>
> wrote:
> >
> > > On Sun, Dec 8, 2013 at 9:57 AM, Otis Gospodnetic <
> > > otis.gospodnetic@gmail.com
> > > > wrote:
> > >
> > > > Thanks Aaron for this info.  This sounds very similar to both Solr/ES...
> > > > from this description I can't really see any significant difference.
> > > > Perhaps the main difference is that with Solr/ES, Hadoop/HDFS/MapReduce is
> > > > something that's optional and that most people do not (need to) use, while
> > > > Hadoop/HDFS/MapReduce are an integral part of Blur's offering and you can't
> > > > have Blur without them.
> > > >
> > >
> > > While I haven't ever run Blur without HDFS, technically you could run any
> > > distributed file system with Blur, but a distributed FS is required if you
> > > want to go beyond 1 shard server.
> > >
> > > MapReduce is not required, only a distributed FS and ZooKeeper.
> > >
> > >
> > > >
> > > > What is distributed tracing?  I can't map that to anything in Solr/ES.
> > > >
> > >
> > > It allows the client to start a trace of the request(s) they make.  It
> > > propagates through the entire stack gathering timing around all the
> > > traceable sections of code.  It also traverses threads and network calls.
> > > It helps to explain where the time goes for a given request.  There is
> > > also a display for the trace built into the status pages of Blur.
> > >
> > > Aaron
> > >
> > >
> > > >
> > > > Thanks,
> > > > Otis
> > > > --
> > > > Performance Monitoring * Log Analytics * Search Analytics
> > > > Solr & Elasticsearch Support * http://sematext.com/
> > > >
> > > >
> > > >
> > > > On Sun, Dec 8, 2013 at 9:26 AM, Aaron McCurry <amccurry@gmail.com>
> > > wrote:
> > > >
> > > > > Hi James,
> > > > >
> > > > > Thanks for your interest and questions, I will attempt to answer your
> > > > > questions below.
> > > > >
> > > > >
> > > > > On Sat, Dec 7, 2013 at 8:47 AM, James Kebinger <
> jkebinger@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Aaron, I'm wondering if you can talk a little about how you see Blur
> > > > > > differentiating itself from ElasticSearch and Solr. It seems like both of
> > > > > > them, in particular Solr after picking up some Blur code, are gaining more
> > > > > > abilities to interact with Hadoop and HDFS.
> > > > > >
> > > > >
> > > > > Unfortunately I'm not an expert in Solr or ElasticSearch.  I can tell you
> > > > > about Blur's high level features when talking about how it interacts with
> > > > > Hadoop.
> > > > >
> > > > > - Index storage (The obvious one)
> > > > > - Bulk offline indexing, with incremental updates.
> > > > > This one gives you the ability to perform indexing on a dedicated MapReduce
> > > > > cluster and simply move the index updates to the running Blur cluster for
> > > > > importing.
> > > > > - WAL (write ahead log) is written to use HDFS
> > > > > - Also we are currently moving most of the meta data from ZooKeeper storage
> > > > > to HDFS storage.  This makes interacting with the meta data of a table easy
> > > > > to do from within MapReduce jobs
> > > > >
> > > > >
> > > > >
> > > > > > How does a Blur install differ from a Solr setup reading off HDFS?
> > > > > >
> > > > >
> > > > > Again I'm not an expert in Solr.  Blur's setup runs a cluster of shard
> > > > > servers that serve shards (indexes) of the table within that shard cluster.
> > > > > The indexes are stored once in HDFS (not counting the HDFS replication
> > > > > here) and evenly distributed across whatever shard servers are online.
> > > > > Blur utilizes a BlockCache (think file system cache) that is an off-heap
> > > > > based system.  The first version of this was originally picked up by
> > > > > Cloudera and modified (I'm assuming) and committed back into the
> > > > > Lucene/Solr code base.  The second version of this block cache (Blur 0.2.2
> > > > > stable) is now the default in Blur.  It has several advantages over the
> > > > > first version:
> > > > >
> > > > > http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201310.mbox/%3CCAB6tTr0Nr2aDLc4kkHoeqiO-utwzBAhb=Ru==GMhQry4aXPjug@mail.gmail.com%3E
> > > > >
> > > > > One interesting feature of Blur is the ability to run a cluster of
> > > > > controllers (controllers are used to make the shard cluster look like a
> > > > > single service) in front of multiple shard clusters.  This can help to deal
> > > > > with reindexes of data, meaning that you can reindex all your data to a
> > > > > new cluster and not affect performance of the cluster that your users may
> > > > > be interacting with.
> > > > >
> > > > >
> > > > > Some of the overall features of Blur are:
> > > > > - NRT updates of data
> > > > > - Offline bulk indexing
> > > > > - Block cache for fast query performance
> > > > > - Index warmup (pulls parts of the index up into block cache when a segment
> > > > > is brought online)
> > > > > - Performance metrics gathering
> > > > > - Distributed tracing
> > > > > - Custom index types
> > > > > - Custom server side logic can be implemented (basic)
> > > > >
> > > > > I'm sure there are many more.
> > > > >
> > > > > Hope this helps.
> > > > >
> > > > > Aaron
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > thanks
> > > > > >
> > > > > > James
> > > > > >
> > > > >
> > > >
> > >
> >
>
