incubator-blur-user mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: How do I present Blur
Date Fri, 06 Jan 2017 10:39:36 GMT
First off, both Elasticsearch and Solr are healthy, strong projects with very
good products and support.  They both have great APIs and tons of features
that perform very well in most use cases.

Blur was born before Solr integrated with Hadoop (and before Solr merged
with the Lucene project), and Solr actually took Blur's first version of the
block cache.  If you are not familiar with the block cache, it's basically a
replacement for the OS file system cache that makes accessing HDFS perform
well enough to function as an index storage system without the need to copy
all the indexes locally.  Blur has been using a second version of the block
cache since then, but I don't believe that Solr has ever updated it.  I
haven't kept up with the Solr project, so they may have moved on from the
original block cache as well.
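
To make the block cache idea concrete, here is a minimal sketch in Python.
It is not Blur's or Solr's code; the class name, the `read_block` callback,
the block size, and the LRU eviction policy are all assumptions chosen for
illustration.  It only shows the general technique: keep fixed-size blocks
of remote files in memory so repeated reads do not go back to slow storage.

```python
from collections import OrderedDict


class BlockCache:
    """LRU cache of fixed-size file blocks (hypothetical, for illustration)."""

    def __init__(self, read_block, block_size=8192, max_blocks=1024):
        # read_block(file_name, block_index) -> bytes stands in for a slow
        # remote read (for example from HDFS); it is an assumed callback.
        self.read_block = read_block
        self.block_size = block_size
        self.max_blocks = max_blocks
        self.blocks = OrderedDict()  # (file_name, block_index) -> bytes
        self.hits = 0
        self.misses = 0

    def _get_block(self, file_name, index):
        key = (file_name, index)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.blocks[key]
        self.misses += 1
        data = self.read_block(file_name, index)
        self.blocks[key] = data
        if len(self.blocks) > self.max_blocks:
            self.blocks.popitem(last=False)  # evict least recently used
        return data

    def read(self, file_name, offset, length):
        """Read `length` bytes at `offset`, one cached block at a time."""
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, self.block_size)
            block = self._get_block(file_name, index)
            chunk = block[start:start + (end - offset)]
            if not chunk:
                break  # reached the end of the file
            out += chunk
            offset += len(chunk)
        return bytes(out)
```

A search that touches the same parts of an index file repeatedly would pay
the remote read cost only on the first access to each block.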

The two primary goals of Blur that really defined the implementation were:

- Quick search response

There are a few features in the query/read side of Blur, but not as many as
in ES or Solr.  One of the reasons for this is that Blur only implemented
features that would work at any index size the system could handle (or at
least that was the goal).  One of the latest additions to the read side of
Blur was the Commands API, which allows developers to create their own
server-side functions that can access the Lucene indexes directly.
Commands were used to perform exports of the data in the index, create
facets that always give you a proper count (not just the top N like in
Solr), or anything else you could come up with to execute against a Lucene
index.  Basically they could be used to create new features without the
need for a new Thrift call and supporting API changes.
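
As an illustration of the facet point, the sketch below runs a function on
every shard and then combines the results.  This is not the Commands API
and not a description of Solr internals; the function names and the toy
shard data layout (a list of dicts per shard) are assumptions.  It only
shows why merging complete per-shard counts gives exact totals, while
merging per-shard top-N lists can lose counts.

```python
from collections import Counter


def facet_on_shard(shard_docs, field):
    """Per-shard step: count every value of `field` in one shard."""
    return Counter(doc[field] for doc in shard_docs if field in doc)


def exact_facets(shards, field):
    """Combine step: sum the complete per-shard counts."""
    total = Counter()
    for docs in shards:
        total.update(facet_on_shard(docs, field))
    return total


def top_n_merged(shards, field, n):
    """Combine only each shard's top-n values (can undercount or drop values)."""
    total = Counter()
    for docs in shards:
        total.update(dict(facet_on_shard(docs, field).most_common(n)))
    return total
```

With one shard holding values a, a, a, b, b and another holding c, c, c, b,
b, the value b is the most frequent overall, yet it is missing entirely
from a merge of each shard's top-1 list.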

There are many other features like document level access control, query
cancellation (another feature that Solr adopted), etc.

- Massive data ingestion

Basically the focus on ingestion was not on latency but rather on having the
ability to incrementally add large amounts of data to an index that is
likely also very large on its own.  The project uses YARN MR for this, and
it is not a quick way to bring data in, but if your need is to index large
chunks of data incrementally it works very well.  Also, if a full reindex
was needed, this could be done easily as well.  Something to point out here is
that the MR indexing puts very little strain on the running system when
performing the updates/reindexes.  I believe this differs from how ES and Solr
are implemented.
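
One way to picture why offline indexing is cheap for the serving side is
the toy sketch below.  It assumes (this is not stated above) that the batch
job produces finished, immutable index segments and that the running system
only has to start reading them.  All names are hypothetical, and a plain
dict of term to document ids stands in for a real Lucene segment.

```python
from collections import defaultdict


def build_segment(docs):
    """Offline step: build an immutable segment from (doc_id, text) pairs."""
    postings = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: frozenset(ids) for term, ids in postings.items()}


class LiveIndex:
    """Serving side: searches across whatever segments have been imported."""

    def __init__(self):
        self.segments = []

    def import_segment(self, segment):
        # The expensive work happened offline; importing just swaps in a
        # new segment list.
        self.segments = self.segments + [segment]

    def search(self, term):
        hits = set()
        for segment in self.segments:
            hits |= segment.get(term.lower(), frozenset())
        return hits
```

Each incremental load adds one more segment, and a full reindex would
amount to building new segments offline and replacing the old list.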

Let me know if this doesn't answer your questions or if you want to go into
any more detail.  Thanks!

Aaron



On Tue, Jan 3, 2017 at 3:42 PM, Lukáš Vlček <lukas.vlcek@gmail.com> wrote:

> Hi,
>
> What does it mean that Blur's approach "is arguably better" for large data
> compared to the mentioned competitors? Does it mean faster indexing? Smaller
> index size? Better utilization of resources (RAM, CPU, IO) for large data
> querying? ... I would be interested in learning more about how it differs
> from Elasticsearch and Solr.
>
> Regards,
> Lukáš
>
>
> On Sun, Dec 25, 2016 at 6:30 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>
> > It is, but without a community of active developers it has become
> > stagnant.  For example, the Lucene library version it uses has become
> > outdated, and it would likely be a major undertaking to update the code
> > base to the newest version.  The biggest reason for the low activity is
> > that I haven't had time to work on the project due to personal reasons.
> >
> > In its current state it is very stable, even at very large index sizes;
> > however, the upfront development effort to use Blur is very high
> > compared to Elasticsearch or Solr.  I believe this was the primary
> > reason Blur never really caught on in the community.
> >
> > Aaron
> >
> > On Sun, Dec 25, 2016 at 12:14 PM, Mark Kerzner <mark@elephantscale.com>
> > wrote:
> >
> > > But,
> > >
> > > Isn't Blur a new approach arguably better than SOLR and ElasticSearch
> > > for big sizes?
> > >
> > > Mark
> >
>
