accumulo-dev mailing list archives

From Donald Miner <dmi...@clearedgeit.com>
Subject Re: Search
Date Thu, 24 Jul 2014 14:44:28 GMT
One problem I ran into when thinking about this is throughput. In
Accumulo, we talk about tens or hundreds of thousands, or millions, of
records per second. A lot of these search solutions talk about hundreds or
thousands of documents per second.

The fact that Accumulo can outpace just about anything led me to think
that some sort of microbatch solution might be the best choice. If you wait
for your data to be indexed before moving on to the next Accumulo insert,
you can start lagging behind. Basically, you are crippling your ingest
throughput by limiting it to that of the slower of the two systems.
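
A rough illustration of that arithmetic, as a Python sketch. The rates are made-up round numbers matching the orders of magnitude above, not measurements, and the model assumes the coupled pipeline is simply throttled to the slower system by backpressure.

```python
def coupled_throughput(accumulo_rate: float, index_rate: float) -> float:
    """Records/sec when every Accumulo write must wait on the indexer."""
    return min(accumulo_rate, index_rate)


def decoupled_backlog(accumulo_rate: float, index_rate: float, seconds: float) -> float:
    """Records still waiting to be indexed after `seconds` if ingest never waits."""
    return max(0.0, accumulo_rate - index_rate) * seconds


# Hypothetical rates, chosen only to mirror the orders of magnitude in the email.
ACCUMULO_RATE = 500_000.0  # records/sec
INDEX_RATE = 5_000.0       # documents/sec

print(coupled_throughput(ACCUMULO_RATE, INDEX_RATE))       # 5000.0
print(decoupled_backlog(ACCUMULO_RATE, INDEX_RATE, 60.0))  # 29700000.0
```

The decoupled backlog only stays bounded if the indexer's sustained rate (for example, through bulk indexing) catches up during quieter periods; otherwise the index lag keeps growing.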

It seems like a microbatch (or batch) approach might be worthwhile. What
you are trading is your text index lagging behind, but you keep your
ingest throughput in Accumulo. I think Apache Blur does batch parallel
indexing, which is why I was looking at it for this.
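
A minimal Python sketch of the microbatch idea, assuming hypothetical `write_primary` and `index_batch` callables that stand in for an Accumulo BatchWriter and a bulk Blur/Solr update (neither real client API is used here). Each record goes to the primary store right away; the indexer only sees batches, cut by size or by the age of the oldest pending record.

```python
import time
from typing import Callable, List, Optional, Tuple

Record = Tuple[str, str]  # hypothetical record shape: (row id, text)


class MicroBatchIndexer:
    """Write each record to the primary store immediately; hand records to the
    text indexer in batches, cut by size or by the age of the oldest record."""

    def __init__(
        self,
        write_primary: Callable[[Record], None],      # stand-in for an Accumulo BatchWriter
        index_batch: Callable[[List[Record]], None],  # stand-in for a bulk Solr/Blur update
        max_batch: int = 1000,
        max_delay_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._write_primary = write_primary
        self._index_batch = index_batch
        self._max_batch = max_batch
        self._max_delay_s = max_delay_s
        self._clock = clock
        self._pending: List[Record] = []
        self._oldest: Optional[float] = None  # clock time of the oldest pending record

    def add(self, record: Record) -> None:
        self._write_primary(record)  # primary write happens first, regardless of indexer state
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(record)
        too_big = len(self._pending) >= self._max_batch
        too_old = self._clock() - self._oldest >= self._max_delay_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Send everything pending to the indexer as one batch."""
        if self._pending:
            batch, self._pending, self._oldest = self._pending, [], None
            self._index_batch(batch)


# Usage: 7 records with max_batch=3 reach the indexer as batches of 3, 3, and 1.
store: List[Record] = []
batches: List[List[Record]] = []
indexer = MicroBatchIndexer(store.append, batches.append, max_batch=3)
for i in range(7):
    indexer.add((f"row{i}", f"text {i}"))
indexer.flush()
```

Two simplifications are worth noting: the flush runs inline on the caller's thread, whereas a real deployment would hand batches to a background thread or separate process so ingest never blocks on indexing; and the age limit is only checked when a new record arrives, so a final `flush()` (or a timer) is needed to drain a quiet stream.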


On Thu, Jul 24, 2014 at 10:27 AM, Roshan Punnoose <roshanp@gmail.com> wrote:

> Yeah, I think David's solution is the best, though I like the idea of
> having a server-side Constraint or hook that puts the updates into the queue.
>
> The Cassandra work I had seen actually tightly couples a Cassandra node to
> a Solr shard. So all the data that exists on that specific node also exists
> on that specific Solr shard. Would be pretty cool to do the same thing with
> a tablet server => local Solr shard.
>
>
> On Wed, Jul 23, 2014 at 6:09 PM, David Medinets <david.medinets@gmail.com>
> wrote:
>
> > Ingest to a queue. Have two processes subscribe to the queue. One
> > pushing into Accumulo and the other pushing into SolrCloud. Why
> > tightly couple the capabilities?
> >
> > On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <roshanp@gmail.com>
> > wrote:
> > > Is there a way to tie into the write process in Accumulo? Maybe just use
> > > an Iterator that works on compaction to send data to Blur/Solr? I have seen
> > > something similar in Cassandra: a data hook to save data in Solr.
> > >
> > >
> > > On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <nehal413@gmail.com> wrote:
> > >
> > >> We were trying to do so, but adding visibility while adding/searching
> > >> documents needs a lot more thinking. Adding visibility to the core search
> > >> engine requires changes to the algorithm, and that does not make it very
> > >> scalable. Integration apart from granular visibility is very doable, and
> > >> we had taken inspiration from Solandra.
> > >>
> > >> Obviously, if we can get it done, it adds a lot of value. I believe the
> > >> Sqrrl people have already done it; are they thinking of open sourcing it
> > >> anytime in the future?
> > >>
> > >>
> > >> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dminer@clearedgeit.com> wrote:
> > >>
> > >> > We briefly toyed with Blur on Accumulo but didn't get too far, just
> > >> > because it was OBE (overtaken by events). I think that would be cool.
> > >> >
> > >> > > On Jul 17, 2014, at 3:06 PM, Josh Elser <josh.elser@gmail.com> wrote:
> > >> > >
> > >> > > It's definitely possible. I remember hearing about someone doing Lucene
> > >> > > on top of Accumulo once, but I don't recall seeing a nice package with
> > >> > > a bow on top.
> > >> > >
> > >> > >> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
> > >> > >> What lexical search package (like Lucene/Solr) has anyone put on top
> > >> > >> of Accumulo? Is this possible, or does everyone just index log files
> > >> > >> and documents?
> > >> > >>
> > >> > >> v/r
> > >> > >> Bob Thorman
> > >> > >> Principal Big Data Engineer
> > >> > >> AT&T Big Data CoE
> > >> > >> 2900 W. Plano Parkway
> > >> > >> Plano, TX 75075
> > >> > >> 972-658-1714
> > >> > >>
> > >> > >>
> > >> > >>
> > >> >
> > >>
> >
>
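
David's queue-based decoupling, sketched in Python with in-process `queue.Queue` objects standing in for a real message broker. In practice a durable broker (for example Kafka, with one consumer group per downstream system) would play this role; that choice is an assumption, not something named in the thread. Each subscriber gets its own copy of every message and drains it at its own pace, so a slow indexer only grows its own backlog.

```python
import queue
import threading
from typing import Callable, List

SENTINEL = object()  # end-of-stream marker delivered to every subscriber


class FanOutQueue:
    """Minimal in-process stand-in for a pub/sub topic: every subscriber
    gets its own queue and therefore its own copy of every message."""

    def __init__(self) -> None:
        self._subscribers: List[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self._subscribers.append(q)
        return q

    def publish(self, msg: object) -> None:
        for q in self._subscribers:
            q.put(msg)

    def close(self) -> None:
        for q in self._subscribers:
            q.put(SENTINEL)


def consume(q: queue.Queue, sink: Callable[[object], None]) -> None:
    """Drain one subscription into one sink until the end-of-stream marker."""
    while True:
        msg = q.get()
        if msg is SENTINEL:
            return
        sink(msg)


# Hypothetical sinks standing in for an Accumulo writer and a SolrCloud client.
accumulo_rows: List[object] = []
solr_docs: List[object] = []

topic = FanOutQueue()
workers = [
    threading.Thread(target=consume, args=(topic.subscribe(), accumulo_rows.append)),
    threading.Thread(target=consume, args=(topic.subscribe(), solr_docs.append)),
]
for w in workers:
    w.start()
for i in range(5):
    topic.publish({"id": i, "text": f"doc {i}"})
topic.close()
for w in workers:
    w.join()
```

The unbounded in-memory queues here would grow without limit behind a slow consumer; a persistent broker is what makes the lag safe to tolerate.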

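Roshan's compaction-iterator idea, reduced to a Python sketch. A real implementation would be a Java `SortedKeyValueIterator` configured for the compaction scopes; here, a hypothetical pass-through generator `tee_to_index` yields entries unchanged while forwarding each one to an indexer. Because data can pass through several compactions (a minor compaction, then repeated major compactions), the same entry may be sent more than once, so the sketch pairs it with a toy index that upserts by row.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Entry = Tuple[str, str]  # hypothetical simplification of Accumulo's (Key, Value)


def tee_to_index(entries: Iterable[Entry], send: Callable[[Entry], None]) -> Iterator[Entry]:
    """Yield every entry unchanged, so the compaction output is unaffected,
    while forwarding a copy of each one to an external indexer."""
    for entry in entries:
        send(entry)
        yield entry


class IdempotentIndex:
    """Toy index keyed by row: an entry seen again in a later compaction
    overwrites its earlier copy instead of creating a duplicate."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def upsert(self, entry: Entry) -> None:
        row, value = entry
        self.docs[row] = value


# Usage: a minor compaction, then a major compaction that rereads its output.
index = IdempotentIndex()
minc_out: List[Entry] = list(tee_to_index([("r1", "a"), ("r2", "b")], index.upsert))
majc_out: List[Entry] = list(tee_to_index(minc_out + [("r3", "c")], index.upsert))
```

One consequence of hooking compaction rather than the write path: entries reach the index only when a compaction runs, not when the mutation is written, and entries later dropped by versioning or age-off may linger in the index unless deletes are propagated too.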

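Roshan's tablet-server-to-local-Solr-shard coupling needs a way to route each row to the shard living next to the tablet that holds it. A hedged Python sketch of that lookup: Accumulo tablets cover the row range (previous end row, end row], so the owning tablet is the first split point greater than or equal to the row. The `tablet_hosts` and `shard_for_host` maps are hypothetical inputs; a real system would have to refresh them as tablets split and migrate between tablet servers.

```python
import bisect
from typing import Dict, List


def tablet_index(splits: List[str], row: str) -> int:
    """Index of the tablet that holds `row`, given sorted split points.

    Tablet i covers (splits[i-1], splits[i]], so the owner is the first split
    point >= row; rows after the last split land in the final tablet.
    Accumulo compares rows as bytes; for UTF-8 rows, Python's code-point
    string ordering agrees with that byte ordering.
    """
    return bisect.bisect_left(splits, row)


def route_to_local_shard(
    splits: List[str],
    tablet_hosts: List[str],         # hypothetical: tablet index -> tablet server host
    shard_for_host: Dict[str, str],  # hypothetical: host -> co-located Solr shard
    row: str,
) -> str:
    """Pick the Solr shard that lives on the same host as the row's tablet."""
    return shard_for_host[tablet_hosts[tablet_index(splits, row)]]


# Usage: 3 split points give 4 tablets, spread over 3 hosts.
splits = ["g", "n", "t"]
tablet_hosts = ["ts1", "ts2", "ts3", "ts1"]
shard_for_host = {"ts1": "shard_a", "ts2": "shard_b", "ts3": "shard_c"}
print(route_to_local_shard(splits, tablet_hosts, shard_for_host, "h"))  # shard_b
```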

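On Nehal's visibility point: one workaround that avoids changing the search engine's core algorithm is to store each document's visibility expression alongside it in the index and filter hits against the user's authorizations after the search. A simplified Python sketch of such a post-filter follows. It handles labels, `&`, `|`, and parentheses (and, like Accumulo, rejects mixing `&` and `|` without parentheses), but not quoted labels; `visible` and `filter_hits` are hypothetical names, not Accumulo's `ColumnVisibility` API. Post-filtering has real costs: hit counts, facets, and scores computed before the filter can reveal information, and paging gets harder.

```python
import re
from typing import FrozenSet, List, Tuple

# Labels: letters, digits, and a few punctuation characters; quoted labels are not supported.
_TOKEN = re.compile(r"\s*([A-Za-z0-9_\-:./]+|[&|()])")


def _tokenize(expr: str) -> List[str]:
    tokens: List[str] = []
    expr = expr.strip()
    pos = 0
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"bad character at {pos} in {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def visible(expr: str, auths: FrozenSet[str]) -> bool:
    """Evaluate a simplified visibility expression against a user's authorizations."""
    tokens = _tokenize(expr)
    if not tokens:
        return True  # empty expression: visible to everyone
    pos = 0

    def parse_group() -> bool:
        nonlocal pos
        values = [parse_atom()]
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            values.append(parse_atom())
        return all(values) if op == "&" else any(values)

    def parse_atom() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_group()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    result = parse_group()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens: {tokens[pos:]}")
    return result


def filter_hits(hits: List[Tuple[str, str]], auths: FrozenSet[str]) -> List[str]:
    """Keep only the ids of hits whose stored visibility the user satisfies."""
    return [doc_id for doc_id, vis in hits if visible(vis, auths)]
```
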
-- 

Donald Miner
Chief Technology Officer
ClearEdge IT Solutions, LLC
Cell: 443 799 7807
www.clearedgeit.com
