accumulo-dev mailing list archives

From Nehal Mehta <nehal...@gmail.com>
Subject Re: Search
Date Thu, 24 Jul 2014 16:37:47 GMT
If we have two streams, we would just be storing data in Accumulo and using it
as a backend. What we are/were trying to implement was secure search: if a
user does not have rights to search a cell, the user can see the other listings
but not the one that is inaccessible. Doing so would add a lot more value.

Am I missing something?
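A minimal sketch of the secure-search behavior described above. It assumes each indexed cell carries a simplified visibility, given as a set of required labels (a conjunction only; real Accumulo visibility expressions also allow `|` and parentheses). The class name `SecureIndex` and the labels are hypothetical. A search returns every matching cell the user is authorized for and silently omits the rest.

```python
from collections import defaultdict


class SecureIndex:
    """Toy inverted index whose postings carry a required-label set (hypothetical)."""

    def __init__(self):
        # term -> list of (row_id, required_labels)
        self._postings = defaultdict(list)

    def add(self, row_id, text, required_labels):
        """Index each distinct lowercase, whitespace-separated term of a cell's text."""
        labels = frozenset(required_labels)
        for term in set(text.lower().split()):
            self._postings[term].append((row_id, labels))

    def search(self, term, user_auths):
        """Return the matching row ids the user may see; inaccessible hits are hidden."""
        auths = set(user_auths)
        return sorted(
            row_id
            for row_id, labels in self._postings.get(term.lower(), [])
            if labels <= auths  # the user must hold every required label
        )


if __name__ == "__main__":
    idx = SecureIndex()
    idx.add("row1", "quarterly revenue report", {"FINANCE"})
    idx.add("row2", "revenue forecast", {"FINANCE", "EXEC"})
    idx.add("row3", "public revenue summary", set())
    print(idx.search("revenue", {"FINANCE"}))  # ['row1', 'row3']
```

A user holding only `FINANCE` sees `row1` and the unlabeled `row3`, but `row2` never appears in the listing.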


On Thu, Jul 24, 2014 at 12:17 PM, THORMAN, ROBERT D <rt2357@att.com> wrote:

> Search the terms (words, phrases, sub-strings, combinations) of the row
> values.  Lucene is an Apache project that does document indexing on terms.
>
> v/r
> Bob Thorman
> Principal Big Data Engineer
> AT&T Big Data CoE
> 2900 W. Plano Parkway
> Plano, TX 75075
> 972-658-1714
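Term search over row values, in the Lucene sense Bob describes, rests on an inverted index. The following is a minimal sketch. It assumes plain lowercase word tokens and an in-memory dictionary rather than Lucene or Accumulo storage, and `PositionalIndex` and its methods are hypothetical. Recording token positions lets one index answer single-word queries, AND combinations, and exact-phrase queries. Sub-string search would need extra structures such as n-gram terms, which this sketch leaves out.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens; a stand-in for a real analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PositionalIndex:
    """Toy positional inverted index over row values (hypothetical names)."""

    def __init__(self):
        # term -> {row_id -> [token positions]}
        self._postings = defaultdict(lambda: defaultdict(list))

    def add(self, row_id, value):
        for pos, term in enumerate(tokenize(value)):
            self._postings[term][row_id].append(pos)

    def rows_with_all(self, words):
        """Boolean AND ("combinations"): rows containing every word."""
        sets = [set(self._postings.get(w, {})) for w in tokenize(" ".join(words))]
        return set.intersection(*sets) if sets else set()

    def rows_with_phrase(self, phrase):
        """Rows where the phrase's terms occur at consecutive positions."""
        terms = tokenize(phrase)
        hits = set()
        for row_id in self.rows_with_all(terms):
            starts = self._postings[terms[0]][row_id]
            if any(
                all(start + i in self._postings[t][row_id] for i, t in enumerate(terms))
                for start in starts
            ):
                hits.add(row_id)
        return hits
```

For example, suppose `"Big data engineering at scale"` is indexed as row `r1` and `"data is big"` as `r2`. Then `rows_with_all(["big", "data"])` returns both rows, while `rows_with_phrase("big data")` returns only `{"r1"}`.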
>
> On 7/24/14, 9:52 AM, "Kepner, Jeremy - 0553 - MITLL" <kepner@ll.mit.edu> wrote:
>
> >What is meant by lexical search? Lucene style?
> >
> >http://www.lucenetutorial.com/lucene-query-syntax.html
> >
> >If so, these searches could be prioritized (not all are particularly
> >useful), and it shouldn't be too hard to come up with recommended
> >Accumulo approaches for the most important lexical searches.
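Wildcards show one way a Lucene-style lexical query could map onto a sorted key-value store like Accumulo. The sketch below assumes terms are stored as sorted row keys, simulated here with a Python list and `bisect`. A trailing wildcard becomes a prefix range scan. A leading wildcard becomes a prefix scan over a second index of reversed terms. `TermTable` and its methods are hypothetical names.

```python
from bisect import bisect_left


class TermTable:
    """Sorted term lists standing in for Accumulo tables keyed by term (assumption)."""

    def __init__(self, terms):
        self.forward = sorted(set(terms))
        # Reversed terms turn a leading wildcard ("*ing") into a prefix scan too.
        self.reverse = sorted(t[::-1] for t in set(terms))

    @staticmethod
    def _prefix_scan(sorted_keys, prefix):
        """Emulate a range scan: seek to the prefix, read while keys still match."""
        i = bisect_left(sorted_keys, prefix)
        out = []
        while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
            out.append(sorted_keys[i])
            i += 1
        return out

    def wildcard(self, pattern):
        """Handle only 'abc*' (trailing) and '*abc' (leading) wildcards."""
        if pattern.endswith("*") and not pattern.startswith("*"):
            return self._prefix_scan(self.forward, pattern[:-1])
        if pattern.startswith("*") and not pattern.endswith("*"):
            hits = self._prefix_scan(self.reverse, pattern[1:][::-1])
            return sorted(h[::-1] for h in hits)
        raise ValueError("only a single leading or trailing '*' is sketched here")
```

With the terms `index`, `indexing`, `indigo`, `search`, and `searching`, `wildcard("inde*")` returns `['index', 'indexing']` and `wildcard("*ing")` returns `['indexing', 'searching']`.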
> >
> >On Jul 24, 2014, at 10:44 AM, Donald Miner <dminer@clearedgeit.com> wrote:
> >
> >> One problem I ran into when thinking about this problem is throughput. In
> >> Accumulo, we talk about tens or hundreds of thousands or millions of
> >> records per second. A lot of these search solutions talk about hundreds or
> >> thousands of documents per second.
> >>
> >> The fact that Accumulo is able to outpace just about anything led me to
> >> think that some sort of microbatch solution might be the best choice. If
> >> you wait for your data to be indexed before moving on to the next Accumulo
> >> insert, you can start lagging behind. Basically, you are crippling your
> >> ingest throughput by making it the slower of the two systems.
> >>
> >> It seems like a more microbatch (or batch) approach might be worthwhile:
> >> what you are trading is your text index lagging behind, but you keep your
> >> ingest throughput in Accumulo. I think Apache Blur does batch parallel
> >> indexing, which is why I was looking at it for this.
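The microbatch idea can be sketched as a small buffer between the Accumulo write path and the text indexer. This is an illustration under assumptions. `MicroBatcher` and `index_batch` are hypothetical names, and the indexer (Blur, Solr, or anything else) is just a callable that accepts a list of records. Ingest pays for one indexer call per batch rather than one per record. The index lags by at most `max_batch` records, or roughly `max_delay_s` seconds once further records arrive.

```python
import time


class MicroBatcher:
    """Buffer records and pass them to an indexer in batches (hypothetical helper).

    The indexer is called once per batch instead of once per record. The age
    check only runs inside add(), so a quiet stream keeps its last records
    buffered until the next add() or an explicit flush().
    """

    def __init__(self, index_batch, max_batch=1000, max_delay_s=5.0,
                 clock=time.monotonic):
        self._index_batch = index_batch  # callable taking a list of records
        self._max_batch = max_batch
        self._max_delay_s = max_delay_s
        self._clock = clock              # injectable for testing
        self._buffer = []
        self._oldest = 0.0               # arrival time of the oldest buffered record

    def add(self, record):
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(record)
        too_big = len(self._buffer) >= self._max_batch
        too_old = self._clock() - self._oldest >= self._max_delay_s
        if too_big or too_old:
            self.flush()

    def flush(self):
        """Hand everything buffered so far to the indexer as one batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._index_batch(batch)
```

Flushing on a background thread or a scheduled job, rather than inside `add()`, would take the indexer off the ingest path entirely. It is kept synchronous here for brevity.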
> >>
> >>
> >> On Thu, Jul 24, 2014 at 10:27 AM, Roshan Punnoose <roshanp@gmail.com> wrote:
> >>
> >>> Yeah, I think David's solution is the best, though I like the idea of
> >>> having a server-side Constraint or hook that puts the updates into the
> >>> queue.
> >>>
> >>> The Cassandra work I had seen actually tightly couples a Cassandra node
> >>> to a Solr shard, so all the data that exists on that specific node also
> >>> exists on that specific Solr shard. It would be pretty cool to do the same
> >>> thing, pairing each tablet server with a local Solr shard.
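Roshan's tablet-server-to-local-shard idea comes down to routing each row to the shard that lives next to the tablet holding it. The following is a minimal sketch. It assumes the split points and the tablet-to-server assignment are already known (in Accumulo they would come from table metadata; here they are plain lists), and it uses made-up shard names. The row-to-tablet lookup is a binary search over the sorted split points.

```python
from bisect import bisect_left


def tablet_for_row(splits, row):
    """Return the index of the tablet that holds `row`.

    `splits` is the sorted list of split points. Tablet i covers rows in
    (splits[i-1], splits[i]], i.e. its end row is inclusive, and the last
    tablet is unbounded above. bisect_left gives exactly that mapping.
    """
    return bisect_left(splits, row)


def shard_for_row(splits, tablet_hosts, row):
    """Route a row to the Solr shard co-located with its tablet server.

    `tablet_hosts[i]` names the server hosting tablet i; the
    "solr-shard@<host>" naming is a made-up placeholder.
    """
    host = tablet_hosts[tablet_for_row(splits, row)]
    return f"solr-shard@{host}"


if __name__ == "__main__":
    splits = ["g", "p"]            # three tablets: (-inf, g], (g, p], (p, +inf)
    hosts = ["ts1", "ts2", "ts3"]
    for row in ["apple", "g", "kiwi", "zebra"]:
        print(row, "->", shard_for_row(splits, hosts, row))
```

Tablets split and get reassigned by the balancer, so in practice the shard placement would have to follow tablet migrations. The sketch only covers the routing lookup.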
> >>>
> >>>
> >>> On Wed, Jul 23, 2014 at 6:09 PM, David Medinets <david.medinets@gmail.com> wrote:
> >>>
> >>>> Ingest to a queue. Have two processes subscribe to the queue, one
> >>>> pushing into Accumulo and the other pushing into SolrCloud. Why
> >>>> tightly couple the capabilities?
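David's queue-based design can be sketched in-process with Python's standard `queue` and `threading` modules. In production the queue would be a durable message broker, which this sketch does not model. `fan_out` and the sink callables are hypothetical stand-ins for an Accumulo writer and a SolrCloud client. Each subscriber gets its own queue and drains it at its own pace.

```python
import queue
import threading


def fan_out(records, sinks):
    """Publish each record to one queue per subscriber; each subscriber drains its own.

    `sinks` maps a subscriber name to a callable that writes one record
    (hypothetical stand-ins for an Accumulo writer and a SolrCloud client).
    """
    stop = object()  # sentinel marking the end of the stream
    queues = {name: queue.Queue() for name in sinks}

    def consume(name):
        q = queues[name]
        while True:
            item = q.get()
            if item is stop:
                break
            sinks[name](item)

    workers = [threading.Thread(target=consume, args=(n,)) for n in sinks]
    for w in workers:
        w.start()
    for rec in records:
        for q in queues.values():  # every subscriber sees every record
            q.put(rec)
    for q in queues.values():
        q.put(stop)
    for w in workers:
        w.join()
```

Because each queue is unbounded here, a slow indexer never blocks the Accumulo writer, but its backlog grows in memory. A bounded queue (`queue.Queue(maxsize=...)`) would apply backpressure instead.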
> >>>>
> >>>> On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <roshanp@gmail.com> wrote:
> >>>>> Is there a way to tie into the write process in Accumulo? Maybe just use
> >>>>> an Iterator that runs during compaction to send data to Blur/Solr? I
> >>>>> have seen something similar in Cassandra: a data hook to save data in
> >>>>> Solr.
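A conceptual sketch of the compaction-time hook, written as a pass-through generator. Accumulo's real iterators implement the Java `SortedKeyValueIterator` interface and can be configured for the scan, minc, and majc scopes. This Python version only mirrors the idea: every entry flows through unchanged, and a copy is handed to an `emit` callback, a hypothetical sender to Blur/Solr.

```python
from typing import Callable, Iterable, Iterator, Tuple

KeyValue = Tuple[str, str]  # simplified (key, value) pair


def indexing_passthrough(source: Iterable[KeyValue],
                         emit: Callable[[KeyValue], None]) -> Iterator[KeyValue]:
    """Yield every key/value unchanged while also handing it to `emit`.

    The compaction output is unaffected; `emit` (hypothetical) would forward
    each entry to the external text index.
    """
    for kv in source:
        emit(kv)
        yield kv
```

One design consequence: Accumulo may compact the same data more than once (a minor compaction, then later major compactions). A hook like this would therefore re-send entries, so the external index should apply updates idempotently, for example keyed by row and column. Entries still sitting in memory would also not be indexed until they are flushed.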
> >>>>>
> >>>>>
> >>>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <nehal413@gmail.com> wrote:
> >>>>>
> >>>>>> We were trying to do so, but adding visibility while adding/searching
> >>>>>> documents needs a lot more thinking. Adding visibility to the core
> >>>>>> search engine requires changes to its algorithm, and that does not make
> >>>>>> it very scalable. Integration apart from granular visibility is very
> >>>>>> doable, and we had taken inspiration from Solandra.
> >>>>>>
> >>>>>> Obviously, if we can get it done, it adds a lot of value. I believe the
> >>>>>> Sqrrl people have already done it; are they thinking of open sourcing
> >>>>>> it anytime in the future?
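Here is a minimal evaluator for a simplified version of Accumulo's column-visibility syntax: labels combined with `&`, `|`, and parentheses. It makes the visibility check at search time concrete. It is a sketch, not Accumulo's `ColumnVisibility`/`VisibilityEvaluator` implementation: quoting and some validation rules are omitted, and `can_see` is a hypothetical name. A search layer could call it on each candidate hit's visibility before returning that hit.

```python
import re

_TOKEN = re.compile(r"\s*([A-Za-z0-9_\-.:/]+|[&|()])")


def _tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad character at {pos} in {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def can_see(expr, auths):
    """Evaluate a simplified Accumulo-style visibility expression against auths.

    Supports labels, '&', '|', and parentheses. Mixing '&' and '|' at one
    nesting level without parentheses is rejected. An empty expression is
    visible to everyone. Quoted labels are not supported.
    """
    tokens = _tokenize(expr)
    if not tokens:
        return True
    pos = 0

    def parse_group():
        nonlocal pos
        values = [parse_term()]
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            values.append(parse_term())
        return all(values) if op == "&" else any(values)

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_group()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths  # a bare label is satisfied if the user holds it

    result = parse_group()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result
```

For example, `can_see("A&(B|C)", {"A", "C"})` is `True` and `can_see("A&(B|C)", {"B", "C"})` is `False`, while `can_see("A&B|C", {"A"})` raises `ValueError`.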
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dminer@clearedgeit.com> wrote:
> >>>>>>
> >>>>>>> We briefly toyed with Blur on Accumulo but didn't get too far, just
> >>>>>>> because it was OBE (overtaken by events). I think that would be cool.
> >>>>>>>
> >>>>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <josh.elser@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> It's definitely possible. I remember hearing about someone doing lucene
> >>>>>>>> on top of Accumulo once, but I don't recall seeing a nice package with a
> >>>>>>>> bow on top.
> >>>>>>>>
> >>>>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
> >>>>>>>>> What lexical search package (like lucene/solr) has anyone put on top of
> >>>>>>>>> accumulo?  Is this possible or does everyone just index log files and
> >>>>>>>>> documents?
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> Donald Miner
> >> Chief Technology Officer
> >> ClearEdge IT Solutions, LLC
> >> Cell: 443 799 7807
> >> www.clearedgeit.com
> >
>
>
