accumulo-dev mailing list archives

From "THORMAN, ROBERT D" <rt2...@att.com>
Subject Re: Search
Date Thu, 24 Jul 2014 21:06:00 GMT
Yes, you have missed my original request.  I need a fast way (i.e.,
pre-indexed) to perform lexical searches on row values without using a
regex-based iterator.  I also do not want to duplicate data from the
cluster into the document-based store that packages like Apache Lucene
typically require.
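
A minimal sketch of the kind of pre-indexed lookup described above, assuming a term index laid out the way an index table commonly is (index row = term, qualifier = data row ID). Everything here is hypothetical and in-memory: `tokenize`, `build_term_index`, and `search_all` are illustrative names, not Accumulo or Lucene APIs, and a plain dict stands in for the index table.

```python
import re
from collections import defaultdict


def tokenize(value):
    """Split a row value into lowercase word terms (an assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", value.lower())


def build_term_index(rows):
    """Map each term to the set of row IDs whose value contains it.

    Mirrors an index layout of (row=term, qualifier=row ID): only row IDs
    are stored, not a second copy of the row values.
    """
    index = defaultdict(set)
    for row_id, value in rows.items():
        for term in tokenize(value):
            index[term].add(row_id)
    return index


def search_all(index, query):
    """Return sorted row IDs that contain every term in the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample data, for illustration only.
rows = {
    "r1": "Accumulo tablet server logs",
    "r2": "Solr shard on a tablet server",
    "r3": "Lucene query syntax",
}
index = build_term_index(rows)
print(search_all(index, "tablet server"))
```

The lookup touches only the postings for the query terms, so no regex scan over the row values is needed. Sub-string and phrase matching would need more than this (for example n-gram terms or stored positions), which the sketch does not attempt.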

v/r
Bob Thorman
Principal Big Data Engineer
AT&T Big Data CoE
2900 W. Plano Parkway
Plano, TX 75075
972-658-1714






On 7/24/14, 11:37 AM, "Nehal Mehta" <nehal413@gmail.com> wrote:

>If we have two streams, we would just store data in Accumulo and use it
>as a backend. What we are/were trying to implement was secure search: if
>a user does not have rights to search a cell, the user can see the other
>listings but not the one that is inaccessible. By doing so we would add a
>lot more value.
>
>Am I missing something?
>
>
>On Thu, Jul 24, 2014 at 12:17 PM, THORMAN, ROBERT D <rt2357@att.com>
>wrote:
>
>> Search the terms (words, phrases, sub-strings, combinations) of the row
>> values.  Lucene is an Apache project that does document indexing on
>> terms.
>>
>> v/r
>> Bob Thorman
>> Principal Big Data Engineer
>> AT&T Big Data CoE
>> 2900 W. Plano Parkway
>> Plano, TX 75075
>> 972-658-1714
>>
>>
>>
>>
>>
>>
>> On 7/24/14, 9:52 AM, "Kepner, Jeremy - 0553 - MITLL" <kepner@ll.mit.edu>
>> wrote:
>>
>> >What is meant by lexical search? Lucene style?
>> >
>> >http://www.lucenetutorial.com/lucene-query-syntax.html
>> >
>> >If so, these searches could be prioritized (not all are particularly
>> >useful), and it shouldn't be too hard to come up with recommended
>> >Accumulo approaches for the most important lexical searches.
>> >
>> >On Jul 24, 2014, at 10:44 AM, Donald Miner <dminer@clearedgeit.com>
>> >wrote:
>> >
>> >> One problem I ran into when thinking about this problem is throughput.
>> >> In Accumulo, we talk about tens or hundreds of thousands or millions of
>> >> records per second. A lot of these search solutions talk about hundreds
>> >> or thousands of documents per second.
>> >>
>> >> This problem, that Accumulo is able to outpace just about anything, led
>> >> me to think that some sort of microbatch solution might be the best
>> >> choice. If you wait for your data to be indexed before moving on to the
>> >> next Accumulo insert, you can start lagging behind. Basically, you are
>> >> crippling your ingest throughput by making it the slower of the two
>> >> systems.
>> >>
>> >> It seems like a more microbatch (or batch) approach might be worthwhile:
>> >> what you are trading is your text index lagging behind, but you keep
>> >> your ingest throughput in Accumulo. I think Apache Blur does batch
>> >> parallel indexing, which is why I was looking at it for this.
>> >>
>> >>
>> >> On Thu, Jul 24, 2014 at 10:27 AM, Roshan Punnoose <roshanp@gmail.com>
>> >>wrote:
>> >>
>> >>> Yeah I think David's solution is the best. Though I like the idea of
>> >>> having a server side Constraint or hook that puts the updates into the
>> >>> queue.
>> >>>
>> >>> The Cassandra work I had seen actually tightly couples a Cassandra node
>> >>> to a Solr shard. So all the data that exists on that specific node also
>> >>> exists on that specific Solr shard. Would be pretty cool to do the same
>> >>> thing with a tablet server => local Solr shard.
>> >>>
>> >>>
>> >>> On Wed, Jul 23, 2014 at 6:09 PM, David Medinets
>> >>><david.medinets@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Ingest to a queue. Have two processes subscribe to the queue. One
>> >>>> pushing into Accumulo and the other pushing into SolrCloud. Why
>> >>>> tightly couple the capabilities?
>> >>>>
>> >>>> On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <roshanp@gmail.com>
>> >>>> wrote:
>> >>>>> Is there a way to tie into the write process in Accumulo? Maybe just
>> >>>>> use an Iterator that worked on compaction to send data to blur/solr?
>> >>>>> I have seen something similar in Cassandra, a data hook to save data
>> >>>>> in Solr.
>> >>>>>
>> >>>>>
>> >>>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <nehal413@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> We were trying to do so, but adding visibility while adding/searching
>> >>>>>> documents needs a lot more thinking. Adding visibility to the core
>> >>>>>> search engine needs changes to the algorithm, and that does not make
>> >>>>>> it very scalable. Integration besides granular visibility is very
>> >>>>>> doable, and we had taken inspiration from Solandra.
>> >>>>>>
>> >>>>>> Obviously if we can get it done it adds a lot of value. I believe the
>> >>>>>> Sqrrl people have already done it; are they thinking of open sourcing
>> >>>>>> it anytime in the future?
>> >>>>>>
>> >>>>>>
>> >>>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dminer@clearedgeit.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> We briefly toyed with Blur on Accumulo but didn't get too far just
>> >>>>>>> because it was OBE. I think that would be cool.
>> >>>>>>>
>> >>>>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <josh.elser@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> It's definitely possible. I remember hearing about someone doing
>> >>>>>>>> Lucene on top of Accumulo once, but I don't recall seeing a nice
>> >>>>>>>> package with a bow on top.
>> >>>>>>>>
>> >>>>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>> >>>>>>>>> What lexical search package (like lucene/solr) has anyone put on
>> >>>>>>>>> top of accumulo?  Is this possible or does everyone just index log
>> >>>>>>>>> files and documents?
>> >>>>>>>>>
>> >>>>>>>>> v/r
>> >>>>>>>>> Bob Thorman
>> >>>>>>>>> Principal Big Data Engineer
>> >>>>>>>>> AT&T Big Data CoE
>> >>>>>>>>> 2900 W. Plano Parkway
>> >>>>>>>>> Plano, TX 75075
>> >>>>>>>>> 972-658-1714
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Donald Miner
>> >> Chief Technology Officer
>> >> ClearEdge IT Solutions, LLC
>> >> Cell: 443 799 7807
>> >> www.clearedgeit.com
>> >
>>
>>
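
The microbatch trade-off quoted above (writes to the store proceed at full speed while the text index is allowed to lag) can be sketched roughly as follows. This is an illustration under stated assumptions, not anyone's actual implementation: `MicrobatchIndexer`, `ingest`, and the two sinks are hypothetical names, plain Python lists stand in for Accumulo and the search index, and the loop is single-threaded, so it shows only the batching, not real concurrency.

```python
from collections import deque


class MicrobatchIndexer:
    """Buffers records and hands them to a slower text index in batches."""

    def __init__(self, index_sink, batch_size):
        self.index_sink = index_sink  # callable that accepts a list of records
        self.batch_size = batch_size
        self.pending = deque()

    def offer(self, record):
        """Queue a record; flush once a full batch has accumulated."""
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is pending to the index as one batch."""
        if self.pending:
            batch = list(self.pending)
            self.pending.clear()
            self.index_sink(batch)


def ingest(records, store_sink, indexer):
    """Write each record to the store immediately; index it lazily."""
    for record in records:
        store_sink(record)     # stand-in for a write to the primary store
        indexer.offer(record)  # the index may lag by up to batch_size - 1


# Hypothetical stand-ins for the two systems.
stored, indexed_batches = [], []
indexer = MicrobatchIndexer(indexed_batches.append, batch_size=3)
ingest(["a", "b", "c", "d"], stored.append, indexer)

# Records written to the store but not yet indexed.
lag = len(stored) - sum(len(batch) for batch in indexed_batches)
indexer.flush()  # drain the remainder, e.g. on a timer or at shutdown
```

The store never waits on the index; the cost is that recent records may be missing from search results until the next flush.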

