accumulo-dev mailing list archives

From "THORMAN, ROBERT D" <rt2...@att.com>
Subject Re: Search
Date Thu, 24 Jul 2014 16:17:35 GMT
Search the terms (words, phrases, sub-strings, combinations) of the row
values.  Lucene is an Apache project that does document indexing on terms.

v/r
Bob Thorman
Principal Big Data Engineer
AT&T Big Data CoE
2900 W. Plano Parkway
Plano, TX 75075
972-658-1714
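Bob's definition (searching the words, phrases, sub-strings, and combinations of terms in row values) maps onto an inverted index. The following Python sketch is hypothetical: `tokenize`, `TermIndex`, and the "term as index row ID" layout it imitates are assumptions for illustration, not an Accumulo or Lucene API. It covers exact terms, prefixes, and AND combinations. Phrase queries would also need term positions, and arbitrary sub-string queries usually need something like an n-gram index or a full scan, so both are left out.

```python
from __future__ import annotations

import bisect
import re


def tokenize(value: str) -> list[str]:
    """Split a row value into lowercase alphanumeric terms (a deliberately naive analyzer)."""
    return re.findall(r"[a-z0-9]+", value.lower())


class TermIndex:
    """Hypothetical in-memory inverted index: term -> set of row IDs containing it.

    Terms are also kept sorted, so a prefix query walks one contiguous range of
    keys, much like a range scan over an index table whose row IDs are terms
    (an assumed table layout, not an Accumulo API).
    """

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = {}
        self._sorted_terms: list[str] = []

    def add(self, row_id: str, value: str) -> None:
        for term in tokenize(value):
            if term not in self._postings:
                bisect.insort(self._sorted_terms, term)  # keep terms in key order
                self._postings[term] = set()
            self._postings[term].add(row_id)

    def term(self, term: str) -> set[str]:
        """Exact term lookup: rows whose value contains the word."""
        return set(self._postings.get(term.lower(), ()))

    def prefix(self, prefix: str) -> set[str]:
        """Prefix lookup: start at the first term >= prefix, stop at the first non-match."""
        prefix = prefix.lower()
        rows: set[str] = set()
        i = bisect.bisect_left(self._sorted_terms, prefix)
        while i < len(self._sorted_terms) and self._sorted_terms[i].startswith(prefix):
            rows |= self._postings[self._sorted_terms[i]]
            i += 1
        return rows

    def all_terms(self, *terms: str) -> set[str]:
        """AND combination: rows containing every given term."""
        if not terms:
            return set()
        result = self.term(terms[0])
        for t in terms[1:]:
            result &= self.term(t)
        return result


idx = TermIndex()
idx.add("row1", "Accumulo tablet server")
idx.add("row2", "Lucene term index")
idx.add("row3", "Accumulo term iterator")
print(sorted(idx.term("Accumulo")))               # ['row1', 'row3']
print(sorted(idx.prefix("ter")))                  # ['row2', 'row3']
print(sorted(idx.all_terms("accumulo", "term")))  # ['row3']
```

Keeping the terms sorted is what makes prefix search cheap here: matching terms sit next to each other, so the lookup stops at the first term that no longer matches instead of checking every term.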

On 7/24/14, 9:52 AM, "Kepner, Jeremy - 0553 - MITLL" <kepner@ll.mit.edu> wrote:

>What is meant by lexical search? Lucene style?
>
>http://www.lucenetutorial.com/lucene-query-syntax.html
>
>If so, these searches could be prioritized (not all are particularly
>useful), and it shouldn't be too hard to come up with recommended
>Accumulo approaches for the most important lexical searches.
>
>On Jul 24, 2014, at 10:44 AM, Donald Miner <dminer@clearedgeit.com> wrote:
>
>> One problem I ran into when thinking about this is throughput. In
>> Accumulo, we talk about tens or hundreds of thousands, or even millions, of
>> records per second. Many of these search solutions talk about hundreds or
>> thousands of documents per second.
>> 
>> The fact that Accumulo can outpace just about anything led me to think
>> that some sort of microbatch solution might be the best choice. If you wait
>> for your data to be indexed before moving on to the next Accumulo insert,
>> you can start lagging behind. Basically, you are crippling your ingest
>> throughput by limiting it to the slower of the two systems.
>> 
>> A microbatch (or batch) approach might be worthwhile: what you trade is
>> your text index lagging behind, but you keep your ingest throughput in
>> Accumulo. I think Apache Blur does batch parallel indexing, which is why I
>> was looking at it for this.
>> 
>> 
>> On Thu, Jul 24, 2014 at 10:27 AM, Roshan Punnoose <roshanp@gmail.com> wrote:
>> 
>>> Yeah, I think David's solution is the best, though I like the idea of
>>> having a server-side Constraint or hook that puts the updates into the
>>> queue.
>>> 
>>> The Cassandra work I had seen actually tightly couples a Cassandra node to
>>> a Solr shard, so all the data that exists on a specific node also exists
>>> on a specific Solr shard. It would be pretty cool to do the same thing
>>> with a tablet server => local Solr shard.
>>> 
>>> 
>>> On Wed, Jul 23, 2014 at 6:09 PM, David Medinets <david.medinets@gmail.com> wrote:
>>> 
>>>> Ingest to a queue. Have two processes subscribe to the queue, one pushing
>>>> into Accumulo and the other pushing into SolrCloud. Why tightly couple the
>>>> capabilities?
>>>> 
>>>> On Wed, Jul 23, 2014 at 4:39 PM, Roshan Punnoose <roshanp@gmail.com>
>>>> wrote:
>>>>> Is there a way to tie into the write process in Accumulo? Maybe just use
>>>>> an Iterator that works on compaction to send data to Blur/Solr? I have
>>>>> seen something similar in Cassandra, a data hook to save data in Solr.
>>>>> 
>>>>> 
>>>>> On Fri, Jul 18, 2014 at 6:46 PM, Nehal Mehta <nehal413@gmail.com> wrote:
>>>>> 
>>>>>> We were trying to do so, but adding visibility while adding and searching
>>>>>> documents needs a lot more thinking. Adding visibility to the core search
>>>>>> engine requires changes to its algorithm, and that does not make it very
>>>>>> scalable. Integration, apart from granular visibility, is very doable, and
>>>>>> we had taken inspiration from Solandra.
>>>>>> 
>>>>>> Obviously, if we can get it done, it adds a lot of value. I believe the
>>>>>> Sqrrl people have already done it; are they planning to open source it
>>>>>> any time in the future?
>>>>>> 
>>>>>> 
>>>>>> On Thu, Jul 17, 2014 at 3:09 PM, Donald Miner <dminer@clearedgeit.com> wrote:
>>>>>> 
>>>>>>> We briefly toyed with Blur on Accumulo but didn't get too far, just
>>>>>>> because it was OBE (overtaken by events). I think that would be cool.
>>>>>>> 
>>>>>>>> On Jul 17, 2014, at 3:06 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> It's definitely possible. I remember hearing about someone doing Lucene
>>>>>>>> on top of Accumulo once, but I don't recall seeing a nice package with a
>>>>>>>> bow on top.
>>>>>>>> 
>>>>>>>>> On 7/17/14, 2:53 PM, THORMAN, ROBERT D wrote:
>>>>>>>>> What lexical search package (like Lucene/Solr) has anyone put on top
>>>>>>>>> of Accumulo? Is this possible, or does everyone just index log files
>>>>>>>>> and documents?
>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> 
>> Donald Miner
>> Chief Technology Officer
>> ClearEdge IT Solutions, LLC
>> Cell: 443 799 7807
>> www.clearedgeit.com
>
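Donald Miner's microbatch suggestion in the thread can be sketched as follows. This is a hypothetical Python illustration: `MicroBatchIndexer`, `primary_write`, and `index_batch` are stand-ins for an Accumulo writer and a batch call to a text indexer such as Solr or Blur, not real client APIs. Every record goes to the primary store immediately, while the index receives records in batches, so it lags by at most `batch_size - 1` records until `flush()` is called.

```python
from __future__ import annotations

from typing import Callable, Tuple

Record = Tuple[str, str]  # (row_id, value); an assumed record shape


class MicroBatchIndexer:
    """Hypothetical sketch: write every record to the primary store at once,
    but hand records to the slower text index only in batches."""

    def __init__(self, primary_write: Callable[[Record], None],
                 index_batch: Callable[[list[Record]], None],
                 batch_size: int = 1000) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self._primary_write = primary_write  # stand-in for an Accumulo writer
        self._index_batch = index_batch      # stand-in for a batch index call
        self._batch_size = batch_size
        self._pending: list[Record] = []

    def write(self, record: Record) -> None:
        self._primary_write(record)  # primary ingest never waits for a batch
        self._pending.append(record)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far to the index as one batch."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._index_batch(batch)

    @property
    def lag(self) -> int:
        """Number of records written to the primary store but not yet indexed."""
        return len(self._pending)


written: list = []
batches: list = []
indexer = MicroBatchIndexer(written.append, batches.append, batch_size=3)
for i in range(7):
    indexer.write((f"row{i}", f"value {i}"))
print(len(written), [len(b) for b in batches], indexer.lag)  # 7 [3, 3] 1
indexer.flush()  # push the remaining record
```

Batching amortizes the per-call cost of the slower indexer, but in this sketch `flush()` still runs on the writer's thread. A production version would likely hand batches to a separate thread or process (and flush on a timer as well as on size) so ingest never waits on indexing.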

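David Medinets' decoupled design (ingest to a queue, with independent Accumulo and SolrCloud consumers) can be sketched in Python under stated assumptions: `FanOutBus` is a hypothetical in-memory stand-in for a message-queue topic, and the two list sinks stand in for Accumulo and SolrCloud writers. Each subscriber gets its own queue and thread, so a slow index consumer builds up its own backlog instead of slowing the Accumulo consumer.

```python
from __future__ import annotations

import queue
import threading
from typing import Callable

_STOP = object()  # sentinel telling a subscriber thread to exit


class FanOutBus:
    """Hypothetical in-memory topic: every subscriber receives every
    published record, and each consumes at its own pace."""

    def __init__(self) -> None:
        self._queues: list[queue.Queue] = []
        self._threads: list[threading.Thread] = []

    def subscribe(self, handler: Callable[[object], None]) -> None:
        q: queue.Queue = queue.Queue()  # one private queue per subscriber
        self._queues.append(q)

        def run() -> None:
            while True:
                item = q.get()
                if item is _STOP:
                    break
                handler(item)

        t = threading.Thread(target=run, daemon=True)
        t.start()
        self._threads.append(t)

    def publish(self, record: object) -> None:
        for q in self._queues:  # fan out: copy the record to every subscriber
            q.put(record)

    def close(self) -> None:
        for q in self._queues:
            q.put(_STOP)
        for t in self._threads:
            t.join()  # wait until each consumer has drained its queue


accumulo_rows: list = []
solr_docs: list = []
bus = FanOutBus()
bus.subscribe(accumulo_rows.append)  # stand-in for an Accumulo writer
bus.subscribe(solr_docs.append)      # stand-in for a SolrCloud indexer
for i in range(5):
    bus.publish((f"row{i}", f"value {i}"))
bus.close()
```

These in-memory queues are unbounded and not durable. A real deployment would use a durable message broker so that a lagging or restarted consumer can catch up from where it left off.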

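Nehal Mehta's concern about visibility suggests one simple but limited option: run the search without visibility, then drop hits whose Accumulo-style visibility expression the user's authorizations do not satisfy. The Python sketch below is a simplified assumption, not Accumulo's `ColumnVisibility` implementation: it handles `&`, `|`, parentheses, and unquoted labels only, rejects any other character (including spaces), and does not enforce Accumulo's rule against mixing `&` and `|` without parentheses. Post-filtering can also leak information through hit counts or ranking, which is part of why pushing visibility into the search engine itself is harder.

```python
from __future__ import annotations

import re

# Simplified label/operator tokens; an assumption, not Accumulo's exact grammar.
_TOKEN = re.compile(r"[A-Za-z0-9_\-:./]+|[&|()]")


class _VisibilityParser:
    """Recursive-descent evaluator for expressions such as "A&(B|C)"."""

    def __init__(self, expression: str, auths: set[str]) -> None:
        self.tokens = _TOKEN.findall(expression)
        if "".join(self.tokens) != expression:
            raise ValueError(f"invalid characters in {expression!r}")
        self.pos = 0
        self.auths = auths

    def _peek(self) -> str | None:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _take(self) -> str | None:
        tok = self._peek()
        self.pos += 1
        return tok

    def parse(self) -> bool:
        value = self._or_expr()
        if self._peek() is not None:
            raise ValueError(f"unexpected token {self._peek()!r}")
        return value

    def _or_expr(self) -> bool:
        value = self._and_expr()
        while self._peek() == "|":
            self._take()
            rhs = self._and_expr()  # always parse the right side to validate it
            value = value or rhs
        return value

    def _and_expr(self) -> bool:
        value = self._atom()
        while self._peek() == "&":
            self._take()
            rhs = self._atom()
            value = value and rhs
        return value

    def _atom(self) -> bool:
        tok = self._take()
        if tok == "(":
            value = self._or_expr()
            if self._take() != ")":
                raise ValueError("missing ')'")
            return value
        if tok is None or tok in ("&", "|", ")"):
            raise ValueError(f"expected a label, got {tok!r}")
        return tok in self.auths  # a label is satisfied if the user holds it


def can_see(expression: str, auths: set[str]) -> bool:
    """An empty expression is visible to everyone; otherwise evaluate it."""
    if not expression:
        return True
    return _VisibilityParser(expression, auths).parse()


def filter_hits(hits: list[str], labels: dict[str, str], auths: set[str]) -> list[str]:
    """Post-filter search hits by each row's label (KeyError if a label is missing)."""
    return [row for row in hits if can_see(labels[row], auths)]


labels = {"r1": "A", "r2": "A&B", "r3": ""}
print(filter_hits(["r1", "r2", "r3"], labels, {"A"}))  # ['r1', 'r3']
```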