accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Search function
Date Thu, 07 May 2015 02:44:15 GMT
A good way to think about this is that Accumulo provides a single Index: 
rowID.

You can find rows (or row with colfam, or row with colfam and colqual, 
etc) very quickly, but anything else is an exhaustive search.
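
To make the difference concrete, here is a minimal sketch in plain Java. It 
uses a TreeMap as a stand-in for a table and is not the Accumulo API; the 
class name, the helper names and the key encoding are all assumptions made 
for illustration. A lookup by row seeks straight to a contiguous slice of 
the sorted map, while a lookup by value has to visit every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for a table: a sorted map keyed by row, family, qualifier.
// This is NOT the Accumulo API; every name here is hypothetical.
class SortedMapSketch {
    // Separator that sorts before every printable character, so all keys
    // belonging to one row sit next to each other in the map.
    static final char SEP = '\u0000';

    static String key(String row, String fam, String qual) {
        return row + SEP + fam + SEP + qual;
    }

    // The sample data from the original question.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> t = new TreeMap<>();
        t.put(key("ID1", "info", "name"), "JhonSmith");
        t.put(key("ID1", "info", "birth"), "1988-06-26");
        t.put(key("ID1", "study", "university"), "ComputerEngineering");
        t.put(key("ID1", "study", "graduated"), "Yes");
        t.put(key("ID2", "info", "name"), "GeorgeDuff");
        t.put(key("ID2", "info", "birth"), "1984-01-29");
        t.put(key("ID2", "study", "university"), "Math");
        t.put(key("ID2", "study", "graduated"), "Yes");
        return t;
    }

    // Fast path: jump to the first key of the row and read only that slice.
    static List<String> valuesForRow(TreeMap<String, String> table, String row) {
        String start = row + SEP;
        String end = row + (char) (SEP + 1); // first key past this row
        return new ArrayList<>(table.subMap(start, true, end, false).values());
    }

    // Slow path: visit every entry in the table and filter on the value.
    static List<String> rowsWithValue(TreeMap<String, String> table, String value) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : table.entrySet()) {
            if (e.getValue().equals(value)) {
                String k = e.getKey();
                rows.add(k.substring(0, k.indexOf(SEP)));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = sampleTable();
        System.out.println(valuesForRow(table, "ID2"));
        System.out.println(rowsWithValue(table, "JhonSmith"));
    }
}
```

The cost of valuesForRow depends on the size of the row, while the cost of 
rowsWithValue grows with the size of the whole table.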

Any time you want to search quickly against some other dimension of the 
data, you typically need to pivot your data so that the other 
dimension becomes the rowID and is therefore kept in sorted order.

If you want to search records by value, you have to put the value in the 
Row and the ID in the value (or at least somewhere else). Thankfully, 
you can leverage Accumulo to very effectively store multiple indexes 
(inverted indexes if you will) in a single table as Accumulo allows 
dynamic column families.
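
As a rough sketch of that layout, again in plain Java with a TreeMap 
standing in for the table (not the Accumulo API): every write stores the 
record entry plus an index entry whose row is the value, whose column 
family names the indexed field, and whose qualifier carries the ID. The 
"idx_" row prefix, the key encoding and all helper names are assumptions 
made for illustration. The ID goes in the qualifier rather than the value 
so that two records sharing a value do not overwrite each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One sorted map holding both the records and an inverted index.
// This is NOT the Accumulo API; every name here is hypothetical.
class InvertedIndexSketch {
    static final char SEP = '\u0000';
    // Assumed prefix that keeps index rows apart from record rows.
    static final String INDEX_PREFIX = "idx_";

    // Writes the record entry and the matching index entry.
    static void put(TreeMap<String, String> table, String id,
                    String fam, String qual, String value) {
        table.put(id + SEP + fam + SEP + qual, value);
        // Index entry: row = value, family = indexed field, qualifier = ID.
        table.put(INDEX_PREFIX + value + SEP + fam + ":" + qual + SEP + id, "");
    }

    // Lookup 1: range scan over the index rows for (value, field).
    static List<String> idsFor(TreeMap<String, String> table,
                               String field, String value) {
        String start = INDEX_PREFIX + value + SEP + field + SEP;
        String end = INDEX_PREFIX + value + SEP + field + (char) (SEP + 1);
        List<String> ids = new ArrayList<>();
        for (String k : table.subMap(start, true, end, false).keySet()) {
            ids.add(k.substring(start.length()));
        }
        return ids;
    }

    // Lookup 2: range scan over the record row for one ID.
    static Map<String, String> record(TreeMap<String, String> table, String id) {
        String start = id + SEP;
        String end = id + (char) (SEP + 1);
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e
                : table.subMap(start, true, end, false).entrySet()) {
            String column = e.getKey().substring(start.length()).replace(SEP, ':');
            out.put(column, e.getValue());
        }
        return out;
    }

    // The sample data from the original question.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> t = new TreeMap<>();
        put(t, "ID1", "info", "name", "JhonSmith");
        put(t, "ID1", "info", "birth", "1988-06-26");
        put(t, "ID1", "study", "university", "ComputerEngineering");
        put(t, "ID1", "study", "graduated", "Yes");
        put(t, "ID2", "info", "name", "GeorgeDuff");
        put(t, "ID2", "info", "birth", "1984-01-29");
        put(t, "ID2", "study", "university", "Math");
        put(t, "ID2", "study", "graduated", "Yes");
        return t;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = sampleTable();
        for (String id : idsFor(table, "info:name", "JhonSmith")) {
            System.out.println(id + " " + record(table, id));
        }
    }
}
```

Both lookups are range scans over a sorted key, so neither one has to read 
the whole table. The price is one extra entry per indexed value on every 
write.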

Christopher wrote:
> Since Accumulo is essentially a big sorted map, it is most efficient
> searching by the row. When you search by other fields, you are
> searching the entire data set, and filtering. That is usually not very
> efficient. The API provides a way to do this relatively easily by
> specifying family or family:qualifier, but it does not (as you've
> observed) make it easy to do this by Value.
>
> There are a few options:
>
> 1. You can configure the RegExFilter as a scan-time iterator. (This is
> going to be terribly inefficient.)
> 2. You can adopt a secondary indexing strategy.
>
> I would do option #2. As you've described, your data is indexed by ID.
> If you need an index on whatever you're storing in the Value, you
> should make a new table (or new family/locality group) which stores
> your data sorted by that instead of ID. You can either just store the
> ID in this secondary index, and do two lookups (the secondary index to
> find the ID, then the main data once you have the ID), or you can
> store all the data a second time, ordered by the contents of your
> Value (this trades space for performance).
>
> There are more complex strategies, but these are the basics.
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Wed, May 6, 2015 at 10:10 AM, Revan1988<andrealeoni88@gmail.com>  wrote:
>> Hi,
>> I've got another question about using Accumulo.
>>
>> My table is something like this:
>>
>> ID1 info:name JhonSmith
>> ID1 info:birth 1988-06-26
>> ID1 study:university ComputerEngineering
>> ID1 study:graduated Yes
>>
>> ID2 info:name GeorgeDuff
>> ID2 info:birth 1984-01-29
>> ID2 study:university Math
>> ID2 study:graduated Yes
>>
>> ...
>>
>>
>> I want all the info about JhonSmith, but in the Java API I've only found methods to
>> search by row, family or family:qualifier ...
>>
>> I need to search by Value and then use its row (IDx) to find all the other
>> entries that have the same row (IDx).
>>
>> For example, I need all the info about JhonSmith (birth, university, graduated
>> ...).
>>
>> I hope I've explained my problem.
>> Sorry again for my bad English.
>>
>> ...and once again:
>> Thank you!!!
>>
>>
>>
>> -----
>> Andrea Leoni
>> Italy
>> Computer Engineer
>> --
>> View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Search-function-tp14030.html
>> Sent from the Developers mailing list archive at Nabble.com.
