accumulo-dev mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: Search function
Date Wed, 06 May 2015 21:08:13 GMT
Since Accumulo is essentially a big sorted map, it is most efficient
when you search by row. When you search by any other field, you are
scanning the entire data set and filtering, which is usually not very
efficient. The API makes this relatively easy when you filter by
family or family:qualifier, but (as you've observed) it does not make
it easy to filter by Value.
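
To make the difference concrete, here is a minimal sketch that models
the table from the question as a plain java.util.TreeMap. It is not
the Accumulo API: the class name, the helper methods, and the NUL
separator between key parts are all assumptions made for illustration.
A row lookup can seek straight to a key range, while a lookup by Value
has to visit every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of a sorted key/value table; NOT the Accumulo client API.
class SortedMapScanSketch {
    // Hypothetical separator between key parts; assumed never to occur in the data.
    static final char SEP = '\u0000';

    public static String key(String row, String family, String qualifier) {
        return row + SEP + family + SEP + qualifier;
    }

    // The sample data from the original question.
    public static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> t = new TreeMap<>();
        t.put(key("ID1", "info", "name"), "JhonSmith");
        t.put(key("ID1", "info", "birth"), "1988-06-26");
        t.put(key("ID1", "study", "university"), "ComputerEngineering");
        t.put(key("ID1", "study", "graduated"), "Yes");
        t.put(key("ID2", "info", "name"), "GeorgeDuff");
        t.put(key("ID2", "info", "birth"), "1984-01-29");
        t.put(key("ID2", "study", "university"), "Math");
        t.put(key("ID2", "study", "graduated"), "Yes");
        return t;
    }

    // Search by row: a range view over the sorted keys, no full scan needed.
    public static SortedMap<String, String> scanRow(TreeMap<String, String> table, String row) {
        return table.subMap(row + SEP, row + (char) (SEP + 1));
    }

    // Search by value: every entry is visited and filtered.
    public static List<String> rowsWithValue(TreeMap<String, String> table, String value) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : table.entrySet()) {
            if (e.getValue().equals(value)) {
                String k = e.getKey();
                rows.add(k.substring(0, k.indexOf(SEP)));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = sampleTable();
        System.out.println(scanRow(table, "ID1").values());
        System.out.println(rowsWithValue(table, "JhonSmith"));
    }
}
```

The cost of rowsWithValue grows with the size of the whole table,
while scanRow only touches the entries of the requested row.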

There are a few options:

1. You can configure the RegExFilter as a scan-time iterator. (This is
going to be terribly inefficient.)
2. You can adopt a secondary indexing strategy.

I would do option #2. As you've described, your data is indexed by ID.
If you need an index on whatever you're storing in the Value, you
should make a new table (or new family/locality group) which stores
your data sorted by that instead of ID. You can either just store the
ID in this secondary index, and do two lookups (the secondary index to
find the ID, then the main data once you have the ID), or you can
store all the data a second time, ordered by the contents of your
Value (this trades space for performance).
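
As a rough illustration of the first variant (store only the ID in the
index, then do two lookups), here is a sketch that again uses plain
java.util.TreeMap instances in place of real tables. It is not the
Accumulo API: the class, the method names, the index key layout
(value, separator, row) and the NUL separator are assumptions chosen
for the example.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of a main table plus a secondary index; NOT the Accumulo client API.
class SecondaryIndexSketch {
    // Hypothetical separator between key parts; assumed never to occur in the data.
    static final char SEP = '\u0000';

    // Main table: row SEP family SEP qualifier -> value (sorted by ID).
    final TreeMap<String, String> main = new TreeMap<>();
    // Index table: value SEP row -> "" (sorted by value, stores only the ID).
    final TreeMap<String, String> index = new TreeMap<>();

    // Every write goes to both tables so the index stays in step with the data.
    public void put(String row, String family, String qualifier, String value) {
        main.put(row + SEP + family + SEP + qualifier, value);
        index.put(value + SEP + row, "");
    }

    // All entries whose first key part equals the given prefix.
    static SortedMap<String, String> range(TreeMap<String, String> table, String prefix) {
        return table.subMap(prefix + SEP, prefix + (char) (SEP + 1));
    }

    public SortedMap<String, String> findByValue(String value) {
        SortedMap<String, String> result = new TreeMap<>();
        // Lookup 1: seek into the index to find the IDs stored under this value.
        for (String indexKey : range(index, value).keySet()) {
            String row = indexKey.substring(indexKey.indexOf(SEP) + 1);
            // Lookup 2: fetch the full row from the main table by ID.
            result.putAll(range(main, row));
        }
        return result;
    }

    // The sample data from the original question.
    public static SecondaryIndexSketch sample() {
        SecondaryIndexSketch s = new SecondaryIndexSketch();
        s.put("ID1", "info", "name", "JhonSmith");
        s.put("ID1", "info", "birth", "1988-06-26");
        s.put("ID1", "study", "university", "ComputerEngineering");
        s.put("ID1", "study", "graduated", "Yes");
        s.put("ID2", "info", "name", "GeorgeDuff");
        s.put("ID2", "info", "birth", "1984-01-29");
        s.put("ID2", "study", "university", "Math");
        s.put("ID2", "study", "graduated", "Yes");
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sample().findByValue("JhonSmith").values());
    }
}
```

Both lookups are range seeks on sorted keys, so neither one scans the
whole data set. The second variant (storing all the data again under
the value) would remove lookup 2 at the cost of the extra space.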

There are more complex strategies, but these are the basics.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Wed, May 6, 2015 at 10:10 AM, Revan1988 <andrealeoni88@gmail.com> wrote:
> Hi,
> I've got an other question about using Accumulo.
>
> My table is something like that:
>
> ID1 info:name JhonSmith
> ID1 info:birth 1988-06-26
> ID1 study:university ComputerEngineering
> ID1 study:graduated Yes
>
> ID2 info:name GeorgeDuff
> ID2 info:birth 1984-01-29
> ID2 study:university Math
> ID2 study:graduated Yes
>
> ...
>
>
> I want all info about JhonSmith but with Java API I've found only method to
> search by row, family or family:qualifier ...
>
> I need to search by Value and after to use its row (IDx) to search all other
> entries that has the same row (IDx).
>
> for example i need all info about JhonSmith (birth, university, graduated
> ...).
>
> I hope I explain my problem.
> Sorry again for my bad english.
>
> ...and once again:
> Thank you!!!
>
>
>
> -----
> Andrea Leoni
> Italy
> Computer Engineer
> --
> View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Search-function-tp14030.html
> Sent from the Developers mailing list archive at Nabble.com.
