accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: List of unique qualifiers [SEC=UNOFFICIAL]
Date Tue, 14 Jan 2014 23:11:58 GMT
Depending on the amount of data, you could do a scan -c in the shell for 
the colfams you want, awk out the colqual and dump that to a file. 
Afterwards, you could sort and uniq the file.
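
If awk isn't handy, the same post-processing can be sketched in Python. This is only a rough sketch: it assumes each line of scan output looks like "row colfam:colqual [visibility] value" and that rows, colfams and colquals contain no whitespace or colons. The function name and the sample lines (including the 'name' family) are made up for illustration.

```python
def unique_qualifiers(lines, family):
    """Return the sorted, de-duplicated qualifiers seen under one column family."""
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or other non-entry output
        # Assumed layout: the second field is "colfam:colqual".
        cf, sep, cq = fields[1].partition(":")
        if sep and cf == family:
            seen.add(cq)
    return sorted(seen)


# Hypothetical scan output, modelled on the table in the original question.
sample_lines = [
    "123 cityofbirth:LosAngeles []    ",
    "133 cityofbirth:Brisbane []    ",
    "222 cityofbirth:London []    ",
    "124 cityofbirth:London []    ",
    "124 name:Alice []    ",
]

for qualifier in unique_qualifiers(sample_lines, "cityofbirth"):
    print(qualifier)
```

The set does the job of sort and uniq in one step, so the saved scan output only needs to be read once.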

The MR example would be pretty simple too -- same idea as above. Very 
similar to your run-of-the-mill wordcount. AccumuloInputFormat will let 
you just fetch the colfams you're interested in.

map:
foreach Key in the fetched colfams:
    emit (colqual, null)

reduce:
foreach distinct colqual:
    emit colqual once
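
To make the map and reduce steps above concrete, here is a small pure-Python simulation. It is not Hadoop or Accumulo code: in a real job AccumuloInputFormat would hand the mapper the Key/Value pairs, and the framework would do the sort and grouping between the two phases. The function names and the (row, colfam, colqual) tuples are assumptions for illustration only.

```python
from itertools import groupby


def map_phase(entries, families):
    """Emit (colqual, None) for every entry in one of the wanted colfams."""
    for _row, cf, cq in entries:
        if cf in families:
            yield cq, None


def reduce_phase(pairs):
    """Group the map output by colqual and emit each colqual exactly once."""
    ordered = sorted(pairs, key=lambda kv: kv[0])  # stands in for the shuffle/sort
    for cq, _group in groupby(ordered, key=lambda kv: kv[0]):
        yield cq


# Hypothetical entries, modelled on the table in the original question.
entries = [
    ("123", "cityofbirth", "LosAngeles"),
    ("133", "cityofbirth", "Brisbane"),
    ("222", "cityofbirth", "London"),
    ("124", "cityofbirth", "London"),
    ("124", "cityofbirth", "London"),
]

unique = list(reduce_phase(map_phase(entries, {"cityofbirth"})))
print(unique)
```

As with wordcount, the key carries all the information; the value is ignored, so the reducer just writes each key it sees.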

On 1/14/14, 6:06 PM, Dickson, Matt MR wrote:
> *UNOFFICIAL*
>
> Just for simplicity, this is a one-off request for management so I was
> hoping to just scan via the shell and output to a file.
> If I need to do it via an MR job I can do it that way and would be keen
> to hear any suggestions.
>
> ------------------------------------------------------------------------
> *From:* David Medinets [mailto:david.medinets@gmail.com]
> *Sent:* Wednesday, 15 January 2014 09:36
> *To:* accumulo-user
> *Subject:* Re: List of unique qualifiers [SEC=UNOFFICIAL]
>
> Why the restriction to the shell environment? A nice map-reduce job
> would be ideal for this task.
>
>
> On Tue, Jan 14, 2014 at 5:30 PM, Dickson, Matt MR
> <matt.dickson@defence.gov.au> wrote:
>
>     *UNOFFICIAL*
>
>     Hi,
>     I need to extract a list of unique qualifier values on a table from
>     the Accumulo shell.  For every column there is a column family that
>     identifies a specific qualifier, e.g. 'cityofbirth'.  I would like to
>     get a unique list of all cities that are listed in the qualifier
>     against 'cityofbirth' for all rows.
>     E.g., if I had a table with
>     Rowid  Family       Qual
>     123    cityofbirth  LosAngeles
>     133    cityofbirth  Brisbane
>     222    cityofbirth  London
>     124    cityofbirth  London
>     124    cityofbirth  London
>     I want a list that is just:
>     LosAngeles
>     London
>     Brisbane
>     Any suggestions on how to achieve this from the shell would be great.
>     Thanks in advance.
>     Matt
>
>
