From: "Dickson, Matt MR" <matt.dickson@defence.gov.au>
To: user@accumulo.apache.org
Date: Thu, 16 Jan 2014 11:29:37 +1100
Subject: RE: List of unique qualifiers [SEC=UNOFFICIAL]

UNOFFICIAL

Thanks Keith. I've run a simple MR job based on the UniqueColumns example, but because of the size of the table it is taking a very long time. Is it possible to pre-filter the data that goes to the MR job by column family, e.g. only run the MR job over columns with the column family 'cityofbirth'? At the moment I am going through every column in the table and checking the column family in the mapper, which is slow.
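A minimal sketch of why pre-filtering helps, using in-memory stand-ins rather than Accumulo classes (the `Cell` record, `map`, `runJob` and the invented 'surname' family are all hypothetical). Both runs produce the same city list, but when the family restriction is applied before the mapper, the mapper is only called for 'cityofbirth' entries. In a real job that restriction would be set on the input format so the tablet servers drop other families before anything reaches the mapper; `AccumuloInputFormat` has a `fetchColumns` option for this, but its exact signature differs between releases, so check the Javadoc for your version. Putting the family in its own locality group can also let the servers skip the other families' data on disk.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class FamilyPrefilterSketch {
    // Hypothetical stand-in for one table entry (row, family, qualifier); values omitted.
    record Cell(String row, String family, String qualifier) {}

    static final String FAMILY = "cityofbirth";

    // Mapper logic as described in the thread: check the family, keep the qualifier.
    static void map(Cell c, Set<String> out) {
        if (c.family().equals(FAMILY)) {
            out.add(c.qualifier());
        }
    }

    // Delivers cells to the mapper and returns the number of mapper calls.
    // fetchedFamily == null models reading every column; otherwise cells of other
    // families are dropped before the mapper sees them (the "pre-filter").
    static int runJob(List<Cell> table, String fetchedFamily, Set<String> out) {
        int mapperCalls = 0;
        for (Cell c : table) {
            if (fetchedFamily != null && !c.family().equals(fetchedFamily)) {
                continue;
            }
            map(c, out);
            mapperCalls++;
        }
        return mapperCalls;
    }

    public static void main(String[] args) {
        // Hypothetical data: the 'surname' family is invented to show the saving.
        List<Cell> table = List.of(
                new Cell("123", "cityofbirth", "LosAngeles"),
                new Cell("123", "surname", "Smith"),
                new Cell("133", "cityofbirth", "Brisbane"),
                new Cell("133", "surname", "Jones"),
                new Cell("222", "cityofbirth", "London"));

        Set<String> withoutFilter = new TreeSet<>();
        Set<String> withFilter = new TreeSet<>();
        int callsAll = runJob(table, null, withoutFilter);
        int callsFiltered = runJob(table, FAMILY, withFilter);
        System.out.println(callsAll + " mapper calls -> " + withoutFilter);
        System.out.println(callsFiltered + " mapper calls -> " + withFilter);
    }
}
```

Running `main` prints the same three cities for both runs, with five mapper calls without the restriction and three with it.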
 
 


________________________________
From: Keith Turner [mailto:keith@deenlo.com]
Sent: Wednesday, 15 January 2014 12:06
To: user@accumulo.apache.org
Subject: Re: List of unique qualifiers [SEC=UNOFFICIAL]




On Tue, Jan 14, 2014 at 6:06 PM, Dickson, Matt MR <matt.dickson@defence.gov.au> wrote:

UNOFFICIAL

Just for simplicity: this is a one-off request for management, so I was hoping to just scan via the shell and output to a file.

If I need to do it via an MR job I can do it that way, and would be keen to hear any suggestions.

You could modify the following example in Accumulo 1.4 to suit your needs:

src/examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/UniqueColumns.java
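The overall shape of a unique-values MapReduce job can be sketched in plain Java. This is not the code of `UniqueColumns.java`; it is a hypothetical in-memory model (`Cell`, `map`, `shuffle` and `reduce` are illustrative names) in which the map step emits the qualifier as the key, the shuffle groups identical keys, and the reduce step writes each distinct key once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UniqueQualifierMapReduceSketch {
    // Hypothetical stand-in for one table entry (row, family, qualifier); values omitted.
    record Cell(String row, String family, String qualifier) {}

    // Map: emit the qualifier as the output key for cells in the wanted family.
    static List<String> map(Cell c, String family) {
        return c.family().equals(family) ? List.of(c.qualifier()) : List.of();
    }

    // Shuffle: group identical keys. A TreeMap mimics the sorted keys a reducer sees;
    // the count stands in for the list of values grouped under each key.
    static Map<String, Integer> shuffle(List<Cell> table, String family) {
        Map<String, Integer> grouped = new TreeMap<>();
        for (Cell c : table) {
            for (String key : map(c, family)) {
                grouped.merge(key, 1, Integer::sum);
            }
        }
        return grouped;
    }

    // Reduce: write each distinct key exactly once, ignoring how many values it had.
    static List<String> reduce(Map<String, Integer> grouped) {
        return new ArrayList<>(grouped.keySet());
    }

    public static void main(String[] args) {
        // The example table from the original question.
        List<Cell> table = List.of(
                new Cell("123", "cityofbirth", "LosAngeles"),
                new Cell("133", "cityofbirth", "Brisbane"),
                new Cell("222", "cityofbirth", "London"),
                new Cell("124", "cityofbirth", "London"),
                new Cell("124", "cityofbirth", "London"));
        reduce(shuffle(table, "cityofbirth")).forEach(System.out::println);
    }
}
```

With the example table it prints Brisbane, London and LosAngeles, one per line, in sorted key order. In a real Hadoop job, a combiner doing the same de-duplication on each mapper's output would reduce the amount of data shuffled to the reducers.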
 


________________________________
From: David Medinets [mailto:david.medinets@gmail.com]
Sent: Wednesday, 15 January 2014 09:36
To: accumulo-user
Subject: Re: List of unique qualifiers [SEC=UNOFFICIAL]

Why the restriction to the shell environment? A nice MapReduce job would be ideal for this task.


On Tue, Jan 14, 2014 at 5:30 PM, Dickson, Matt MR <matt.dickson@defence.gov.au> wrote:

UNOFFICIAL

Hi,
 
I need to extract a list of unique qualifier values in a table from the Accumulo shell. Every column has a column family that identifies what its qualifier holds, e.g. 'cityofbirth'. I would like a unique list of all the cities listed as qualifiers under 'cityofbirth' across all rows.
 
For example, if I had a table with
 
Rowid    Family         Qual
123      cityofbirth    LosAngeles
133      cityofbirth    Brisbane
222      cityofbirth    London
124      cityofbirth    London
124      cityofbirth    London
 
I want a list that is just:
LosAngeles
London
Brisbane
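For a table this small the result can be reproduced in memory. The sketch below (a hypothetical `Cell` record and `distinctQualifiers` helper, not Accumulo API) sorts entries the way Accumulo orders keys (row, then family, then qualifier, ignoring visibility and timestamp), keeps only the 'cityofbirth' family, and de-duplicates the qualifiers in the order they are first seen. Because row 124 sorts before 133, it yields exactly the list above: LosAngeles, London, Brisbane. From the shell, a family-restricted scan such as `scan -c cityofbirth` should return only these entries, but the de-duplication would still have to happen outside Accumulo, for example on the saved output.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctCitiesSketch {
    // Hypothetical stand-in for one table entry (row, family, qualifier); values omitted.
    record Cell(String row, String family, String qualifier) {}

    // Accumulo keys sort by row, then family, then qualifier
    // (visibility and timestamp are left out of this sketch).
    static final Comparator<Cell> KEY_ORDER = Comparator.comparing(Cell::row)
            .thenComparing(Cell::family)
            .thenComparing(Cell::qualifier);

    // Distinct qualifiers of one family, in the order a sorted scan first meets them.
    static List<String> distinctQualifiers(List<Cell> table, String family) {
        return table.stream()
                .sorted(KEY_ORDER)
                .filter(c -> c.family().equals(family))
                .map(Cell::qualifier)
                .distinct() // keeps the first occurrence, preserving scan order
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The example table from the question.
        List<Cell> table = List.of(
                new Cell("123", "cityofbirth", "LosAngeles"),
                new Cell("133", "cityofbirth", "Brisbane"),
                new Cell("222", "cityofbirth", "London"),
                new Cell("124", "cityofbirth", "London"),
                new Cell("124", "cityofbirth", "London"));
        distinctQualifiers(table, "cityofbirth").forEach(System.out::println);
    }
}
```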
 
Any suggestions on how to achieve this from the shell would be great.
 
Thanks in advance.
Matt
 
 
 