From: Corey Nolet
To: user@accumulo.apache.org
Date: Wed, 15 Jan 2014 20:27:05 -0500
Subject: Re: List of unique qualifiers [SEC=UNOFFICIAL]

Matt,

This should help:

Collection<Pair<Text,Text>> cols =
    Collections.singleton(new Pair<Text,Text>(new Text("cityofbirth"), null));
AccumuloInputFormat.fetchColumns(job, cols);
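
In case the null in the pair looks odd: a null qualifier selects every qualifier in that family, and the family is compared exactly, so its case has to match what is stored in the table. Below is a plain-Java sketch of that matching rule. It is not Accumulo code; Column, matches and restrict are made-up names for illustration, and "surname" is an invented family. Depending on your Accumulo version, fetchColumns may take the job's Configuration rather than the Job itself, so please check the Javadoc for your release.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FetchColumnSketch {

    /** Hypothetical stand-in for one table entry's column: family plus qualifier. */
    static final class Column {
        final String family;
        final String qualifier;

        Column(String family, String qualifier) {
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    /**
     * Models the (family, qualifier) pair handed to fetchColumns.
     * A null fetchQualifier is taken to mean "any qualifier in this family".
     */
    static boolean matches(String fetchFamily, String fetchQualifier, Column c) {
        if (!fetchFamily.equals(c.family)) {
            return false; // exact, case-sensitive comparison
        }
        return fetchQualifier == null || fetchQualifier.equals(c.qualifier);
    }

    /** Keeps only the columns the fetch pair would let through. */
    static List<Column> restrict(List<Column> all, String fetchFamily, String fetchQualifier) {
        List<Column> kept = new ArrayList<Column>();
        for (Column c : all) {
            if (matches(fetchFamily, fetchQualifier, c)) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Column> table = Arrays.asList(
            new Column("cityofbirth", "London"),
            new Column("cityofbirth", "Brisbane"),
            new Column("surname", "Smith")); // invented family, for contrast
        // Only the two cityofbirth columns are printed.
        for (Column c : restrict(table, "cityofbirth", null)) {
            System.out.println(c.family + ":" + c.qualifier);
        }
    }
}
```

The point of doing this through the input format rather than in the mapper is that the other families never reach the job at all.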



On Wed, Jan 15, 2014 at 7:29 PM, Dickson, Matt MR <matt.dickson@defence.gov.au> wrote:

> *UNOFFICIAL*
>
> Thanks Keith. I've run a simple MR job based on the UniqueColumns
> example, but due to the size of the table it is taking a very long time.
> Is it possible to pre-filter the data that goes to the MR job based on
> family, e.g. only run the MR job on columns with a specific column family
> of 'cityofbirth'? I am currently going through every column in the table
> and checking the column family in the mapper, which is slow.
>


> ------------------------------
> From: Keith Turner [mailto:keith@deenlo.com]
> Sent: Wednesday, 15 January 2014 12:06
> To: user@accumulo.apache.org
> Subject: Re: List of unique qualifiers [SEC=UNOFFICIAL]
>
> On Tue, Jan 14, 2014 at 6:06 PM, Dickson, Matt MR <matt.dickson@defence.gov.au> wrote:
>
>> *UNOFFICIAL*
>>
>> Just for simplicity, this is a one-off request for management, so I was
>> hoping to just scan via the shell and output to a file.
>>
>> If I need to do it via an MR job I can do it that way, and would be keen
>> to hear any suggestions.
>
> You could modify the following example in 1.4 to suit your needs.
>
> src/examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/UniqueColumns.java
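
A note on why a map-reduce job suits a "distinct list" request like this one: the usual pattern is to emit the value you care about as the map output key, let the shuffle bring identical keys together, and have the reducer write each key once. The sketch below shows that pattern in plain Java. It is not the UniqueColumns source and uses no Hadoop classes; map, shuffle and reduce are hypothetical stand-ins for the real phases, and the rows are the sample ones from the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DistinctQualifierSketch {

    /** Map phase stand-in: emit the qualifier of each entry as the output key. */
    static List<String> map(List<String[]> entries) {
        List<String> keys = new ArrayList<String>();
        for (String[] e : entries) {
            keys.add(e[2]); // e = {row, family, qualifier}
        }
        return keys;
    }

    /** Shuffle stand-in: group identical keys together, sorted by key. */
    static Map<String, Integer> shuffle(List<String> keys) {
        Map<String, Integer> grouped = new TreeMap<String, Integer>();
        for (String k : keys) {
            Integer seen = grouped.get(k);
            grouped.put(k, seen == null ? 1 : seen + 1);
        }
        return grouped;
    }

    /** Reduce phase stand-in: write each key once, ignoring how often it occurred. */
    static List<String> reduce(Map<String, Integer> grouped) {
        return new ArrayList<String>(grouped.keySet());
    }

    public static void main(String[] args) {
        List<String[]> table = Arrays.asList(
            new String[] {"123", "cityofbirth", "LosAngeles"},
            new String[] {"133", "cityofbirth", "Brisbane"},
            new String[] {"222", "cityofbirth", "London"},
            new String[] {"124", "cityofbirth", "London"});
        // Prints each city once, in sorted order.
        for (String city : reduce(shuffle(map(table)))) {
            System.out.println(city);
        }
    }
}
```

Because the duplicates collapse during the shuffle, the reducer never has to hold the full set of cities in memory, which is what makes the approach workable on a large table.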


>> ------------------------------
>> From: David Medinets [mailto:david.medinets@gmail.com]
>> Sent: Wednesday, 15 January 2014 09:36
>> To: accumulo-user
>> Subject: Re: List of unique qualifiers [SEC=UNOFFICIAL]
>>
>> Why the restriction to the shell environment? A nice map-reduce job
>> would be ideal for this task.


>> On Tue, Jan 14, 2014 at 5:30 PM, Dickson, Matt MR <matt.dickson@defence.gov.au> wrote:
>>
>>> *UNOFFICIAL*
>>>
>>> Hi,
>>>
>>> I need to extract a list of unique qualifier values on a table from the
>>> Accumulo shell. For every column there is a column family that
>>> identifies a specific qualifier, e.g. 'cityofbirth'. I would like to get
>>> a unique list of all cities that are listed in the qualifier against
>>> 'cityofbirth' for all rows.
>>>
>>> E.g., if I had a table with
>>>
>>> Rowid   Family        Qual
>>> 123     cityofbirth   LosAngeles
>>> 133     cityofbirth   Brisbane
>>> 222     cityofbirth   London
>>> 124     cityofbirth   London
>>> 124     cityofbirth   London
>>>
>>> I want a list that is just:
>>>
>>> LosAngeles
>>> London
>>> Brisbane
>>>
>>> Any suggestions on how to achieve this from the shell would be great.
>>>
>>> Thanks in advance.
>>> Matt