From: Robert Vesse
To: "accumulo-user@incubator.apache.org"
Subject: RE: Determine columns in a table?
Date: Sun, 25 Mar 2012 22:24:52 +0000

That would work great if I had control over the data, but I don't.

I'm writing an Accumulo plugin for an ETL tool (Pentaho), so users are going to point it at an arbitrary table in an arbitrary Accumulo instance, and then I just have to pull the data out knowing nothing about what is in there or about any available secondary indexes.

I guess I'll probably go with something like scanning the first 1000 rows to see what columns there are. Pentaho likes it if steps can expose what fields they will produce in advance, but the columns I declare don't have to be the full set. As long as I sample enough of the table to get a good sense of what's in there, steps further along the pipeline should be sufficiently informed that a user can configure them appropriately.
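
A minimal sketch of that sampling idea, using only the Java standard library so it runs on its own. The `ColumnSampler` and `Cell` names are hypothetical stand-ins, not Accumulo or Pentaho types. In a real plugin the cells would presumably come from an Accumulo `Scanner` created with the user's authorizations, which returns entries grouped by row and already filtered by visibility.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch: infer a table's columns by sampling only its first few rows. */
class ColumnSampler {

    /** Hypothetical stand-in for the row/family/qualifier parts of an Accumulo Key. */
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;

        Cell(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    /**
     * Collects the distinct "family:qualifier" names seen in the first
     * maxRows rows. Cells must arrive grouped by row, as a scan returns them.
     */
    static Set<String> sampleColumns(Iterable<Cell> cells, int maxRows) {
        Set<String> columns = new LinkedHashSet<String>(); // keeps first-seen order
        String currentRow = null;
        int rowsSeen = 0;
        for (Cell cell : cells) {
            if (!cell.row.equals(currentRow)) {
                if (rowsSeen == maxRows) {
                    break; // sample is full; stop before reading another row
                }
                currentRow = cell.row;
                rowsSeen++;
            }
            columns.add(cell.family + ":" + cell.qualifier);
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
            new Cell("row1", "attr", "name"),
            new Cell("row1", "attr", "age"),
            new Cell("row2", "attr", "name"),
            new Cell("row3", "geo", "lat"));
        // Sampling two rows never reaches row3, so geo:lat is not reported.
        System.out.println(sampleColumns(cells, 2));
    }
}
```

The trade-off is the one described above: a column that only appears after the sampled rows is missed, so the declared field list is a best guess rather than the full set.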

Rob

From: John Vines [john.w.vines@ugov.gov]
Sent: 25 March 2012 12:40
To: accumulo-user@incubator.apache.org
Subject: RE: Determine columns in a table?

Another option for you would be to create a table for indexing your column information. That way a quick scan can give you everything.
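
A rough sketch of the index idea, assuming the writer controls ingest. `ColumnIndex` is a hypothetical name, and an in-memory sorted set stands in for the separate index table, so nothing here is an Accumulo API.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: keep a side index of column names, updated on every data write. */
class ColumnIndex {

    // Hypothetical stand-in for a dedicated index table; kept sorted like table keys.
    private final Set<String> indexEntries = new TreeSet<String>();

    /**
     * Call alongside each data write. Returns true when the column was new,
     * i.e. when a real writer would also need to send an index entry.
     */
    boolean recordWrite(String family, String qualifier) {
        return indexEntries.add(family + ":" + qualifier);
    }

    /** The "quick scan": reads the small index instead of the data table. */
    Set<String> listColumns() {
        return Collections.unmodifiableSet(indexEntries);
    }

    public static void main(String[] args) {
        ColumnIndex index = new ColumnIndex();
        index.recordWrite("attr", "name");
        index.recordWrite("geo", "lat");
        index.recordWrite("attr", "name"); // repeat write, no new index entry
        System.out.println(index.listColumns());
    }
}
```

In a real table the index entries would presumably need to carry the same visibility labels as the data they describe, so that a scan of the index respects the user's authorizations. That part is an assumption and is not modelled here.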

Sent from my phone, so pardon the typos and brevity.

On Mar 25, 2012 3:06 PM, "Robert Vesse" <rvesse@yarcdata.com> wrote:
To clarify my question: I'm not looking to see why a scan doesn't return data. What I want to know is which columns a full table scan (taking relevant scan authorizations into account) will yield, without doing the full table scan.

But it sounds like the only way is to do the table scan, because in my usage scenario users won't have HDFS access.

Rob


From: John Vines [john.w.vines@ugov.gov]
Sent: 24 March 2012 16:27
To: accumulo-user@incubator.apache.org
Subject: Re: Determine columns in a table?

Individual keys have their own visibility, and it is possible for keys with similar columns to have different visibilities.

That said, we don't track the visibilities being used, so the only way is the mechanism Eric suggested.

Sent from my phone, so pardon the typos and brevity.

On Mar 24, 2012 5:30 PM, "Robert Vesse" <rvesse@yarcdata.com> wrote:
Obviously Accumulo is completely schema-free, but is there any easy way, given a table name and optionally one or more scan authorizations, to determine what columns are visible to a user?

Or is the only way to do this by scanning the table?

Cheers

Rob

Rob Vesse -- YarcData.com -- A Division of Cray Inc
Software Engineer, Bay Area
m: 925.960.3941 | o: 925.264.4729 | @: rvesse@yarcdata.com | Skype: rvesse
6210 Stoneridge Mall Rd | Suite 120 | Pleasanton CA, 94588

