accumulo-notifications mailing list archives

From "Philip A Grim II (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-1804) Integrate RStudio to work with data residing in Accumulo
Date Wed, 30 Oct 2013 00:24:26 GMT


Philip A Grim II commented on ACCUMULO-1804:


I'll get a README posted to the GitHub repo this week.

Basically, there are three parts, assuming the things you've already done.  First, you have
to have an Accumulo proxy running on the Accumulo instance you want to talk to, which you
can do by following the instructions in $ACCUMULO_HOME/proxy.
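
As a rough sketch of that first step, the snippet below reads the port from a proxy properties file and reports it. The file name proxy.properties, the port=NNNN line format, the /opt/accumulo fallback, and the start command in the comment are all assumptions for illustration; the instructions in $ACCUMULO_HOME/proxy are the authority.

```shell
# Print the port configured in a proxy properties file (expects a line like port=NNNN).
proxy_port() {
    sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

# Assumption: the properties file sits in the proxy directory; /opt/accumulo is a
# hypothetical fallback used only when ACCUMULO_HOME is unset.
props="${ACCUMULO_HOME:-/opt/accumulo}/proxy/proxy.properties"

if [ -f "$props" ]; then
    echo "proxy configured on port $(proxy_port "$props")"
    # Assumed start command; confirm it against the proxy instructions before use:
    # "$ACCUMULO_HOME/bin/accumulo" proxy -p "$props"
else
    echo "no proxy.properties at $props; follow the instructions in \$ACCUMULO_HOME/proxy"
fi
```

Whatever port the proxy ends up listening on is the one the R side has to connect to.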

Second, you have to have Thrift installed on the box R is running on.  I don't recall if there's
an apt package for it - I just build it from source.  Building it from source has the advantage
of ensuring that you have all of the prerequisites for performing the third step...

Third, you have to install raccumulo into R.  In a command shell, cd to the directory where
you unpacked the tarball (not the raccumulo directory itself, but the parent) and type the
following command:

R CMD INSTALL raccumulo

This will cause R to configure, build, and install the package into your R library.  The most
common reason for the build to fail is that your PKG_CONFIG_PATH doesn't include
the Thrift installation.
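
A quick way to check this before running the install is to ask pkg-config directly. pkg-config searches the directories listed in the PKG_CONFIG_PATH environment variable, so the sketch below prepends one and then looks for Thrift. The /usr/local/lib/pkgconfig location assumes a source build with the default prefix, and the module name "thrift" assumes the Thrift build installed a thrift.pc file; adjust both if your setup differs.

```shell
# Prepend a directory to PKG_CONFIG_PATH unless it is already listed.
add_pkg_config_dir() {
    dir="$1"
    case ":${PKG_CONFIG_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) PKG_CONFIG_PATH="$dir${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}" ;;
    esac
    export PKG_CONFIG_PATH
}

# Assumption: Thrift was built from source with the default /usr/local prefix.
add_pkg_config_dir /usr/local/lib/pkgconfig

# Report whether pkg-config can see a module named "thrift" (assumed module name).
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists thrift; then
    echo "thrift found: $(pkg-config --modversion thrift)"
else
    echo "thrift not visible to pkg-config; check PKG_CONFIG_PATH"
fi
```

If the check reports that Thrift is not visible, point add_pkg_config_dir at the pkgconfig directory of your actual Thrift prefix and run R CMD INSTALL again from the same shell.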

Assuming it builds and installs, and you have the proxy running on your Accumulo instance,
you should be able to talk to Accumulo from R.  There is a file in the noinst directory under
raccumulo called test.R that shows examples of how to connect and use the functionality. 
There are also R man pages you can get at from the RStudio help tab, or from the R prompt
by typing ?raccumulo.

As I said, sometime this week, I intend to have this all written up formally and in great
detail and posted in the raccumulo GitHub repo.


> Integrate RStudio to work with data residing in Accumulo
> --------------------------------------------------------
>                 Key: ACCUMULO-1804
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Aaron Glahe
>            Priority: Minor
>         Attachments: raccumulo-release.tar.gz
>
> Need to be able to support users who utilize RStudio to conduct analysis of data residing
> in the Accumulo data space instead of moving data from one repository to a standalone system
> to have the analytic run in memory.  RStudio should be able to make calls directly to the
> data space and provide the output within the RStudio interface.

This message was sent by Atlassian JIRA
