accumulo-notifications mailing list archives

From "Carl Austin (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-143) Accumulo Hive
Date Tue, 17 Jun 2014 10:54:02 GMT


Carl Austin commented on ACCUMULO-143:

I've got a forked version of this for testing, and any query that doesn't need every column
read is slower than it needs to be (about five times slower when selecting a single column,
in my testing), so I've modified it to fetch only the columns needed. I can't easily create
a patch because my fork has diverged too far, but the necessary bit is:

In the configure method:
            // Parse the query and collect the columns it actually references.
            ASTNode node = driver.parse(conf.get("hive.query.string"));
            node = ParseUtils.findRootNonNullToken(node);
            findColumns(node, columns);
            Collection<Pair<Text, Text>> pairs = Lists.newArrayList();
            if (columns.size() > 0) {
                for (String col : columns) {
                    // The mapped value is split on '|' into the two parts of the column pair.
                    String[] pair = AccumuloHiveUtils.hiveToAccumulo(col, conf).split("\\|");
                    pairs.add(new Pair<Text, Text>(new Text(pair[0]), new Text(pair[1])));
                }
            } else {
                // No columns found in the query, so fall back to the full set of pairs.
                pairs = getPairCollection(colQualFamPairs, false);
            }

A new method:
    public void findColumns(ASTNode node, List<String> columns) {
        // TODO: This should be == HiveParser.TOK_TABLE_OR_COL, not 784, but that doesn't
        // actually seem to work in my case. This is a hacky fix and may not work for other
        // versions of Hive.
        if (node.getToken().getType() == 784) {
            // The body of this branch is missing from the archived message; presumably it
            // adds the referenced column name to 'columns'.
        } else {
            if (node.getChildren() != null) {
                for (Node child : node.getChildren()) {
                    findColumns((ASTNode) child, columns);
                }
            }
        }
    }

Obviously this isn't perfect yet. It doesn't take into account things like count(1), which
references no columns, so the query will still fetch every column.
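
To make the traversal and the count(1) fallback concrete, here is a minimal standalone sketch. It does not use Hive: the Node class and the token type constants are hypothetical stand-ins for ASTNode and HiveParser values, and reading the column name from the first child of a column-reference node is an assumption about the tree shape, not something taken from the code above.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnWalkSketch {
    // Hypothetical stand-in for Hive's ASTNode: a token type, its text, and children.
    public static final class Node {
        final int type;
        final String text;
        final List<Node> children = new ArrayList<>();

        public Node(int type, String text, Node... kids) {
            this.type = type;
            this.text = text;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    // Made-up token types for this sketch; they are not Hive's real values.
    public static final int TABLE_OR_COL = 1;
    public static final int IDENTIFIER = 2;
    public static final int OTHER = 3;

    // Depth-first walk: record the identifier under each column-reference node once.
    public static void findColumns(Node node, List<String> columns) {
        if (node.type == TABLE_OR_COL) {
            if (!node.children.isEmpty()) {
                String name = node.children.get(0).text;
                if (!columns.contains(name)) {
                    columns.add(name);
                }
            }
            return;
        }
        for (Node child : node.children) {
            findColumns(child, columns);
        }
    }

    // An empty result means no column references were found, so the caller fetches everything.
    public static boolean shouldFetchAll(List<String> columns) {
        return columns.isEmpty();
    }

    private static Node col(String name) {
        return new Node(TABLE_OR_COL, "col", new Node(IDENTIFIER, name));
    }

    public static void main(String[] args) {
        // Roughly "select a, b from t where a > 1".
        Node select = new Node(OTHER, "query", col("a"), col("b"),
                new Node(OTHER, ">", col("a"), new Node(OTHER, "1")));
        List<String> selected = new ArrayList<>();
        findColumns(select, selected);
        System.out.println(selected + " fetchAll=" + shouldFetchAll(selected));

        // Roughly "select count(1) from t": no column references at all.
        Node count = new Node(OTHER, "query", new Node(OTHER, "count", new Node(OTHER, "1")));
        List<String> counted = new ArrayList<>();
        findColumns(count, counted);
        System.out.println(counted + " fetchAll=" + shouldFetchAll(counted));
    }
}
```

The second query shows why count(1) still reads everything: the walk finds no column references, so the only safe choice in this sketch is to fall back to the full column set.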

I've also added something that allows you to configure additional columns as a serde property
when creating the table. I did this because columns that iterators use to calculate new
columns may not be mapped in the create statement; without the property they would not be
fetched, and those "calculated" columns would never work.
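
As a rough illustration of the additional-columns idea, the sketch below merges the columns found in the query with extras read from the table properties. The property name accumulo.additional.columns and the comma-separated "family|qualifier" format are assumptions made for this example; they are not necessarily what my fork uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

public class AdditionalColumnsSketch {
    // Hypothetical property name, chosen only for this sketch.
    public static final String ADDITIONAL_COLUMNS_KEY = "accumulo.additional.columns";

    // Merge query columns with extra "family|qualifier" entries from the table properties.
    // An empty query list means "fetch everything", so it is returned unchanged: adding
    // extras there would wrongly narrow the fetch to just the extras.
    public static List<String> columnsToFetch(List<String> queryColumns, Properties tableProps) {
        if (queryColumns.isEmpty()) {
            return new ArrayList<>();
        }
        Set<String> merged = new LinkedHashSet<>(queryColumns);
        String extra = tableProps.getProperty(ADDITIONAL_COLUMNS_KEY, "");
        for (String entry : extra.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                merged.add(trimmed);
            }
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // The iterator reads cf|raw to compute a value, but cf|raw is not a mapped Hive column.
        props.setProperty(ADDITIONAL_COLUMNS_KEY, " cf|raw , cf|a");
        System.out.println(columnsToFetch(Arrays.asList("cf|a"), props));
        System.out.println(columnsToFetch(new ArrayList<String>(), props));
    }
}
```

Using an insertion-ordered set keeps the query's own columns first and drops duplicates when an extra column is also selected directly.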

Let me know if you'd like any more info.

> Accumulo Hive
> -------------
>                 Key: ACCUMULO-143
>                 URL:
>             Project: Accumulo
>          Issue Type: Task
>          Components: contrib
>    Affects Versions: 1.6.0
>            Reporter: Keith Turner
>         Attachments: ACCUMULO-143.patch
> Need to look into adding support for Accumulo to Hive
