hadoop-pig-dev mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-653) Make fieldsToRead work in loader
Date Fri, 06 Feb 2009 23:42:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671371#action_12671371 ]

Hong Tang commented on PIG-653:
-------------------------------

I don't like the idea of adding a BAG_OF_MAP type. It is really a composite of two existing
types: a BAG of MAPs.

Here is another idea that I came up with and briefly discussed with Pradeep.

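{code}
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.pig.data.DataType;

public interface Filter {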
  /**
   * Returns the concrete type of this filter, so that callers can downcast it
   * to the matching Filter implementation.
   * 
   * @return one of the following constants defined in DataType: TUPLE, BAG, and
   *         MAP
   */
  byte getType();
}

class TupleFilter implements Filter {
  private static class TupleFilterEntry {
    String alias;
    Filter filter;
    TupleFilterEntry(String a, Filter f) {
      alias = a;
      filter = f;
    }
  }
  
  // Field index -> (alias, nested filter), kept sorted by field index.
  private final SortedMap<Integer, TupleFilterEntry> entries;

  @Override
  public byte getType() { return DataType.TUPLE; }
  
  public TupleFilter() {
    entries = new TreeMap<Integer, TupleFilterEntry>();
  }

  /**
   * Convenience constructor for simple position-based filtering.
   *
   * @param indices
   *          indices of the fields to read, with no alias and no nested filter
   */
  public TupleFilter(int... indices) {
    this();
    for (int i : indices) {
      entries.put(i, new TupleFilterEntry(null, null));
    }
  }
  
  /**
   * Adds an entry to the filter (used while building the filter).
   *
   * @param index
   *          the index of the field of interest
   * @param alias
   *          the alias of the field; optional (may be null)
   * @param filter
   *          a nested filter applied to the field; null means no further
   *          filtering
   */
  public synchronized void add(int index, String alias, Filter filter) {
    entries.put(index, new TupleFilterEntry(alias, filter));
  }
  
  /**
   * Returns the indices of the fields of interest.
   *
   * @return the field indices, sorted in ascending order
   */
  public synchronized int[] getFields() {
    int[] ret = new int[entries.size()];
    int i = 0;
    for (Iterator<Integer> it = entries.keySet().iterator(); it.hasNext(); ++i) {
      ret[i] = it.next();
    }
    return ret;
  }

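  /**
   * Returns the alias registered for a field, or null if none was given.
   *
   * @throws IllegalArgumentException if the index was never added
   */
  public synchronized String getAlias(int index) {
    TupleFilterEntry entry = entries.get(index);
    if (entry == null) {
      throw new IllegalArgumentException("Unrecognized field index: " + index);
    }
    return entry.alias;
  }

  /**
   * Returns the nested filter for a field, or null if the field is read whole.
   *
   * @throws IllegalArgumentException if the index was never added
   */
  public synchronized Filter getFilter(int index) {
    TupleFilterEntry entry = entries.get(index);
    if (entry == null) {
      throw new IllegalArgumentException("Unrecognized field index: " + index);
    }
    return entry.filter;
  }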
}

class MapFilter implements Filter {
  // Key of interest -> nested filter for its value (null: no nested filter).
  private final Map<String, Filter> entries;
  
  public MapFilter() {
    entries = new TreeMap<String, Filter>();
  }
  
  /**
   * Convenience constructor for simple key-matching filtering.
   *
   * @param keys
   *          the keys of interest
   */
  public MapFilter(String... keys) {
    this();
    add(keys);
  }
  
  /**
   * Adds keys to the set of keys of interest, without further filtering.
   *
   * @param keys
   *          the keys of interest
   */
  public void add(String... keys) {
    add(null, keys);
  }

  /**
   * Adds keys to the set of keys of interest, with further filtering.
   *
   * @param f
   *          the nested filter applied to each key's value; null means no
   *          further filtering
   * @param keys
   *          the keys of interest
   */
  public synchronized void add(Filter f, String... keys) {
    for (String k : keys) {
      entries.put(k, f);
    }
  }
  
  @Override
  public byte getType() {
    return DataType.MAP;
  }
  
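  /**
   * Returns a snapshot of the key-to-filter mapping. A copy is returned so
   * that callers cannot modify the filter without going through add().
   */
  public synchronized Map<String, Filter> getKeyFilterMapping() {
    return new TreeMap<String, Filter>(entries);
  }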
}

class BagFilter implements Filter {
  // The tuple filter applied to every tuple in the bag.
  private final TupleFilter filter;

  public BagFilter(TupleFilter filter) {
    this.filter = filter;
  }

  @Override
  public byte getType() {
    return DataType.BAG;
  }

  public TupleFilter getTupleFilter() {
    return filter;
  }
}
{code}
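Composing the three filters is what makes a dedicated BAG_OF_MAP type unnecessary: a loader can prune a bag of maps by applying the bag level, then the tuple level, then the map level. Below is a minimal, self-contained sketch of that traversal, assuming plain java.util collections stand in for Pig's bag, tuple and map values. BagOfMapPruning and prune() are hypothetical names; the fields, mapField and keys arguments play the roles of the BagFilter's TupleFilter.getFields(), the field carrying a MapFilter, and that MapFilter's key set.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class BagOfMapPruning {
  /**
   * Hypothetical helper: prunes a bag of tuples level by level.
   *  - bag level:   the same tuple-level pruning is applied to every tuple
   *  - tuple level: only the fields listed in 'fields' are kept, in that order
   *  - map level:   if 'mapField' is among 'fields', that map keeps only 'keys'
   */
  static List<List<Object>> prune(List<List<Object>> bag, int[] fields,
                                  int mapField, Set<String> keys) {
    List<List<Object>> out = new ArrayList<List<Object>>();
    for (List<Object> tuple : bag) {
      List<Object> projected = new ArrayList<Object>();
      for (int i : fields) {
        Object value = tuple.get(i);
        if (i == mapField) {
          @SuppressWarnings("unchecked")
          Map<String, Object> map = (Map<String, Object>) value;
          // TreeMap keeps the surviving keys in a deterministic order.
          Map<String, Object> kept = new TreeMap<String, Object>();
          for (String k : keys) {
            if (map.containsKey(k)) {
              kept.put(k, map.get(k));
            }
          }
          value = kept;
        }
        projected.add(value);
      }
      out.add(projected);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> m1 = new HashMap<String, Object>();
    m1.put("a", 1);
    m1.put("b", 2);
    m1.put("c", 3);
    Map<String, Object> m2 = new HashMap<String, Object>();
    m2.put("b", 5);
    List<List<Object>> bag = new ArrayList<List<Object>>();
    bag.add(Arrays.<Object>asList("alice", m1, 42));
    bag.add(Arrays.<Object>asList("bob", m2, 7));
    Set<String> keys = new TreeSet<String>(Arrays.asList("a", "b"));
    // Read only field 1 of each tuple, and only keys "a" and "b" of that map.
    System.out.println(prune(bag, new int[] {1}, 1, keys));
    // prints [[{a=1, b=2}], [{b=5}]]
  }
}
{code}

No new type constant is involved: the "bag of maps" shape falls out of nesting the bag, tuple and map steps.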

> Make fieldsToRead work in loader
> --------------------------------
>
>                 Key: PIG-653
>                 URL: https://issues.apache.org/jira/browse/PIG-653
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alan Gates
>            Assignee: Pradeep Kamath
>
> Currently Pig does not call the fieldsToRead function in LoadFunc, so it does not provide
> information to load functions about which fields are needed. We need to implement a visitor that
> determines (where possible) which fields in a file will be used and relays that information
> to the load function.
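Once the needed fields are relayed to the load function, the loader can avoid materializing unneeded columns at all. A minimal sketch of that idea, assuming a delimited text record and a loader that receives the needed indices sorted ascending (as TupleFilter.getFields() returns them). FieldsToReadSketch and readFields() are hypothetical and not part of Pig's LoadFunc API; returning null for indices past the end of the record is a choice made only for this sketch.

{code}
import java.util.ArrayList;
import java.util.List;

class FieldsToReadSketch {
  /**
   * Hypothetical loader helper: extracts only the needed fields of one
   * delimited record. 'needed' must be strictly ascending, as the keys of a
   * SortedMap are. Scanning stops after the last needed field, so trailing
   * columns are never materialized. Needed indices past the end of the
   * record come back as null.
   */
  static List<String> readFields(String line, char delim, int[] needed) {
    List<String> out = new ArrayList<String>();
    int field = 0;  // index of the field currently being scanned
    int start = 0;  // offset where that field begins
    int next = 0;   // position of the next wanted index in 'needed'
    for (int pos = 0; pos <= line.length() && next < needed.length; pos++) {
      // A field ends at a delimiter or at the end of the line.
      if (pos == line.length() || line.charAt(pos) == delim) {
        if (field == needed[next]) {
          out.add(line.substring(start, pos));
          next++;
        }
        field++;
        start = pos + 1;
      }
    }
    // Pad wanted fields that the record does not contain.
    while (next < needed.length) {
      out.add(null);
      next++;
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(readFields("a\tb\tc\td", '\t', new int[] {1, 3}));
    // prints [b, d]
  }
}
{code}

Requiring sorted indices lets the scan make a single left-to-right pass and stop early, which is one practical reason for getFields() to return its indices in ascending order.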

-- 
This message is automatically generated by JIRA.

