pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "StorageFunction" by OlgaN
Date Wed, 07 Nov 2007 19:11:19 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/StorageFunction

New page:
[[Anchor(Load/Store_Functions)]]
== Load/Store Functions ==
Load/Store Functions are written by implementing one or both of the interfaces given below.


If the !LoadFunc interface is implemented, the function can be used to load tuples. If the
!StoreFunc interface is implemented, the function can be used to store tuples. Since loading
and storing are usually tied to each other, most functions implement both interfaces, as
!PigStorage and !BinStorage do. Occasionally, however, a function is written only for
loading.

{{{
/**
 * This interface is used to implement functions to parse records
 * from a dataset.
 */
public interface LoadFunc {
	/**
	 * Specifies a portion of an InputStream to read tuples. Because the
	 * starting and ending offsets may not be on record boundaries it is up to
	 * the implementor to deal with figuring out the actual starting and ending
	 * offsets in such a way that an arbitrarily sliced up file will be processed
	 * in its entirety.
	 * <p>
	 * A common way of handling slices in the middle of records is to start at
	 * the given offset and, if the offset is not zero, skip to the end of the
	 * first record (which may be a partial record) before reading tuples.
	 * Reading continues until a tuple has been read that ends at an offset past
	 * the ending offset.
	 *  
	 * @param fileName the name of the file to be read
	 * @param is the stream representing the file to be processed.
	 * @param offset the offset to start reading tuples.
	 * @param end the ending offset for reading.
	 * @throws IOException
	 */
	public abstract void bindTo(String fileName, InputStream is, long offset, long end) throws IOException;

	/**
	 * Retrieves the next tuple to be processed.
	 * @return the next tuple to be processed or null if there are no more tuples
	 * to be processed.
	 * @throws IOException
	 */
	public abstract Tuple getNext() throws IOException;	
}
}}}
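
The slice handling described in the bindTo comment can be sketched as follows. This is a minimal, self-contained illustration rather than Pig code: !LineSliceReader and !SliceDemo are hypothetical names, records are newline-terminated lines returned as Strings instead of Tuples, and the stream is assumed to be already positioned at the given offset when bindTo is called (the interface comment does not state this).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for a LoadFunc: reads newline-terminated records
 * from one slice of a file and returns each record as a String.
 */
class LineSliceReader {
    private InputStream in;
    private long pos; // absolute offset of the next unread byte
    private long end; // ending offset of this slice

    /** Assumption of this sketch: 'is' is already positioned at 'offset'. */
    public void bindTo(String fileName, InputStream is, long offset, long end) throws IOException {
        this.in = is;
        this.pos = offset;
        this.end = end;
        if (offset != 0) {
            // The slice may start mid-record: discard everything up to and
            // including the first newline. The previous slice reads that record.
            readLine();
        }
    }

    /** Returns the next record, or null when this slice is exhausted. */
    public String getNext() throws IOException {
        if (in == null || pos > end) {
            return null; // the last record read ended past the ending offset
        }
        return readLine();
    }

    // Reads bytes through the next '\n' (or EOF), tracking the absolute position.
    private String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawByte = false;
        int b;
        while ((b = in.read()) != -1) {
            pos++;
            sawByte = true;
            if (b == '\n') {
                return buf.toString("UTF-8");
            }
            buf.write(b);
        }
        return sawByte ? buf.toString("UTF-8") : null;
    }
}

class SliceDemo {
    /** Reads every record that belongs to the slice [offset, end) of data. */
    static List<String> readSlice(byte[] data, int offset, int end) {
        List<String> records = new ArrayList<>();
        try {
            LineSliceReader reader = new LineSliceReader();
            InputStream is = new ByteArrayInputStream(data, offset, data.length - offset);
            reader.bindTo("in-memory", is, offset, end);
            String record;
            while ((record = reader.getNext()) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
        // The boundary at byte 8 falls in the middle of "beta".
        System.out.println(readSlice(data, 0, 8));           // prints [alpha, beta]
        System.out.println(readSlice(data, 8, data.length)); // prints [gamma]
    }
}
```

Under this convention, a record that straddles a slice boundary is read by the slice in which it starts and skipped by the following slice, so every record is produced exactly once.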

The !StoreFunc interface is defined as follows:

{{{

/**
 * This interface is used to implement functions to write records
 * from a dataset.
 */
public interface StoreFunc {
	/**
	 * Specifies the OutputStream to write to. This will be called before
	 * store(Tuple) is invoked.
	 * 
	 * @param os The stream to write tuples to.
	 * @throws IOException
	 */
    public abstract void bindTo(OutputStream os) throws IOException;

    /**
     * Write a tuple to the output stream to which this instance was
     * previously bound.
     * 
     * @param f the tuple to store.
     * @throws IOException
     */
    public abstract void putNext(Tuple f) throws IOException;

    /**
     * Do any kind of post processing now that the last tuple has been
     * stored. DO NOT CLOSE THE STREAM in this method. The stream will be
     * closed later outside of this function.
     * 
     * @throws IOException
     */
    public abstract void finish() throws IOException;  
}
}}}
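
The call sequence implied by the !StoreFunc comments (bindTo first, then putNext once per tuple, then finish, with the stream closed by the caller) can be sketched as below. This is an illustration under stated assumptions, not Pig code: !TabDelimitedWriter and !StoreDemo are hypothetical names, a tuple is represented by a plain list of strings, and the tab-delimited output format is chosen only for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical stand-in for a StoreFunc: writes each record as one
 * tab-delimited line. A record is a List<String> instead of a Tuple.
 */
class TabDelimitedWriter {
    private OutputStream out;

    /** Remembers the stream to write to; called before any putNext. */
    public void bindTo(OutputStream os) throws IOException {
        this.out = os;
    }

    /** Writes one record to the previously bound stream. */
    public void putNext(List<String> fields) throws IOException {
        out.write(String.join("\t", fields).getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }

    /** Post-processing after the last record: flush only, never close. */
    public void finish() throws IOException {
        out.flush();
    }
}

class StoreDemo {
    /** Drives the writer the way a caller would and returns what was written. */
    static String storeAll(List<List<String>> rows) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            TabDelimitedWriter writer = new TabDelimitedWriter();
            writer.bindTo(sink);
            for (List<String> row : rows) {
                writer.putNext(row);
            }
            writer.finish();
            sink.close(); // the caller, not the writer, closes the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "1"),
                Arrays.asList("bb", "2"));
        System.out.print(storeAll(rows));
    }
}
```

Keeping the close outside the writer mirrors the warning in the finish comment: the function only flushes what it has buffered, and ownership of the stream stays with the caller.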
