flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-933) Add an input format to read primitive types directly (not through tuples)
Date Mon, 07 Jul 2014 16:46:34 GMT

    [ https://issues.apache.org/jira/browse/FLINK-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053836#comment-14053836 ]

ASF GitHub Bot commented on FLINK-933:
--------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/47#discussion_r14607144
  
    --- Diff: stratosphere-java/src/main/java/eu/stratosphere/api/java/io/PrimitiveInputFormat.java ---
    @@ -0,0 +1,73 @@
    +/***********************************************************************************************************************
    + *
    + * Copyright (C) 2010-2013 by the Stratosphere project (http://stratosphere.eu)
    + *
    + * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    + * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    + * specific language governing permissions and limitations under the License.
    + *
    + **********************************************************************************************************************/
    +package eu.stratosphere.api.java.io;
    +
    +import eu.stratosphere.api.common.io.DelimitedInputFormat;
    +import eu.stratosphere.core.fs.FileInputSplit;
    +import eu.stratosphere.core.fs.Path;
    +import eu.stratosphere.types.parser.FieldParser;
    +import eu.stratosphere.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +
    +/**
    + * An input format that reads single field primitive data from a given file. The difference between this and
    + * {@link eu.stratosphere.api.java.io.CsvInputFormat} is that it won't go through {@link eu.stratosphere.api.java.tuple.Tuple1}.
    + */
    +public class PrimitiveInputFormat<OT> extends DelimitedInputFormat<OT> {
    +
    +	private Class<OT> primitiveClass;
    +
    +	private static final byte CARRIAGE_RETURN = (byte) '\r';
    +
    +	private static final byte NEW_LINE = (byte) '\n';
    +
    +	private transient FieldParser<OT> parser;
    +
    +
    +	public PrimitiveInputFormat(Path filePath, Class<OT> primitiveClass) {
    +		super(filePath);
    +		this.primitiveClass = primitiveClass;
    +	}
    +
    +	public PrimitiveInputFormat(Path filePath, char delimiter, Class<OT> primitiveClass) {
    +		super(filePath);
    +		this.primitiveClass = primitiveClass;
    +		this.setDelimiter(delimiter);
    +	}
    +
    +	@Override
    +	public void open(FileInputSplit split) throws IOException {
    +		super.open(split);
    +		Class<? extends FieldParser<OT>> parserType = FieldParser.getParserForType(primitiveClass);
    +		if (parserType == null) {
    +			throw new IllegalArgumentException("The type '" + primitiveClass.getName() + "' is not supported for the primitive input format.");
    +		}
    +		parser = InstantiationUtil.instantiate(parserType, FieldParser.class);
    +	}
    +
    +	@Override
    +	public OT readRecord(OT reuse, byte[] bytes, int offset, int numBytes) {
    +		//Check if \n is used as delimiter and the end of this line is a \r, then remove \r from the line
    +		if (this.getDelimiter() != null && this.getDelimiter().length == 1
    +			&& this.getDelimiter()[0] == NEW_LINE && offset+numBytes >= 1
    +			&& bytes[offset+numBytes-1] == CARRIAGE_RETURN){
    +			numBytes -= 1;
    +		}
    +
    +		parser.parseField(bytes, offset, numBytes + offset, (char) this.getDelimiter()[0], reuse);
    --- End diff --
    
    If line 64 checks that this.getDelimiter() is not null, I think we should check it here as well, because this line dereferences getDelimiter()[0] without any guard. But actually, DelimitedInputFormat already checks for null when the delimiter is set, so it should be safe to drop the null check in line 64.
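
As a rough illustration of that suggestion, here is a minimal, self-contained sketch that does not use the Stratosphere classes. The class name TrailingCarriageReturnSketch and the helpers requireDelimiter and effectiveLength are hypothetical; requireDelimiter only stands in for the null check that the setter is said to perform. The sketch also guards on numBytes >= 1 instead of offset+numBytes >= 1, which is a choice made here and not something taken from the diff.

```java
import java.nio.charset.StandardCharsets;

public class TrailingCarriageReturnSketch {

    private static final byte CARRIAGE_RETURN = (byte) '\r';
    private static final byte NEW_LINE = (byte) '\n';

    // Hypothetical stand-in for the null check done when the delimiter is set.
    public static byte[] requireDelimiter(byte[] delimiter) {
        if (delimiter == null) {
            throw new IllegalArgumentException("Delimiter must not be null");
        }
        return delimiter;
    }

    // Returns the record length with a trailing '\r' dropped when the
    // delimiter is a single '\n'. No null check is needed here because the
    // delimiter is assumed to have passed requireDelimiter already.
    public static int effectiveLength(byte[] delimiter, byte[] bytes, int offset, int numBytes) {
        if (delimiter.length == 1 && delimiter[0] == NEW_LINE
                && numBytes >= 1
                && bytes[offset + numBytes - 1] == CARRIAGE_RETURN) {
            return numBytes - 1;
        }
        return numBytes;
    }

    public static void main(String[] args) {
        byte[] delimiter = requireDelimiter(new byte[] {NEW_LINE});
        byte[] record = "42\r".getBytes(StandardCharsets.US_ASCII);
        int length = effectiveLength(delimiter, record, 0, record.length);
        // Prints the record without the trailing carriage return.
        System.out.println(new String(record, 0, length, StandardCharsets.US_ASCII));
    }
}
```

With the check done once at the point where the delimiter is set, readRecord and the parseField call can both rely on a non-null delimiter.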


> Add an input format to read primitive types directly (not through tuples)
> -------------------------------------------------------------------------
>
>                 Key: FLINK-933
>                 URL: https://issues.apache.org/jira/browse/FLINK-933
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Stephan Ewen
>            Assignee: Mingliang Qi
>            Priority: Minor
>              Labels: easyfix, features, starter
>
> Right now, reading primitive types goes either through custom formats (work intensive), or through CSV inputs. The latter return tuples.
> To read a sequence of primitives, you need to go through Tuple1, which is clumsy.
> I suggest adding an input format that reads primitive types line-wise (or otherwise delimited), and also adding a method to the environment for that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
