apex-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-1965) Create a WAL in Malhar
Date Wed, 20 Apr 2016 00:18:25 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249015#comment-15249015 ]

ASF GitHub Bot commented on APEXMALHAR-1965:
--------------------------------------------

Github user davidyan74 commented on a diff in the pull request:

    https://github.com/apache/incubator-apex-malhar/pull/242#discussion_r60334158
  
    --- Diff: library/src/main/java/org/apache/apex/malhar/lib/wal/FileSystemWAL.java ---
    @@ -0,0 +1,598 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.apex.malhar.lib.wal;
    +
    +import java.io.DataInputStream;
    +import java.io.DataOutputStream;
    +import java.io.IOException;
    +import java.util.EnumSet;
    +import java.util.HashSet;
    +import java.util.Iterator;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.TreeMap;
    +
    +import javax.validation.constraints.Min;
    +import javax.validation.constraints.NotNull;
    +
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import org.apache.apex.malhar.lib.utils.FileContextUtils;
    +import org.apache.apex.malhar.lib.utils.IOUtils;
    +import org.apache.apex.malhar.lib.utils.Serde;
    +import org.apache.hadoop.fs.CreateFlag;
    +import org.apache.hadoop.fs.FSDataOutputStream;
    +import org.apache.hadoop.fs.FileContext;
    +import org.apache.hadoop.fs.FileStatus;
    +import org.apache.hadoop.fs.Options;
    +import org.apache.hadoop.fs.Path;
    +import org.apache.hadoop.fs.RemoteIterator;
    +
    +import com.google.common.base.Preconditions;
    +
    +import com.datatorrent.api.annotation.Stateless;
    +
    +public class FileSystemWAL<T> implements WAL<FileSystemWAL.FileSystemWALReader, FileSystemWAL.FileSystemWALWriter>
    +{
    +  @NotNull
    +  private Serde<T, byte[]> serde;
    +
    +  @NotNull
    +  private String filePath;
    +
    +  //max length of the file
    +  @Min(0)
    +  private long maxLength;
    +
    +  @NotNull
    +  private FileSystemWAL.FileSystemWALReader<T> fileSystemWALReader = new FileSystemWALReader<>(this);
    +
    +  @NotNull
    +  private FileSystemWAL.FileSystemWALWriter<T> fileSystemWALWriter = new FileSystemWALWriter<>(this);
    +
    +  private long lastCheckpointedWindow = Stateless.WINDOW_ID;
    +
    +  @Override
    +  public void setup()
    +  {
    +    try {
    +      FileContext fileContext = FileContextUtils.getFileContext(filePath);
    +      if (maxLength == 0) {
    +        maxLength = fileContext.getDefaultFileSystem().getServerDefaults().getBlockSize();
    --- End diff --
    
    From my understanding, a file in HDFS always takes up a multiple of the block size, even
    if the content of the file is just 1 byte, which is the main reason why HDFS is not suitable
    for many small files. If that's true, how about letting the user specify a multiple of the
    file system's block size instead of maxLength?
    
    http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
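
    A minimal sketch of what that suggestion could look like, assuming a hypothetical
    helper named maxLengthFromBlocks and a hypothetical blockMultiple setting (neither
    exists in the pull request). The 128 MB block size in main is only an assumed example
    value; in setup() it would come from the getServerDefaults().getBlockSize() call shown
    in the diff. The sketch takes the block-size premise above as given and does not
    verify it.

```java
// Hypothetical sketch, not part of FileSystemWAL: the class name, method name,
// and the blockMultiple parameter are all illustrative assumptions.
class WalLengthSketch
{
  /**
   * Derives the maximum WAL file length from a user-supplied multiple of the
   * file system's block size, instead of taking a raw maxLength in bytes.
   */
  static long maxLengthFromBlocks(int blockMultiple, long blockSize)
  {
    if (blockMultiple < 1) {
      throw new IllegalArgumentException("blockMultiple must be >= 1, got " + blockMultiple);
    }
    if (blockSize < 1) {
      throw new IllegalArgumentException("blockSize must be >= 1, got " + blockSize);
    }
    // multiplyExact throws ArithmeticException on overflow instead of wrapping silently.
    return Math.multiplyExact((long)blockMultiple, blockSize);
  }

  public static void main(String[] args)
  {
    // Assumed example value; the real value would be read from the file system.
    long blockSize = 128L * 1024 * 1024;
    System.out.println(maxLengthFromBlocks(1, blockSize)); // prints 134217728
    System.out.println(maxLengthFromBlocks(4, blockSize)); // prints 536870912
  }
}
```

    With this shape the configured length can never be a fraction of a block, and a
    multiple of 1 matches what the diff does today when maxLength is 0.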


> Create a WAL in Malhar
> ----------------------
>
>                 Key: APEXMALHAR-1965
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-1965
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: Chandni Singh
>            Assignee: Tushar Gosavi
>
> In Malhar we have an IdempotentStorageManager which we use like a Write Ahead Logger.
> There have been some other places where we have created a different flavor of Write Ahead
> Logger.
> We need to find overlap between all these flavors and create a common Write Ahead Logger
> for use in Apex core and Apex Malhar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
