Date: Mon, 19 Oct 2015 16:35:05 +0000 (UTC)
From: "Aihua Xu (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-1898) The ESCAPED BY clause does not seem to pick up newlines in colums and the line terminator cannot be changed

    [ https://issues.apache.org/jira/browse/HIVE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963578#comment-14963578 ]

Aihua Xu commented on HIVE-1898:
--------------------------------

HIVE-11785 added support for escaping newline and carriage return characters in LazySimpleSerDe, which should fix this issue. Intermediate results written with LazySimpleSerDe will now have their newlines and carriage returns escaped, so LineRecordReader can later handle each line properly.

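To illustrate the idea in the comment above, here is a minimal Python sketch of why escaping newlines and carriage returns lets a line-based reader split records correctly. It is not Hive's implementation: the names FIELD_SEP, ESCAPE, escape_field, unescape_field, split_fields, write_rows and read_rows are hypothetical, and the comma separator and backslash escape character are assumptions chosen for the example.

```python
# Illustrative sketch only; not LazySimpleSerDe or LineRecordReader code.
FIELD_SEP = ","   # assumed field delimiter
ESCAPE = "\\"     # assumed escape character


def escape_field(value: str) -> str:
    """Escape the escape char, the field separator, newline and carriage return."""
    out = []
    for ch in value:
        if ch == "\n":
            out.append(ESCAPE + "n")
        elif ch == "\r":
            out.append(ESCAPE + "r")
        elif ch in (ESCAPE, FIELD_SEP):
            out.append(ESCAPE + ch)
        else:
            out.append(ch)
    return "".join(out)


def unescape_field(value: str) -> str:
    """Reverse escape_field."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == ESCAPE and i + 1 < len(value):
            nxt = value[i + 1]
            out.append({"n": "\n", "r": "\r"}.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def split_fields(line: str) -> list:
    """Split one line on separators that are not preceded by the escape char."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == ESCAPE and i + 1 < len(line):
            current.append(line[i:i + 2])  # keep the escaped pair intact
            i += 2
        elif ch == FIELD_SEP:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


def write_rows(rows) -> str:
    """Serialize rows, one per physical line."""
    return "".join(
        FIELD_SEP.join(escape_field(f) for f in row) + "\n" for row in rows
    )


def read_rows(text: str) -> list:
    """Split on newline first (as a line reader would), then parse the fields."""
    lines = text.split("\n")[:-1]  # drop the empty piece after the last newline
    return [[unescape_field(f) for f in split_fields(line)] for line in lines]
```

Because every newline inside a field is written as an escape sequence, the only raw newlines left in the output are record terminators, so splitting on newlines before parsing fields no longer breaks a record in two.
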
> The ESCAPED BY clause does not seem to pick up newlines in colums and the line terminator cannot be changed
> -----------------------------------------------------------------------------------------------------------
>
>                  Key: HIVE-1898
>                  URL: https://issues.apache.org/jira/browse/HIVE-1898
>              Project: Hive
>           Issue Type: Bug
>           Components: Serializers/Deserializers
>     Affects Versions: 0.5.0
>             Reporter: Josh Patterson
>             Priority: Minor
>
> If I want to preserve data in columns which contains a newline (webcrawling for instance) I cannot set the ESCAPED BY clause to escape these out (other characters such as commas escape fine, however). This may be due to the line terminators, which are locked to be newlines, are picked up first, and then fields processed.
> This seems to be related to:
> "SerDe should escape some special characters"
> https://issues.apache.org/jira/browse/HIVE-136
> and
> "Implement "LINES TERMINATED BY""
> https://issues.apache.org/jira/browse/HIVE-302
> where at comment: https://issues.apache.org/jira/browse/HIVE-302?focusedCommentId=12793435&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12793435
> "This is not fixable currently because the line terminator is determined by LineRecordReader.LineReader which is in the Hadoop land."

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)