Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Reply-To: hdfs-issues@hadoop.apache.org
Date: Sun, 10 Jul 2011 19:51:59 +0000 (UTC)
From: "Ivan Kelly (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <504925154.1534.1310327519876.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <463470900.55765.1306838807767.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-2018) Move all journal stream management code into one place

[
https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062798#comment-13062798 ]

Ivan Kelly commented on HDFS-2018:
----------------------------------

{quote}
in selectInputStream, it's counting both finalized and unfinalized transactions. But at startup, it should be recovering all of the inprogress logs to finalized logs, right? Given that, I don't think we need the API getNumberOfTransactions – ie we only need the finalized one.
{quote}
We need both. There are two occasions on which you need to count the number of transactions in a journal: startup and checkpointing. At startup you want to consider inprogress logs, because they are the result of a crash. During checkpointing they should not be considered, because the primary is still writing to an inprogress log.

With a file-based journal, you cannot tell whether you are starting up or checkpointing without some kind of write lease for the journal, which we don't have now (it may be a nice thing to have in future).

{quote}
the API change on the StorageArchiver interface seems less than ideal – an archiver may very well want to know the txid range of a log to know what to do with it – any way we can preserve this?
{quote}
I've put the txid range back into this API. I haven't used the FoundFSImage and FoundEditLog interfaces though, as that would create a circular dependency between StorageInspector and StorageArchiver. Also, FoundEditLog has gone away, so using File and longs makes it more uniform.

{quote}
the idea of the "remote edit log manifest" and the way we do edits transfer is inextricably linked to the idea of log segments. But, the new JournalManager APIs are based on the idea that logs are just sequences with no segmenting.
I think having both ideas coexist is fairly confusing and a good opening for bugs – eg right now, the JournalManagers can return RemoteEditLogs for any transaction range, but the GetImageServlet still expects files. If edits are to be decoupled from files, then RemoteEditLogs should probably include a URI which identifies an edits transfer method. For FileJournalManager, the URI would be http-based and simply point to the GetImageServlet, but with BK-based logs it would point to the ZK ledger, right?
{quote}
Further to what I said about URIs last week, I spoke to Jitendra about this transfer before, and he said that the plan was to take this functionality out of band, with rsync or something. Now that image and logs are decoupled, this is possible.

> Move all journal stream management code into one place
> ------------------------------------------------------
>
>                 Key: HDFS-2018
>                 URL: https://issues.apache.org/jira/browse/HDFS-2018
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff
>
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in FileJournalManager and the code for input streams is in the inspectors. This change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deal with URIs when referring to edit logs.
>   - Recovery of inprogress logs is performed by counting the number of transactions instead of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
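
The difference between counting transactions at startup and at checkpoint time, as discussed in the comment, can be illustrated with a rough sketch. This is not the code from the HDFS-2018 patch: the `Segment` class, the `countTransactions` method, and the `includeInProgress` flag are all hypothetical names chosen for illustration, and the sketch assumes the last txid of an inprogress segment is already known (in practice it would have to be found by reading the log).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class JournalCountSketch {

    /** Hypothetical description of one edit log segment; not an HDFS class. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;      // inclusive; assumed known even for inprogress segments
        final boolean inProgress; // true if the segment was never finalized

        Segment(long firstTxId, long lastTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
            this.inProgress = inProgress;
        }
    }

    /**
     * Counts the contiguous transactions available starting at fromTxId.
     * At startup the caller would pass includeInProgress = true (inprogress
     * segments are leftovers of a crash); when checkpointing it would pass
     * false (the primary is still writing to the inprogress segment).
     */
    static long countTransactions(List<Segment> segments, long fromTxId,
                                  boolean includeInProgress) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingLong((Segment s) -> s.firstTxId));

        long next = fromTxId; // next txid we still need
        for (Segment s : sorted) {
            if (s.inProgress && !includeInProgress) {
                continue; // ignore segments that may still be written to
            }
            if (s.lastTxId < next) {
                continue; // segment lies entirely before the requested txid
            }
            if (s.firstTxId > next) {
                break;    // gap in the sequence: stop counting here
            }
            next = s.lastTxId + 1;
        }
        return next - fromTxId;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
                new Segment(1, 100, false),   // finalized segment
                new Segment(101, 150, true)); // inprogress segment

        System.out.println("startup:    " + countTransactions(segments, 1, true));
        System.out.println("checkpoint: " + countTransactions(segments, 1, false));
    }
}
```

With the two segments in `main`, the startup-style call counts 150 transactions and the checkpoint-style call counts 100. The sketch makes the caller choose the mode explicitly, which reflects the point above that a file-based journal cannot tell the two situations apart without something like a write lease.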