Date: Fri, 24 Oct 2014 19:26:34 +0000 (UTC)
From: "Sean Busbey (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-12324) Improve compaction speed and process for immutable short lived datasets

[ https://issues.apache.org/jira/browse/HBASE-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183342#comment-14183342 ]

Sean Busbey commented on HBASE-12324:
-------------------------------------

{quote}
The only issue I see with TS is if old data come late. But in those cases, the data will get deleted later which seems same as running major compaction late.
{quote}

It's actually worse than that, because the clock could adjust and we could have a file timestamp that is older than the cell timestamps within it. That would result in deleting data that isn't yet expired.
(presuming the timestamp will be set based on when the server calls close())

{quote}
Do you mean to say that every file will have latest timestamp of any cell in it. And we could use that TS to identify files to delete instead of looking at file timestamp? That sounds interesting.
{quote}

Yes, exactly. We use protobufs and have a bunch of padded space in the fixed trailer so that we can make optimizations without having to increment the file version. We already track some other cell stats as we make a file, so adding the info about the timestamps inside the file should be straightforward.

> Improve compaction speed and process for immutable short lived datasets
> -----------------------------------------------------------------------
>
> Key: HBASE-12324
> URL: https://issues.apache.org/jira/browse/HBASE-12324
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Affects Versions: 0.98.0, 0.96.0
> Reporter: Sheetal Dolas
> Attachments: OnlyDeleteExpiredFilesCompactionPolicy.java
>
>
> We have seen multiple cases where HBase is used to store immutable data and the data lives for a short period of time (a few days).
> On very high volume systems, major compactions become very costly and slow down ingestion rates.
> In all such use cases (immutable data, high write rate, moderate read rates, and shorter TTL), avoiding any compactions and just deleting old data brings a lot of performance benefits.
> We should have a compaction policy that only deletes/archives files older than the TTL and does not compact any files.
> Also attaching a patch that can do so.
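
To make the difference between the two selection rules discussed in this thread concrete, here is a minimal sketch in Java. It is not the attached patch and uses no HBase APIs: `FileInfo`, `ExpiredFileSelector`, and both method names are hypothetical stand-ins, and it assumes each file records the newest cell timestamp it contains. It compares expiring files by their close time with expiring them by that newest cell timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a store file; not an HBase class.
final class FileInfo {
    final String name;
    final long closeTimeMs;        // wall-clock time recorded when the file was closed
    final long maxCellTimestampMs; // newest cell timestamp stored in the file (assumed to be tracked)

    FileInfo(String name, long closeTimeMs, long maxCellTimestampMs) {
        this.name = name;
        this.closeTimeMs = closeTimeMs;
        this.maxCellTimestampMs = maxCellTimestampMs;
    }
}

// Hypothetical helper illustrating the two ways of picking files to delete.
final class ExpiredFileSelector {

    // Selects files whose close time is older than the TTL.
    // If the clock was adjusted, this can pick a file that still holds live cells.
    static List<FileInfo> selectByCloseTime(List<FileInfo> files, long nowMs, long ttlMs) {
        List<FileInfo> expired = new ArrayList<>();
        for (FileInfo f : files) {
            if (nowMs - f.closeTimeMs > ttlMs) {
                expired.add(f);
            }
        }
        return expired;
    }

    // Selects files whose newest cell is older than the TTL,
    // so every cell in a selected file is already expired.
    static List<FileInfo> selectByMaxCellTimestamp(List<FileInfo> files, long nowMs, long ttlMs) {
        List<FileInfo> expired = new ArrayList<>();
        for (FileInfo f : files) {
            if (nowMs - f.maxCellTimestampMs > ttlMs) {
                expired.add(f);
            }
        }
        return expired;
    }
}
```

With a file whose close time is older than its newest cell (the clock-adjustment case described above), the close-time rule selects the file for deletion while the cell-timestamp rule keeps it until its newest cell has passed the TTL.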