Date: Fri, 05 Oct 2012 09:48:58 +0100
From: Renaud Delbru
To: user@hbase.apache.org
CC: Adrien Mogenet
Subject: Re: Lucene instead of HFiles?

Hi,

With respect to point 3, I know there is a new codec in Lucene 4.0 for
append-only filesystems such as HDFS (LUCENE-2373).

Also, it would depend on the use case. At the moment, for storing data,
I would expect HFile to be much more efficient in terms of compression
than the Lucene file format (in fact, there is no real compression, apart
from compressing the field byte stream yourself before storing it). There
is some work under way to make Lucene more efficient for small and
medium-sized fields (LUCENE-4226 - block-style compression and storing),
but I think HFile is far more optimised for this task.

In fact, another interesting idea would be to investigate the use of
HFile as a StoredFieldsFormat in Lucene. Efficient storage of data in
Lucene is imho quite a missing feature.

my2c

Regards
--
Renaud Delbru

On 05/10/12 07:36, Adrien Mogenet wrote:
> "Don't bother trying this in production" ;-)
>
> 1. Are you sure lookups by key are faster?
> 2. Updating Lucene files in a lock-free manner and ensuring good
>    concurrency can be a bit tricky.
> 3. AFAIK, Lucene files don't fit in HDFS and thus another distributed
>    storage is required. Katta does not look as powerful as Hadoop.
>
> On Fri, Oct 5, 2012 at 5:34 AM, Otis Gospodnetic wrote:
>> Hi,
>>
>> Has anyone attempted using Lucene instead of HFiles (see
>> https://twitter.com/otisg/status/254047978174701568 )?
>>
>> Is that a completely crazy, bad, would-never-work,
>> don't-bother-trying-this-at-home, it's-too-late-go-to-sleep idea? Or
>> not?
>>
>> Thanks,
>> Otis
>> --
>> Search Analytics - http://sematext.com/search-analytics/index.html
>> Performance Monitoring - http://sematext.com/spm/index.html
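
The reply above notes that the only compression available for stored data is to compress the field byte stream yourself before storing it. A minimal sketch of that idea, assuming plain DEFLATE from the JDK is acceptable: the class name FieldCompression and its methods are hypothetical helpers for illustration, not part of Lucene or HBase. The compressed bytes would then be stored as a binary field and inflated again after reading the document back.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper (not a Lucene or HBase class): compresses a field's
// bytes before they are stored, and restores them after they are read back.
final class FieldCompression {

    // Deflate the raw field bytes; the result is what would be stored.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Inflate bytes previously produced by compress().
    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(packed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and nothing left to read: the input is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Repetitive sample payload, made up for the demonstration.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("column-family:qualifier=value;");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);

        byte[] packed = compress(raw);
        byte[] restored = decompress(packed);

        System.out.println("raw bytes:    " + raw.length);
        System.out.println("stored bytes: " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(raw, restored));
    }
}
```

Compressing each field value on its own like this gains little on short values, since every value carries its own DEFLATE overhead and cannot share redundancy with its neighbours. That is presumably the gap the block-style approach in LUCENE-4226 aims to close.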