From: Andrzej Bialecki
To: java-user@lucene.apache.org
Date: Fri, 10 Jul 2009 12:07:35 +0200
Subject: [ANN] Luke + Hadoop, alpha version

Hi all,

I prepared a special edition of Luke, the Lucene Index Toolbox, that works with Lucene indexes located on any filesystem supported by Hadoop 0.19.1.
At the moment I'm looking for feedback on how best to integrate this functionality with the various bits and pieces of Luke.

You can download the jar file from a direct link:

http://www.getopt.org/luke/lukeall-0.9.3.jar

This JAR contains all dependencies needed to connect to HDFS, KFS or S3/S3n filesystems, although I have tested it only with HDFS so far. Note: this version of Luke still uses Lucene 2.4.1; I haven't started integrating 2.9-dev yet.

Quick info for the impatient: yes, you can browse the content, view terms and documents, perform searching, explaining, etc. See below for more details.

The initial Open dialog is not yet integrated with this functionality. After you start Luke, you need to dismiss this dialog, go to Plugins / Hadoop Plugin, enter the full URI of the index in the text field, and then press the Open button. There is no filesystem browsing for now - you need to know the full URI in advance.

Current functionality is as follows:

- you can open a single index or partial (sharded) indexes located in part-NNNNN/ subdirectories (this is the typical layout produced by common map-reduce output formats). In the latter case you will get a single view of the partial indexes, thanks to MultiReader.

- access is read-only - most FileSystem-s don't support file updates, so it was easiest to disable write access altogether for now.

- most of Luke's functionality works properly, thanks to the excellent design of the IndexReader API. Some operations are disabled due to read-only access; some other information (like top terms) is not populated by default due to its high IO cost, but can be requested explicitly.

- the plugin keeps track of the amount of IO reads - I found this very comforting when opening large indexes over a slow VPN line ... There is a "Clear" button on the plugin's tab that resets the counters - this is useful to see how much IO is needed to complete a specific operation.
- a lot of code has been reworked to avoid UI stalls when doing slow IO. This means that you can see the amount of IO being done, but the rest of the UI is blocked by a modal dialog in the meantime. It's a bit unwieldy, but other solutions would require too much refactoring.

Any feedback is welcome - please keep in mind that this is an early preview. Also, various UI glitches are probably related to the Thinlet toolkit - again, one day I may re-write Luke using something else, but for now I don't have the strength to do it. :)

-- 
Best regards,
Andrzej Bialecki     <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
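
For readers wondering how the IO counters and the "Clear" button from the feature list above could work in principle, here is a minimal sketch. It is not the plugin's actual code: the class names IoCounter and CountingInputStream are made up for this example, and it wraps a plain java.io.InputStream where a real implementation would presumably wrap the streams returned by the Hadoop FileSystem.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical names throughout; this is not code from Luke or Hadoop.
final class IoCounter {
    private final AtomicLong readCalls = new AtomicLong();
    private final AtomicLong bytesRead = new AtomicLong();

    // n is the value returned by a read call; -1 (end of stream) adds no bytes.
    void record(int n) {
        readCalls.incrementAndGet();
        if (n > 0) {
            bytesRead.addAndGet(n);
        }
    }

    long readCalls() { return readCalls.get(); }
    long bytesRead() { return bytesRead.get(); }

    // What a "Clear" button could call before an operation is measured.
    void clear() {
        readCalls.set(0);
        bytesRead.set(0);
    }
}

// Wraps any stream and reports every read to a shared counter.
final class CountingInputStream extends FilterInputStream {
    private final IoCounter counter;

    CountingInputStream(InputStream in, IoCounter counter) {
        super(in);
        this.counter = counter;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        counter.record(b < 0 ? -1 : 1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        counter.record(n);
        return n;
    }
}

class IoCounterDemo {
    public static void main(String[] args) throws IOException {
        IoCounter counter = new IoCounter();
        // An in-memory stream stands in for a remote index file.
        InputStream in = new CountingInputStream(
                new ByteArrayInputStream(new byte[4096]), counter);

        counter.clear();              // "Clear", then run the operation of interest
        in.read(new byte[1024]);
        System.out.println(counter.readCalls() + " reads, "
                + counter.bytesRead() + " bytes");
    }
}
```

Sharing one counter object across all open streams is what makes a single "Clear" action sufficient: reset it, run one operation, and the totals show what that operation cost.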