From: Mathias Dahl
Date: Mon, 4 Feb 2013 19:01:32 +0100
Subject: Lucene vs Glimpse
To: java-user@lucene.apache.org

Hi,

I have hacked together a small web front end to the Glimpse text indexing engine (see http://webglimpse.net/ for information). I am very happy with how Glimpse indexes and searches data. If I understand it correctly, it combines an index with searching directly in the files themselves, the way grep and similar tools do.

The problem is that I discovered it is not open source, and now that I want to extend its use from private to company-wide, I will run into license problems/costs. So I decided to try out Lucene. I tried the examples and changed them a bit to use another analyzer. But when I started to think about it, I realized that I will not be able to build something like Glimpse, at least not easily.

Why? I will try to explain:

As stated above, Glimpse uses a combination of index and in-file search. This makes it very powerful, in the sense that I can get hits for things that are not necessarily indexed as terms. Let's say I have a file with this content:

  ...
  import foo.bar.baz;
  ...

With Glimpse, and without telling it how to index the content, I can find the above file using a search string like "foo" or "bar", but also, and this is important, using "foo.bar.baz".

Another example: we have a lot of PL/SQL source code, and you often find code like this:

  ...
  My_Nice_API.Some_Method
  ...

Here too, Glimpse is almost magic, since it combines index and normal search. I can find the file above using "My_Nice_API" or "My_Nice_API.Some_Method". In a sense, I can have my cake and eat it too.
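The "index plus in-file search" idea can be modeled in a few lines. What follows is a minimal sketch in plain Java, not Glimpse's actual algorithm and not Lucene's API. It assumes a simple word index, where words are split on anything other than letters, digits and underscores. The index narrows the search to candidate files, and a literal substring scan of those candidates then confirms the full query string. The class `HybridSearch` and its methods are hypothetical names chosen for illustration.

```java
import java.util.*;

/**
 * Toy model of a hybrid "index + scan" search, as described above for Glimpse.
 * Hypothetical class for illustration; not Glimpse's or Lucene's implementation.
 */
public class HybridSearch {
    // word -> names of files containing that word
    private final Map<String, Set<String>> index = new HashMap<>();
    // file name -> full text, kept so candidates can be scanned like grep
    private final Map<String, String> files = new LinkedHashMap<>();

    // Assumed tokenization: lowercase, then split on anything that is not a
    // letter, digit or underscore, so "foo.bar.baz" yields foo, bar, baz.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9_]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    void add(String name, String text) {
        files.put(name, text);
        for (String w : words(text)) {
            index.computeIfAbsent(w, k -> new HashSet<>()).add(name);
        }
    }

    List<String> search(String query) {
        List<String> qWords = words(query);
        if (qWords.isEmpty()) return List.of();
        // Stage 1: the index narrows the search to files containing every query word.
        Set<String> candidates = null;
        for (String w : qWords) {
            Set<String> hits = index.getOrDefault(w, Set.of());
            if (candidates == null) candidates = new HashSet<>(hits);
            else candidates.retainAll(hits);
        }
        // Stage 2: scan each candidate for the literal (case-sensitive) query, like grep.
        List<String> result = new ArrayList<>();
        for (String name : files.keySet()) { // LinkedHashMap keeps insertion order
            if (candidates.contains(name) && files.get(name).contains(query)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        HybridSearch s = new HybridSearch();
        s.add("A.java", "import foo.bar.baz;");
        s.add("B.java", "import foo.qux;\nString bar, baz;");
        s.add("c.sql", "My_Nice_API.Some_Method;");
        System.out.println(s.search("foo.bar.baz"));             // prints [A.java]
        System.out.println(s.search("bar"));                     // prints [A.java, B.java]
        System.out.println(s.search("My_Nice_API.Some_Method")); // prints [c.sql]
    }
}
```

`B.java` contains the words foo, bar and baz, so the index alone would return it for "foo.bar.baz". Only the scan stage rules it out, which is the "have my cake and eat it too" effect described above.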
If I want to do similar "free" searching with Lucene, I think I have to create analyzers for the different kinds of source code files, with fields for this and that. Quite an undertaking.

Does anyone understand my point here, and am I correct that it would be hard to implement something as "free" as Glimpse with Lucene? I am not trying to criticize, just to understand how Lucene (and Glimpse) works.

Oh, yes, Glimpse has one big drawback: it only supports search strings of up to 32 characters.

Thanks!

/Mathias