lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Lucene vs Glimpse
Date Mon, 04 Feb 2013 20:31:16 GMT
Generally, all of your example queries should work fine with Lucene, 
provided that you choose your analyzer carefully, or you can even just 
use the StandardAnalyzer. Special characters like underscore and dot generally 
get treated as spaces and the resulting sequence of terms would match as a 
phrase. It won't be a 100% solution, but it should do reasonably well.
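
To make the idea concrete, here is a minimal sketch in plain Python. It is
not Lucene code and does not reproduce the exact rules of StandardAnalyzer
or any other analyzer; the tokenizer, the sample file names and the helper
names (tokenize, build_index, phrase_search) are all assumptions made for
illustration. It only shows the general principle: punctuation is treated
as a separator, term positions are recorded, and a query matches when its
terms occur at consecutive positions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_index(docs):
    """Build a tiny positional index: term -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def phrase_search(index, query):
    """Return the doc ids in which the query terms occur consecutively."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set()
    # Every possible match must start where the first term occurs.
    for doc_id, positions in index.get(terms[0], {}).items():
        for start in positions:
            if all(start + i in index.get(term, {}).get(doc_id, ())
                   for i, term in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits


# Hypothetical sample files, modelled on the examples in the question.
docs = {
    "Foo.java": "import foo.bar.baz;",
    "pkg.sql": "My_Nice_API.Some_Method",
    "other.txt": "baz bar foo",
}
index = build_index(docs)
```

With this sample data, the queries foo.bar.baz, My_Nice_API and
My_Nice_API.Some_Method each find only the file that contains them, while
a plain foo also finds other.txt. The flip side is that "foo bar baz"
(with spaces) matches the same file as "foo.bar.baz", which is one way
such an approach falls short of exact matching.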

Is there a query that was failing to match reasonably for you?

-- Jack Krupansky

-----Original Message----- 
From: Mathias Dahl
Sent: Monday, February 04, 2013 1:01 PM
To: java-user@lucene.apache.org
Subject: Lucene vs Glimpse

Hi,

I have hacked together a small web front end to the Glimpse text
indexing engine (see http://webglimpse.net/ for information). I am
very happy with how Glimpse indexes and searches data. If I understand
it correctly, it combines an index with searching directly in the
files themselves, as grep or other tools do. The problem is that I
discovered it is not open source and now that I want to extend the use
from private to company wide I will run into license problems/costs.

So, I decided to try out Lucene. I tried the examples and changed them
a bit to use another analyzer. But when I started to think about it I
realized that I will not be able to build something like Glimpse. At
least not easily.

Why? I will try to explain:

As stated above, Glimpse uses a combination of index and in-file
search. This makes it very powerful in the sense that I can get hits
for things that are not necessarily indexed as terms. Let's say
I have a file with this content:

...
import foo.bar.baz;
...

With Glimpse, and without telling it how to index the content, I can
find the above file using a search string like "foo" or "bar" but
also, and this is important, using foo.bar.baz.

Another example:

We have a lot of PL/SQL source code, and often you can find code like this:

...
My_Nice_API.Some_Method
...

Here too, Glimpse is almost magic since it combines index and normal
search. I can find the file above using "My_Nice_API" or
"My_Nice_API.Some_Method".

In a sense I can have my cake and eat it too.
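
To illustrate what I mean by combining an index with a normal search,
here is a rough sketch in Python. It is based only on my understanding as
described above, not on how Glimpse is actually implemented; the word
splitting rule, the sample files and the function names (words,
build_word_index, search) are made up for the example. A word index first
narrows the search down to candidate files, and only those candidates are
then scanned for the literal search string, grep style.

```python
import re


def words(text):
    """Split text into lowercase alphanumeric words (assumed indexing rule)."""
    return set(re.findall(r"[0-9a-z]+", text.lower()))


def build_word_index(files):
    """Map each word to the set of file names that contain it."""
    index = {}
    for name, content in files.items():
        for word in words(content):
            index.setdefault(word, set()).add(name)
    return index


def search(files, index, query):
    """Two-stage search.

    Stage 1: use the index to keep only files containing every word of the query.
    Stage 2: scan those candidates for the literal (case-sensitive) query string.
    """
    query_words = words(query)
    if not query_words:
        return []
    candidates = set(files)
    for word in query_words:
        candidates &= index.get(word, set())
    return sorted(name for name in candidates if query in files[name])


# Hypothetical sample files, modelled on the examples above.
files = {
    "Foo.java": "import foo.bar.baz;\n",
    "api.sql": "My_Nice_API.Some_Method\n",
    "notes.txt": "foo bar baz\n",
}
index = build_word_index(files)
```

Here a search for foo.bar.baz gets two candidates from the index
(Foo.java and notes.txt), but the literal scan keeps only Foo.java. A
search for foo alone returns both files. Note that the scan in this
sketch is case-sensitive, so my_nice_api would find nothing.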

If I want to do similar "free" search stuff with Lucene I think I have
to create analyzers for the different kinds of source code files, with
fields for this and that. Quite an undertaking.

Does anyone understand my point here and am I correct in that it would
be hard to implement something as "free" as with Glimpse? I am not
trying to criticize, just to understand how Lucene (and Glimpse) work.

Oh, yes, Glimpse has one big drawback: it only supports search strings
up to 32 characters.

Thanks!

/Mathias

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org 

