lucene-java-user mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: Lucene vs Glimpse
Date Wed, 06 Feb 2013 07:33:37 GMT
Here's another thought: if you desperately need complex searches, you
could narrow the search with a heuristic filter. Use an analyzer that
splits the input into terms (removing excess whitespace, or even
producing n-grams from the input), apply the same analysis to the
query, and search to locate *approximately* matching documents. Then,
as a post-processing step, run an exact grep over those candidates,
much like Glimpse does (I don't know the tool, but I'm assuming this
from your description).

This may be a heavy operation compared to a simple inverted index
lookup, but you'll get exact matches for complex queries involving
punctuation characters, etc. (and possibly matches on partially
matching terms). If the query load on your server is never too high,
it may be an option worth considering.
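
To make the two-phase idea above concrete, here is a minimal sketch in
plain Python rather than Lucene, so it runs without any dependencies.
Everything in it is an assumption made for illustration: the class name
NgramGrepIndex, the choice of character trigrams, and keeping the raw
text in memory instead of re-reading files. In Lucene the first phase
would be an n-gram analyzer plus a query; only the shape of the approach
is shown here.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of character n-grams of text, with whitespace runs collapsed."""
    normalized = " ".join(text.split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


class NgramGrepIndex:
    """Hypothetical helper: n-gram candidate filter followed by an exact scan."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> ids of documents containing it
        self.docs = {}                    # id -> original, unmodified text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in ngrams(text, self.n):
            self.postings[gram].add(doc_id)

    def candidates(self, query):
        """Phase 1: documents that contain every n-gram of the query."""
        grams = ngrams(query, self.n)
        if not grams:
            # Query shorter than n characters: the index cannot narrow it down.
            return set(self.docs)
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, query):
        """Phase 2: exact, grep-like substring check on the candidates only."""
        return sorted(d for d in self.candidates(query) if query in self.docs[d])


if __name__ == "__main__":
    idx = NgramGrepIndex()
    idx.add("A.java", "package x;\nimport foo.bar.baz;\n")
    idx.add("D.txt", "foo.bar; bar.baz")
    print(idx.candidates("foo.bar.baz"))  # approximate matches
    print(idx.search("foo.bar.baz"))      # exact matches only
```

The second document in the demo contains every trigram of "foo.bar.baz"
without containing the string itself, so it survives the index lookup
and is only rejected by the exact scan. That is the extra cost mentioned
above: the scan touches each candidate's full text, so it pays off only
when the n-gram filter leaves few candidates.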

Dawid

On Tue, Feb 5, 2013 at 11:37 PM, Mathias Dahl <mathias.dahl@gmail.com> wrote:
> Thanks for the input! Seems I should give this another chance using
> the hints you all sent me. I'll report back my findings here.
>
> /Mathias
>
>
> On Mon, Feb 4, 2013 at 7:01 PM, Mathias Dahl <mathias.dahl@gmail.com> wrote:
>> Hi,
>>
>> I have hacked together a small web front end to the Glimpse text
>> indexing engine (see http://webglimpse.net/ for information). I am
>> very happy with how Glimpse indexes and searches data. If I understand
>> it correctly, it combines an index with searching directly in the
>> files themselves, as grep and similar tools do. The problem is that I
>> discovered it is not open source and now that I want to extend the use
>> from private to company wide I will run into license problems/costs.
>>
>> So, I decided to try out Lucene. I tried the examples and changed them
>> a bit to use another analyzer. But when I started to think about it I
>> realized that I will not be able to build something like Glimpse. At
>> least not easily.
>>
>> Why? I will try to explain:
>>
>> As stated above, Glimpse uses a combination of index and in-file
>> search. This makes it very powerful in the sense that I can get hits
>> for things that are not necessarily indexed as terms. Let's say
>> I have a file with this content:
>>
>> ...
>> import foo.bar.baz;
>> ...
>>
>> With Glimpse, and without telling it how to index the content I can
>> find the above file using a search string like "foo" or "bar" but
>> also, and this is important, using foo.bar.baz.
>>
>> Another example:
>>
>> We have a lot of PL/SQL source code, and often you can find code like this:
>>
>> ...
>> My_Nice_API.Some_Method
>> ...
>>
>> Here too, Glimpse is almost magic since it combines index and normal
>> search. I can find the file above using "My_Nice_API" or
>> "My_Nice_API.Some_Method".
>>
>> In a sense I can have the cake and eat it too.
>>
>> If I want to do similar "free" search stuff with Lucene I think I have
>> to create analyzers for the different kinds of source code files, with
>> fields for this and that. Quite an undertaking.
>>
>> Does anyone understand my point here and am I correct in that it would
>> be hard to implement something as "free" as with Glimpse? I am not
>> trying to criticize, just understand how Lucene (and Glimpse) works.
>>
>> Oh, yes, Glimpse has one big drawback: it only supports search strings
>> up to 32 characters.
>>
>> Thanks!
>>
>> /Mathias
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


