lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Is my app a good fit for Lucene?
Date Fri, 10 Jul 2009 14:55:11 GMT
It would be helpful if you told us what analyzers you're using and what your
search code looks like. Even better would be a small,
self-contained demonstration app showing the issue.

You could well be right that the text format is tripping up tokenizing,
but there are other issues. You may have to pre-process the text or
write your own tokenizer to handle this.
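
As a rough illustration of the pre-processing option, here is a minimal sketch in plain Java (no Lucene calls). It assumes the separator is SOH (0x01), which is only a guess since the original message just says "a non-printable ASCII character". The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordPreprocessor {
    // Assumption: the separator is SOH (0x01). The original message only
    // says "a non-printable ASCII character", so adjust as needed.
    static final char SEPARATOR = '\u0001';

    /** Splits one record into KEY -> VALUE pairs, skipping malformed pieces. */
    static Map<String, String> parse(String record) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : record.split(String.valueOf(SEPARATOR))) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // no '=' at all, or an empty key
            }
            // Split on the first '=' only, so values may contain '='.
            fields.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return fields;
    }

    /** Replaces the separator with a space so whitespace-based tokenizing can split the pairs. */
    static String toPlainText(String record) {
        return record.replace(SEPARATOR, ' ');
    }

    public static void main(String[] args) {
        String record = "SYMBOL=IBM" + SEPARATOR + "PRICE=101.5" + SEPARATOR + "SIDE=BUY";
        System.out.println(parse(record));
        System.out.println(toPlainText(record));
    }
}
```

Once the pairs are separated like this, each key could be indexed as its own field, or the plain-text form could be handed to an analyzer that splits on whitespace. Which one fits depends on how the searches are phrased.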

The user frequently updating the files is a bit of a pain, although there
are various schemes that can be used to make this easier. The
basic issue is that unless you reopen the underlying reader, you
won't see updates. The reader only sees a snapshot of what was
in the index when it was opened.
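
To make the snapshot behaviour concrete, here is a toy model in plain Java. It is not Lucene's API: ToyIndex and ToyReader are hypothetical stand-ins that only mimic the "reader sees what existed when it was opened" rule.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotDemo {
    /** Hypothetical stand-in for an index that keeps receiving updates. */
    static class ToyIndex {
        private final List<String> docs = new ArrayList<>();

        void add(String doc) {
            docs.add(doc);
        }

        /** Opens a reader over a copy of the current contents. */
        ToyReader openReader() {
            return new ToyReader(this, new ArrayList<>(docs));
        }
    }

    /** Hypothetical reader: sees only what existed when it was opened. */
    static class ToyReader {
        private final ToyIndex source;
        private final List<String> snapshot;

        ToyReader(ToyIndex source, List<String> snapshot) {
            this.source = source;
            this.snapshot = snapshot;
        }

        boolean contains(String doc) {
            return snapshot.contains(doc);
        }

        /** Returns a fresh reader over the current state of the index. */
        ToyReader reopen() {
            return source.openReader();
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("first");
        ToyReader reader = index.openReader();

        index.add("second");                            // update after the reader was opened
        System.out.println(reader.contains("second"));  // false: stale snapshot

        reader = reader.reopen();                       // pick up the update
        System.out.println(reader.contains("second"));  // true
    }
}
```

In a real application the question is when to pay for the reopen: after every update, on a timer, or just before each search.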

Coming in 2.9 (I believe) is a near-realtime capability. There is a
discussion of this on the dev list, and you can see much of it here:
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696438#action_12696438

HTH
Erick

On Fri, Jul 10, 2009 at 10:35 AM, Andy Faibishenko <andy@lasalletech.com> wrote:

> I have a GUI application which needs to open large files (hundreds of MB)
> and be able to search through them quickly for user specified strings.
> These files are frequently updated while the user is viewing them and the
> updates are captured by the application.  Also, the files contain records
> which are KEY=VALUE pairs separated by a non-printable ASCII character
> instead of normal English text.
>
> I installed Lucene in Eclipse and tried to play around with some sample
> code.  One thing I noticed is that the wildcard searching doesn't seem to
> work right on this data.  I am guessing it is because the text format is
> tripping up the tokenizing.
>
> I am trying to figure out whether using Lucene to implement this is a good
> thing or whether I should just try to implement my own search logic.
>
> Andy Faibishenko
>
