lucenenet-user mailing list archives

From Robert Oschler <robert.osch...@gmail.com>
Subject Re: Does the latest version of lucene.net contain the Lucene 4.0 speed improvements for fuzzy queries?
Date Thu, 09 Jul 2015 09:43:36 GMT
Hello Simon,

Thank you for the detailed reply.  I really appreciate that.

Sincerely,
Robert

On Thu, Jul 9, 2015 at 5:38 AM, Simon Svensson <sisve@devhost.se> wrote:

> Hi,
>
> I believe your mail client is messing up the quotations/indentation.
>
> Anyhow, the 4.0 branch does refer to Lucene 4.0, but it's an incomplete
> port. Work is currently focused on 4.8 and is present in the
> master branch. This is expected to be the next release. I'm guessing that
> this is the branch where you found the "using Lucene.Net.Codecs.Lucene40;"
> line.
>
> I've not used Azure myself, and have only limited knowledge of the usage
> of blobs. It sounds like it comes down to locally stored indexes vs. indexes
> shared over the network. I just made up a few points below, off the top of
> my head, while dodging my real work tasks... ;)
>
> Locally stored indexes:
> + Easy setup.
> + Low response times.
> + One corrupted index will only bring down one searcher.
> - One index per worker (duplicated indexes == wasted disk space)
> - Every worker needs to build the index and keep it up-to-date.
> - Every worker has two roles; both searcher and indexer.
> - Slower scale-out; a new worker needs to rebuild the index.
>
> Network-based indexes:
> + One index shared between all workers.
> + Every worker has a dedicated role; either searcher or indexer (You can
> assign resources to match)
> + One dedicated worker takes care of building and keeping the index
> up-to-date.
> + Faster scale-out; a new worker just grabs the data from the network.
> - Higher response times (due to network traffic). This is often mitigated
> by locally caching the segments.
> - Single-point-of-failure. A corrupted index will bring down all searchers.
>
> I would go with the Azure blobs. While it may mean extra maintenance and
> documentation as an up-front cost, you can sleep soundly at night
> knowing that once your service is hit by Slashdot/Reddit you can press a
> button and scale out in a very short time. (That's the theory at least, if
> you configured your web workers correctly...)
>
> // Simon
>
>
> On 09/07/15 11:13, Robert Oschler wrote:
>
>> Hello Simon,
>>
>> Ok.  I got excited when I saw the following using statement in the 3.0.3
>> build:
>>
>> using Lucene.Net.Codecs.Lucene40;
>>
>> But from what you are saying I take it the 4.0 label does not refer to
>> Lucene 4.0.
>>
>>> You could probably take the FuzzyQuery class from Lucene 4.0 and port it
>>> to the Lucene 3.0.3 code base. That's the only way you can get those
>>> improvements while still using a stable version of Lucene.net.
>>
>> Thanks.  I'll try grabbing the FuzzyQuery class and converting the Java
>> code to C#.  Hopefully there aren't too many dependencies in that unit
>> that I'll have to drag in.
>>
>>> Regarding the Azure library... are you using Azure? If so, are you using
>>> several worker machines that need to share an index? You've not mentioned
>>> anything about your current setup to help you evaluate if switching to
>>> AzureDirectory will be an improvement or not.
>>
>> Yes I am using Azure, but I am just getting started so I have not set up
>> Lucene yet.  I'm trying to decide my current setup right now.  Given my
>> expected usage profile (The index will probably receive a few hundred new
>> updates over the course of the day.  Over time, there could well be a
>> hundred thousand sentences or so.), do you have any suggestions?  I'll
>> want the lowest latency I can give my users when searching the
>> index.  Note, I am more concerned with the performance of index lookups.
>> Updates/modifications can take a few seconds if needed since they will be
>> much less frequent.
>>
>> Thanks,
>> Robert
>>
>>
>>
>> On Thu, Jul 9, 2015 at 5:01 AM, Simon Svensson <sisve@devhost.se> wrote:
>>
>>> Hi,
>>>
>>> The latest stable version of Lucene.net is v3.0.3, and does not contain
>>> any code changes from the higher-versioned Java code. There's currently
>>> a 4.8 port in progress, but it's not stable enough yet.
>>>
>>> You could probably take the FuzzyQuery class from Lucene 4.0 and port it
>>> to the Lucene 3.0.3 code base. That's the only way you can get those
>>> improvements while still using a stable version of Lucene.net.
>>>
>>> Regarding the Azure library... are you using Azure? If so, are you using
>>> several worker machines that need to share an index? You've not mentioned
>>> anything about your current setup to help you evaluate if switching to
>>> AzureDirectory will be an improvement or not.
>>>
>>> // Simon
>>>
>>>
>>>
>>> On 09/07/15 10:51, Robert Oschler wrote:
>>>
>>>> Hello,
>>>>
>>>> Does the latest version of lucene.net contain the Lucene 4.0 speed
>>>> improvements for fuzzy queries?  If not, is there any way to get those
>>>> improvements?  I saw this experimental Lucene.net 4.0 branch on
>>>> Apache.org, but it seems to be inactive now and I don't know how
>>>> stable it is:
>>>>
>>>> https://svn.apache.org/repos/asf/lucene.net/branches/Lucene.Net_4e/
>>>>
>>>> If so, what is the exact version of lucene.net I should be using to
>>>> have those improvements?
>>>>
>>>> Also, I saw this post on using Azure blobs to speed up server side
>>>> processing:
>>>>
>>>> https://code.msdn.microsoft.com/windowsazure/Azure-Library-for-83562538
>>>>
>>>> Are the improvements using this technique substantial enough to warrant
>>>> using it?  My "documents" are average sized sentences.  The index will
>>>> probably receive a few hundred new updates over the course of the day.
>>>> Over time, there could well be a hundred thousand sentences or so.
>>>>
>>>>
>>>>
>>
>


-- 
Thanks,
Robert Oschler
Twitter -> http://twitter.com/roschler
http://www.RobotsRule.com/
http://www.Robodance.com/
