<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>c-dev@lucene.apache.org Archives</title>
<link rel="self" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/?format=atom"/>
<link href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/"/>
<id>http://mail-archives.apache.org/mod_mbox/lucene-c-dev/</id>
<updated>2009-12-07T18:29:08Z</updated>
<entry>
<title>Optimization and Corruption Issues</title>
<author><name>lowfreq &lt;hughmorrison@hotmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200910.mbox/%3c25696990.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c25696990-post@talk-nabble-com%3e</id>
<updated>2009-10-01T15:39:22Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

I have a Lucene index that is very large in size.
It was created using a pre-2.1 version of Lucene.net (2.0.0.4).

The index is currently almost 20 GB and has almost 7000 segment files.
The problem I am having is that I need to optimize it, and can't do this
without the search functionality of my app being down for a week.

I used the Luke tool from getopt.org and it worked flawlessly, optimizing
the index in just over 2 hours. The problem is that my search cannot use the
optimized index: it either fails with Unknown Format Version errors or
simply finds nothing.

I understand that versions of Lucene that are newer than the one the index
was built and is searched with can cause problems.

What can I do to make this work? I have tried older versions of Luke; 0.7
was the oldest I could lay hands on, but even it uses a newer version of
Lucene.

My index version shows as 633103800023469045. The version the index is
written as after optimizing with Luke 0.7 is 633103800023469057.

Any help here would be awesome!

Thank you,

Hugh
-- 
View this message in context: http://www.nabble.com/Optimization-and-Corruption-Issues-tp25696990p25696990.html
Sent from the Lucene - C Developer mailing list archive at Nabble.com.



</pre>
</div>
</content>
</entry>
<entry>
<title>How to write the Sql in(intersection) query in Lucene.</title>
<author><name>gopalbisht &lt;bisht_gopal2004@rediffmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200905.mbox/%3c23403965.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c23403965-post@talk-nabble-com%3e</id>
<updated>2009-05-06T10:29:24Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Hello friends, I need your help.

How do I write the equivalent of a SQL IN query in Lucene?
Example:
SQL query:
String tagIdStr="x,y,z";
List results=fSession.createQuery("from com.test.x.manager.TagTest where id
in ("+tagIdStr+")").list();

I have tried to write this query in Lucene, but the result list has 0 entries.
Lucene query:
List results =fSession.createFullTextQuery(new WildcardQuery(new
Term("id","*"+tagIdStr+"*" )), TagTest.class).list();

Please suggest how I can implement this.

Thanks in Advance

Gopal Bisht
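
For illustration, a minimal sketch of one way to express an IN-style match:
the wildcard attempt above looks for a single term containing the literal
text "x,y,z", whereas an IN list is usually written as one OR clause per
value. The helper name build_in_query is hypothetical, and the sketch
assumes plain alphanumeric IDs and the classic Lucene query-parser syntax.

```python
def build_in_query(field, csv_values):
    """Build a Lucene query-parser string matching any of the given values."""
    # Split the comma-separated string and drop empty entries.
    values = [v.strip() for v in csv_values.split(",") if v.strip()]
    if not values:
        raise ValueError("no values given")
    for v in values:
        # Assumption: plain alphanumeric IDs, so no query-syntax escaping is needed.
        if not v.isalnum():
            raise ValueError("unsupported characters in value: " + v)
    # One OR clause per value, grouped under the field name.
    return field + ":(" + " OR ".join(values) + ")"


print(build_in_query("id", "x,y,z"))  # prints id:(x OR y OR z)
```

In Java the same idea would probably be a BooleanQuery holding one TermQuery
per ID, each added as an optional (SHOULD) clause.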


-- 
View this message in context: http://www.nabble.com/How-to-write-the-Sql-in%28intersection%29-query-in-Lucene.-tp23403965p23403965.html
Sent from the Lucene - C Developer mailing list archive at Nabble.com.



</pre>
</div>
</content>
</entry>
<entry>
<title>is the index file support simultaneous query accesses</title>
<author><name>cy163 &lt;cy163@hotmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200812.mbox/%3c21156872.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c21156872-post@talk-nabble-com%3e</id>
<updated>2008-12-24T14:53:48Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Hello ALL,

I am new to CLucene. CLucene stores the index information for documents in
files rather than in a database, so I wonder whether the index files support
simultaneous access from multiple queries running at the same time.

Thanks

Felix
-- 
View this message in context: http://www.nabble.com/is-the-index-file-support-simultaneous-query-accesses-tp21156872p21156872.html
Sent from the Lucene - C Developer mailing list archive at Nabble.com.



</pre>
</div>
</content>
</entry>
<entry>
<title>using date-recency criteria within Lucene</title>
<author><name>Felix Litman &lt;f_litman@pacbell.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200701.mbox/%3c20070118220003.18651.qmail@web82508.mail.mud.yahoo.com%3e"/>
<id>urn:uuid:%3c20070118220003-18651-qmail@web82508-mail-mud-yahoo-com%3e</id>
<updated>2007-01-18T22:00:02Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
We want to be able to apply a date-score criterion within Lucene so that more-recent documents
score higher.

We also want to be able to apply a boost factor to this date-field score.

Right now we use Lucene to score the documents based on the search criteria and then apply our
own "recency-of-date + boost factor" formula. We thus generate a combined score, which
we then use to display results that incorporate the date criterion.

But we would prefer to do this completely within Lucene. Is there a way to do this? What is
the best way? Which function or field would we use?
   
  Thank you,
  Felix
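
For illustration, a minimal sketch of the kind of combined score described
above, computed outside Lucene. The exponential half-life decay, the
function name combined_score, and the parameters half_life_days and boost
are all assumptions made for this example, not the formula actually in use.

```python
from datetime import date


def combined_score(relevance, doc_date, today, half_life_days=30.0, boost=1.0):
    """Combine a relevance score with a recency score that halves every half_life_days."""
    # Documents dated in the future are treated as brand new.
    age_days = max((today - doc_date).days, 0)
    # Recency is 1.0 for a document dated today and decays towards 0.0 with age.
    recency = 0.5 ** (age_days / half_life_days)
    # The boost controls how strongly recency lifts the relevance score.
    return relevance * (1.0 + boost * recency)


today = date(2007, 1, 18)
fresh = combined_score(1.0, date(2007, 1, 18), today)
older = combined_score(1.0, date(2006, 12, 19), today)
print(fresh, older)  # prints 2.0 1.5
```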


</pre>
</div>
</content>
</entry>
<entry>
<title>[RESULT] Mark lucene4c as dormant (was Re: [VOTE] Mark lucene4c as dormant)</title>
<author><name>&quot;Garrett Rooney&quot; &lt;rooneg@electricjellyfish.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200610.mbox/%3c7edfeeef0610181126y62b6bd05pb77ab5246bfbdc8b@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7edfeeef0610181126y62b6bd05pb77ab5246bfbdc8b@mail-gmail-com%3e</id>
<updated>2006-10-18T18:26:50Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On 10/9/06, Garrett Rooney &lt;rooneg@electricjellyfish.net&gt; wrote:
&gt; Since there hasn't been any significant work on Lucene4c in quite some
&gt; time, I'd like to officially mark it as dormant so I don't have to
&gt; keep writing board reports that just say "there has been no progress
&gt; in the last 3 months".
&gt;
&gt; So, cast your votes now:
&gt;
&gt;   [ ] +1 - Mark Lucene4c as dormant.
&gt;   [ ]  0 - I have no opinion.
&gt;   [ ] -1 - No, please keep it!  [include reason]

FYI, we had 8 +1 votes, and no objections, so I've begun the process
of shutting the project down.

-garrett


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: [VOTE] Mark lucene4c as dormant</title>
<author><name>&quot;Yonik Seeley&quot; &lt;yonik@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200610.mbox/%3cc68e39170610100809t2fce09e1pf8503decd481a096@mail.gmail.com%3e"/>
<id>urn:uuid:%3cc68e39170610100809t2fce09e1pf8503decd481a096@mail-gmail-com%3e</id>
<updated>2006-10-10T15:09:07Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
+1

-Yonik

On 10/9/06, Garrett Rooney &lt;rooneg@electricjellyfish.net&gt; wrote:
&gt; Since there hasn't been any significant work on Lucene4c in quite some
&gt; time, I'd like to officially mark it as dormant so I don't have to
&gt; keep writing board reports that just say "there has been no progress
&gt; in the last 3 months".
&gt;
&gt; So, cast your votes now:
&gt;
&gt;   [ ] +1 - Mark Lucene4c as dormant.
&gt;   [ ]  0 - I have no opinion.
&gt;   [ ] -1 - No, please keep it!  [include reason]
&gt;
&gt; -garrett


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: [VOTE] Mark lucene4c as dormant</title>
<author><name>&quot;Garrett Rooney&quot; &lt;rooneg@electricjellyfish.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200610.mbox/%3c7edfeeef0610100743t464dd749q42a48d007194f002@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7edfeeef0610100743t464dd749q42a48d007194f002@mail-gmail-com%3e</id>
<updated>2006-10-10T14:43:21Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On 10/10/06, Leo Simons &lt;mail@leosimons.com&gt; wrote:
&gt; On Oct 9, 2006, at 5:34 PM, Garrett Rooney wrote:
&gt; &gt; Since there hasn't been any significant work on Lucene4c in quite some
&gt; &gt; time, I'd like to officially mark it as dormant so I don't have to
&gt; &gt; keep writing board reports that just say "there has been no progress
&gt; &gt; in the last 3 months".
&gt;
&gt; does "I" here mean "lucene4c community"?
&gt;
&gt; &gt; So, cast your votes now:
&gt; &gt;
&gt; &gt;  [ ] +1 - Mark Lucene4c as dormant.
&gt; &gt;  [ ]  0 - I have no opinion.
&gt; &gt;  [ ] -1 - No, please keep it!  [include reason]
&gt;
&gt; +1, assuming "the lucene4c people" (whoever they are) are happy with
&gt; this too.

The "lucene4c people" basically means me, and Paul to some extent,
although he never really got around to doing anything ;-)

-garrett


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: [VOTE] Mark lucene4c as dormant</title>
<author><name>Leo Simons &lt;mail@leosimons.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200610.mbox/%3c64348F79-8C22-4298-A375-10DB7991466E@leosimons.com%3e"/>
<id>urn:uuid:%3c64348F79-8C22-4298-A375-10DB7991466E@leosimons-com%3e</id>
<updated>2006-10-10T14:28:42Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On Oct 9, 2006, at 5:34 PM, Garrett Rooney wrote:
&gt; Since there hasn't been any significant work on Lucene4c in quite some
&gt; time, I'd like to officially mark it as dormant so I don't have to
&gt; keep writing board reports that just say "there has been no progress
&gt; in the last 3 months".

does "I" here mean "lucene4c community"?

&gt; So, cast your votes now:
&gt;
&gt;  [ ] +1 - Mark Lucene4c as dormant.
&gt;  [ ]  0 - I have no opinion.
&gt;  [ ] -1 - No, please keep it!  [include reason]

+1, assuming "the lucene4c people" (whoever they are) are happy with
this too.

LSD



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: [VOTE] Mark lucene4c as dormant</title>
<author><name>&quot;Garrett Rooney&quot; &lt;rooneg@electricjellyfish.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200610.mbox/%3c7edfeeef0610090917y488171d5mfb9df13c7c597e51@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7edfeeef0610090917y488171d5mfb9df13c7c597e51@mail-gmail-com%3e</id>
<updated>2006-10-09T16:17:37Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On 10/9/06, Yoav Shapira &lt;yoavs@apache.org&gt; wrote:
&gt; Hi,
&gt;
&gt; On 10/9/06, Garrett Rooney &lt;rooneg@electricjellyfish.net&gt; wrote:
&gt; &gt;   [ X ] +1 - Mark Lucene4c as dormant.
&gt; &gt;   [ ]  0 - I have no opinion.
&gt; &gt;   [ ] -1 - No, please keep it!  [include reason]
&gt;
&gt; Will people have a post-ApacheCon todo / done list as well?  I saw
&gt; this item on your preconference todo list ;)

I was thinking of posting a "here's what I actually got done" list ;-)

-garrett


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: [VOTE] Mark lucene4c as dormant</title>
<author><name>&quot;Yoav Shapira&quot; &lt;yoavs@apache.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200610.mbox/%3cdb94f1a0610090911u33f2eeb1mf844773e5dbc63cd@mail.gmail.com%3e"/>
<id>urn:uuid:%3cdb94f1a0610090911u33f2eeb1mf844773e5dbc63cd@mail-gmail-com%3e</id>
<updated>2006-10-09T16:11:34Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,

On 10/9/06, Garrett Rooney &lt;rooneg@electricjellyfish.net&gt; wrote:
&gt;   [ X ] +1 - Mark Lucene4c as dormant.
&gt;   [ ]  0 - I have no opinion.
&gt;   [ ] -1 - No, please keep it!  [include reason]

Will people have a post-ApacheCon todo / done list as well?  I saw
this item on your preconference todo list ;)

Yoav


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: [VOTE] Mark lucene4c as dormant</title>
<author><name>&quot;Garrett Rooney&quot; &lt;rooneg@electricjellyfish.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200610.mbox/%3c7edfeeef0610090834n2e69a769k4927a407eb5a56ef@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7edfeeef0610090834n2e69a769k4927a407eb5a56ef@mail-gmail-com%3e</id>
<updated>2006-10-09T15:34:55Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On 10/9/06, Garrett Rooney &lt;rooneg@electricjellyfish.net&gt; wrote:
&gt; Since there hasn't been any significant work on Lucene4c in quite some
&gt; time, I'd like to officially mark it as dormant so I don't have to
&gt; keep writing board reports that just say "there has been no progress
&gt; in the last 3 months".
&gt;
&gt; So, cast your votes now:
&gt;
&gt;   [ ] +1 - Mark Lucene4c as dormant.
&gt;   [ ]  0 - I have no opinion.
&gt;   [ ] -1 - No, please keep it!  [include reason]

+1 from me, FWIW.

-garrett


</pre>
</div>
</content>
</entry>
<entry>
<title>[VOTE] Mark lucene4c as dormant</title>
<author><name>&quot;Garrett Rooney&quot; &lt;rooneg@electricjellyfish.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200610.mbox/%3c7edfeeef0610090834g761fc368i5a88247b0dacb86a@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7edfeeef0610090834g761fc368i5a88247b0dacb86a@mail-gmail-com%3e</id>
<updated>2006-10-09T15:34:23Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Since there hasn't been any significant work on Lucene4c in quite some
time, I'd like to officially mark it as dormant so I don't have to
keep writing board reports that just say "there has been no progress
in the last 3 months".

So, cast your votes now:

  [ ] +1 - Mark Lucene4c as dormant.
  [ ]  0 - I have no opinion.
  [ ] -1 - No, please keep it!  [include reason]

-garrett


</pre>
</div>
</content>
</entry>
<entry>
<title>relevance in Lucene</title>
<author><name>&quot;sudarshan angirash&quot; &lt;angirash.tech@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200607.mbox/%3cb5544d420607130447x17f1cda6nc68a84d827741aa@mail.gmail.com%3e"/>
<id>urn:uuid:%3cb5544d420607130447x17f1cda6nc68a84d827741aa@mail-gmail-com%3e</id>
<updated>2006-07-13T11:47:51Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi friends,

I would like to know on what basis Lucene defines "relevance" in
search results.
If anyone has worked with the relevance parameters in Lucene, please help
me.

regards,
Sudarshan


</pre>
</div>
</content>
</entry>
<entry>
<title>query for search through lucene for BLOB</title>
<author><name>&quot;sudarshan angirash&quot; &lt;angirash.tech@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200607.mbox/%3cb5544d420607120437v78b22383t56dfc0305d9cfd6e@mail.gmail.com%3e"/>
<id>urn:uuid:%3cb5544d420607120437v78b22383t56dfc0305d9cfd6e@mail-gmail-com%3e</id>
<updated>2006-07-12T11:37:07Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi all,

I have some PDF files stored in Oracle 9i as BLOBs.
Now I want to search for a string in those PDF files using Lucene, and then
show the PDF files that contain the string.

If you can give me any pointers on how to do it, it would be a great
help.

regards
sudarshan


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Lucene Dynamic http Web Page Search</title>
<author><name>&quot;Tim Archambault&quot; &lt;tarchambault@bangordailynews.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200606.mbox/%3cf1d57b6b0606301003t7571a13aj5e1f3a497c207faf@mail.gmail.com%3e"/>
<id>urn:uuid:%3cf1d57b6b0606301003t7571a13aj5e1f3a497c207faf@mail-gmail-com%3e</id>
<updated>2006-06-30T17:03:03Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
You may want to check this out as well:
http://incubator.apache.org/solr/




On 6/29/06, Clive. &lt;clive.lancaster@bestwestern.co.uk&gt; wrote:
&gt;
&gt;
&gt; Thanks for that. I have managed to find Lucene4DB
&gt; http://www.netomatix.com/Products/DocumentManagement/Lucene4DB.aspx that
&gt; may do the job.
&gt; --
&gt; View this message in context:
&gt; http://www.nabble.com/Lucene-Dynamic-http-Web-Page-Search-tf1867982.html#a5105584
&gt; Sent from the Lucene - C Developer forum at Nabble.com.
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Lucene Dynamic http Web Page Search</title>
<author><name>&quot;Clive.&quot; &lt;clive.lancaster@bestwestern.co.uk&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200606.mbox/%3c5105584.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c5105584-post@talk-nabble-com%3e</id>
<updated>2006-06-29T16:11:29Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Thanks for that. I have managed to find Lucene4DB
http://www.netomatix.com/Products/DocumentManagement/Lucene4DB.aspx that may
do the job.
-- 
View this message in context: http://www.nabble.com/Lucene-Dynamic-http-Web-Page-Search-tf1867982.html#a5105584
Sent from the Lucene - C Developer forum at Nabble.com.



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Lucene Dynamic http Web Page Search</title>
<author><name>&quot;Tim Archambault&quot; &lt;tarchambault@bangordailynews.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200606.mbox/%3cf1d57b6b0606290858k70682368wb99ead6543d63827@mail.gmail.com%3e"/>
<id>urn:uuid:%3cf1d57b6b0606290858k70682368wb99ead6543d63827@mail-gmail-com%3e</id>
<updated>2006-06-29T15:58:19Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Yes,

I inject database files into a Lucene index every day using ColdFusion. If
it can be done with CF, it can be done with .NET.


On 6/29/06, Clive. &lt;clive.lancaster@bestwestern.co.uk&gt; wrote:
&gt;
&gt;
&gt; Hi,
&gt;
&gt; I am working on adding a search feature to a web site that uses single
&gt; database-driven aspx pages, and would like to know whether Lucene can
&gt; index from an HTTP URL or from a database.
&gt; Currently I can only see Lucene being able to search physical files in a
&gt; Windows folder.
&gt;
&gt; Any ideas?
&gt;
&gt; --
&gt; View this message in context:
&gt; http://www.nabble.com/Lucene-Dynamic-http-Web-Page-Search-tf1867982.html#a5104440
&gt; Sent from the Lucene - C Developer forum at Nabble.com.
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Lucene Dynamic http Web Page Search</title>
<author><name>&quot;Garrett Rooney&quot; &lt;rooneg@electricjellyfish.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200606.mbox/%3c7edfeeef0606290834y1c27fe50g37ff1e51b39043a@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7edfeeef0606290834y1c27fe50g37ff1e51b39043a@mail-gmail-com%3e</id>
<updated>2006-06-29T15:34:19Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On 6/29/06, Clive. &lt;clive.lancaster@bestwestern.co.uk&gt; wrote:
&gt;
&gt; Hi,
&gt;
&gt; I am working on adding a search feature to a web site that uses single
&gt; database-driven aspx pages, and would like to know whether Lucene can
&gt; index from an HTTP URL or from a database.
&gt; Currently I can only see Lucene being able to search physical files in a
&gt; Windows folder.
&gt;

This mail should probably be sent to a different lucene mailing list.
This is a list associated with a (stalled) attempt to port Lucene to
C.  You probably want a general Lucene mailing list.

-garrett


</pre>
</div>
</content>
</entry>
<entry>
<title>Lucene Dynamic http Web Page Search</title>
<author><name>&quot;Clive.&quot; &lt;clive.lancaster@bestwestern.co.uk&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200606.mbox/%3c5104440.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c5104440-post@talk-nabble-com%3e</id>
<updated>2006-06-29T15:05:54Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Hi, 

I am working on adding a search feature to a web site that uses single
database-driven aspx pages, and would like to know whether Lucene can
index from an HTTP URL or from a database.
Currently I can only see Lucene being able to search physical files in a
Windows folder.

Any ideas?

-- 
View this message in context: http://www.nabble.com/Lucene-Dynamic-http-Web-Page-Search-tf1867982.html#a5104440
Sent from the Lucene - C Developer forum at Nabble.com.


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)</title>
<author><name>Ning Li &lt;ningli@us.ibm.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200605.mbox/%3cOF3A1DDA72.0925D213-ON85257168.007E4FA6-88257168.007E9746@us.ibm.com%3e"/>
<id>urn:uuid:%3cOF3A1DDA72-0925D213-ON85257168-007E4FA6-88257168-007E9746@us-ibm-com%3e</id>
<updated>2006-05-08T23:02:42Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Sorry, meant to send it to java-dev.


Regards,
Ning


Ning Li
Search Technologies
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120



</pre>
</div>
</content>
</entry>
<entry>
<title>Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)</title>
<author><name>Ning Li &lt;ningli@us.ibm.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200605.mbox/%3cOF3165287F.327ACD6D-ON85257168.007CD31A-88257168.007DEEAC@us.ibm.com%3e"/>
<id>urn:uuid:%3cOF3165287F-327ACD6D-ON85257168-007CD31A-88257168-007DEEAC@us-ibm-com%3e</id>
<updated>2006-05-08T22:55:31Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>


Today, applications have to open/close an IndexWriter and open/close an
IndexReader directly or indirectly (via IndexModifier) in order to handle a
mix of inserts and deletes. This performs well when inserts and deletes
come in fairly large batches. However, the performance can degrade
dramatically when inserts and deletes are interleaved in small batches.
This is because the ramDirectory is flushed to disk whenever an IndexWriter
is closed, causing a lot of small segments to be created on disk, which
eventually need to be merged.

We would like to propose a small API change to eliminate this problem. We
are aware that this kind of change has come up in discussions before. See
http://www.gossamer-threads.com/lists/lucene/java-dev/23049?search_string=indexwriter%20delete;#23049
. The difference this time is that we have implemented the change and
tested its performance, as described below.

API Changes
-----------
We propose adding a "deleteDocuments(Term term)" method to IndexWriter.
Using this method, inserts and deletes can be interleaved using the same
IndexWriter.

Note that, with this change it would be very easy to add another method to
IndexWriter for updating documents, allowing applications to avoid a
separate delete and insert to update a document.

Also note that this change can co-exist with the existing APIs for deleting
documents using an IndexReader. But if our proposal is accepted, we think
those APIs should probably be deprecated.

Coding Changes
--------------
Coding changes are localized to IndexWriter. Internally, the new
deleteDocuments() method works by buffering the terms to be deleted.
Deletes are deferred until the ramDirectory is flushed to disk, either
because it becomes full or because the IndexWriter is closed. Using Java
synchronization, care is taken to ensure that an interleaved sequence of
inserts and deletes for the same document is properly serialized.

We have attached a modified version of IndexWriter in Release 1.9.1 with
these changes. Only a few hundred lines of coding changes are needed. All
changes are commented by "CHANGE". We have also attached a modified version
of an example from Chapter 2.2 of Lucene in Action.
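
For illustration, a toy model of the buffering idea described in this
section. It is not the attached IndexWriter.java: the class name
BufferedWriter, its methods, and the in-memory lists standing in for the
ramDirectory and the on-disk segments are all hypothetical. Each buffered
delete records how many documents were buffered when it arrived, which is
one way to keep a later re-insert of the same key from being deleted.

```python
class BufferedWriter:
    """Toy model of a writer that buffers inserts and deletes until flush."""

    def __init__(self, max_buffered_docs=3):
        self.max_buffered_docs = max_buffered_docs
        self.segments = []         # flushed segments, each a list of docs
        self.ram_docs = []         # docs buffered in memory
        self.pending_deletes = {}  # (field, value) mapped to the buffer size at delete time

    def add_document(self, doc):
        self.ram_docs.append(doc)
        if len(self.ram_docs) == self.max_buffered_docs:
            self.flush()

    def delete_documents(self, field, value):
        # Only docs buffered before this call are covered by the delete.
        self.pending_deletes[(field, value)] = len(self.ram_docs)

    def flush(self):
        keep = [True] * len(self.ram_docs)
        for (field, value), limit in self.pending_deletes.items():
            # Deletes apply to every doc in the already flushed segments ...
            self.segments = [
                [d for d in seg if d.get(field) != value] for seg in self.segments
            ]
            # ... and to buffered docs that were added before the delete.
            for i in range(limit):
                if self.ram_docs[i].get(field) == value:
                    keep[i] = False
        new_segment = [d for d, k in zip(self.ram_docs, keep) if k]
        if new_segment:
            self.segments.append(new_segment)
        self.ram_docs = []
        self.pending_deletes = {}


w = BufferedWriter(max_buffered_docs=10)
w.add_document({"id": "1", "body": "old"})
w.delete_documents("id", "1")
w.add_document({"id": "1", "body": "new"})
w.flush()
print(w.segments)  # prints [[{'id': '1', 'body': 'new'}]]
```
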

Performance Results
-------------------
To test the performance of our proposed changes, we ran some experiments using
the TREC WT 10G dataset. The experiments were run on a dual 2.4 GHz Intel
Xeon server running Linux. The disk storage was configured as a RAID0 array
with 5 drives. Before indexes were built, the input documents were parsed
to remove the HTML from them (i.e., only the text was indexed). This was
done to minimize the impact of parsing on performance. A simple
WhitespaceAnalyzer was used during index build.

We experimented with three workloads:
  - Insert only. 1.6M documents were inserted and the final
    index size was 2.3GB.
  - Insert/delete (big batches). The same documents were
    inserted, but 25% were deleted. 1000 documents were
    deleted for every 4000 inserted.
  - Insert/delete (small batches). In this case, 5 documents
    were deleted for every 20 inserted.

                                current       current          new
Workload                      IndexWriter  IndexModifier   IndexWriter
-----------------------------------------------------------------------
Insert only                     116 min       119 min        116 min
Insert/delete (big batches)       --          135 min        125 min
Insert/delete (small batches)     --          338 min        134 min

As the experiments show, with the proposed changes the indexing time dropped
by about 60% (from 338 min to 134 min) when inserts and deletes were
interleaved in small batches.
(See attached file: IndexWriter.java)(See attached file:
TestWriterDelete.java)


Regards,
Ning


Ning Li
Search Technologies
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120

</pre>
</div>
</content>
</entry>
<entry>
<title>Re: coldfusion and google suggest idea</title>
<author><name>&quot;Tim Archambault&quot; &lt;tarchambault@bangordailynews.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200604.mbox/%3cf1d57b6b0604111135m567caa05i571a62a8aafd8f5@mail.gmail.com%3e"/>
<id>urn:uuid:%3cf1d57b6b0604111135m567caa05i571a62a8aafd8f5@mail-gmail-com%3e</id>
<updated>2006-04-11T18:35:23Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Excuse my ignorance. I'll have to go find that mailing list. Sorry for the
inconvenience.

On 4/11/06, Garrett Rooney &lt;rooneg@electricjellyfish.net&gt; wrote:
&gt;
&gt; On 4/11/06, Tim Archambault &lt;tarchambault@bangordailynews.net&gt; wrote:
&gt; &gt; Using Ajax, I'd like the search box on my website to perform like GOOGLE
&gt; &gt; SUGGEST.
&gt; &gt;
&gt; &gt; When opening a Lucene index with LUKE, I noticed I can review the top "X"
&gt; &gt; terms within the "body" element. What I'd like to do is extract those values
&gt; &gt; so they can be displayed in my GOOGLE SUGGEST search box as the user types
&gt; &gt; in the keyword(s).
&gt; &gt;
&gt; &gt; Any thoughts on what elements I should be reviewing within the Lucene index
&gt; &gt; is greatly appreciated.
&gt;
&gt; This question should probably be directed to the Lucene Users mailing
&gt; list, not the development list for a C port of Lucene.
&gt;
&gt; -garrett
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: coldfusion and google suggest idea</title>
<author><name>&quot;Garrett Rooney&quot; &lt;rooneg@electricjellyfish.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200604.mbox/%3c7edfeeef0604111133r5f327ca5y6bf0f3ab00e08117@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7edfeeef0604111133r5f327ca5y6bf0f3ab00e08117@mail-gmail-com%3e</id>
<updated>2006-04-11T18:33:53Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On 4/11/06, Tim Archambault &lt;tarchambault@bangordailynews.net&gt; wrote:
&gt; Using Ajax, I'd like the search box on my website to perform like GOOGLE
&gt; SUGGEST.
&gt;
&gt; When opening a Lucene index with LUKE, I noticed I can review the top "X"
&gt; terms within the "body" element. What I'd like to do is extract those values
&gt; so they can be displayed in my GOOGLE SUGGEST search box as the user types
&gt; in the keyword(s).
&gt;
&gt; Any thoughts on what elements I should be reviewing within the Lucene index
&gt; is greatly appreciated.

This question should probably be directed to the Lucene Users mailing
list, not the development list for a C port of Lucene.

-garrett


</pre>
</div>
</content>
</entry>
<entry>
<title>coldfusion and google suggest idea</title>
<author><name>&quot;Tim Archambault&quot; &lt;tarchambault@bangordailynews.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200604.mbox/%3cf1d57b6b0604111132q4ec3d6d7s79a8e8e4a0d37b00@mail.gmail.com%3e"/>
<id>urn:uuid:%3cf1d57b6b0604111132q4ec3d6d7s79a8e8e4a0d37b00@mail-gmail-com%3e</id>
<updated>2006-04-11T18:32:19Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Using Ajax, I'd like the search box on my website to perform like GOOGLE
SUGGEST.

When opening a Lucene index with LUKE, I noticed I can review the top "X"
terms within the "body" element. What I'd like to do is extract those values
so they can be displayed in my GOOGLE SUGGEST search box as the user types
in the keyword(s).

Any thoughts on what elements I should be reviewing within the Lucene index
is greatly appreciated.
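
For illustration, a minimal sketch of the suggest step, assuming the terms
of the "body" field and their counts have already been exported from the
index into a plain dictionary. The function name suggest, the term_counts
mapping, and the sample counts are hypothetical.

```python
def suggest(term_counts, prefix, limit=5):
    """Return up to `limit` terms starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    matches = [(t, c) for t, c in term_counts.items() if t.startswith(prefix)]
    # Sort by descending count, then alphabetically for a stable order.
    matches.sort(key=lambda tc: (-tc[1], tc[0]))
    return [t for t, _ in matches[:limit]]


counts = {"bangor": 42, "bank": 17, "band": 17, "maine": 99}
print(suggest(counts, "ban"))  # prints ['bangor', 'band', 'bank']
```

An Ajax handler could call such a function on each keystroke and return the
list to the search box.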


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: lucene c</title>
<author><name>anton feldmann &lt;anton.feldmann@uni-bielefeld.de&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200603.mbox/%3c441F16D5.2030709@uni-bielefeld.de%3e"/>
<id>urn:uuid:%3c441F16D5-2030709@uni-bielefeld-de%3e</id>
<updated>2006-03-20T20:55:49Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Garrett Rooney wrote:
&gt; On 3/20/06, anton feldmann &lt;anton.feldmann@uni-bielefeld.de&gt; wrote:
&gt;   
&gt;&gt; Hi
&gt;&gt;
&gt;&gt; I want to write a search engine in C. Are there any examples?
&gt;&gt;     
&gt;
&gt; Unfortunately, lucene4c is incomplete.  The closest thing to examples
&gt; that currently exist are the unit tests, but be warned that you will
&gt; hit bugs if you try to use it.  Fixes for any problems you find would
&gt; be most appreciated though.
&gt;
&gt; -garrett
&gt;   
Thanks a lot, I am going to try it tomorrow.

all the best

anton



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: lucene c</title>
<author><name>&quot;Garrett Rooney&quot; &lt;rooneg@electricjellyfish.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200603.mbox/%3c7edfeeef0603201228l368e4316v71e5286c216f1eb5@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7edfeeef0603201228l368e4316v71e5286c216f1eb5@mail-gmail-com%3e</id>
<updated>2006-03-20T20:28:54Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On 3/20/06, anton feldmann &lt;anton.feldmann@uni-bielefeld.de&gt; wrote:
&gt; Hi
&gt;
&gt; I want to write a search engine in C. Are there any examples?

Unfortunately, lucene4c is incomplete.  The closest thing to examples
that currently exist are the unit tests, but be warned that you will
hit bugs if you try to use it.  Fixes for any problems you find would
be most appreciated though.

-garrett


</pre>
</div>
</content>
</entry>
<entry>
<title>lucene c</title>
<author><name>anton feldmann &lt;anton.feldmann@uni-bielefeld.de&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200603.mbox/%3c441F0E7E.3010902@uni-bielefeld.de%3e"/>
<id>urn:uuid:%3c441F0E7E-3010902@uni-bielefeld-de%3e</id>
<updated>2006-03-20T20:20:14Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi

I want to write a search engine in C. Are there examples?

cheers

anton feldmann



</pre>
</div>
</content>
</entry>
<entry>
<title>lucene searching in pdf</title>
<author><name>anton feldmann &lt;anton.feldmann@uni-bielefeld.de&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200603.mbox/%3c441F0CFC.8090809@uni-bielefeld.de%3e"/>
<id>urn:uuid:%3c441F0CFC-8090809@uni-bielefeld-de%3e</id>
<updated>2006-03-20T20:13:48Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I am writing a program to search inside PDF documents. I have problems 
generating an index file out of a lot of PDF documents. I want to store 
more than one PDF file in the index file, and I want the program to 
return: 1. the file (absolute path), 2. the word and lexeme, 3. the score, 
and 4. the line. How do I get n PDF documents stored in one index file 
by 1, 2 and 4?
I wrote a program that makes an index of my filesystem, and I can search 
the filesystem to find files. I cannot read PDF files and parse them 
with Lucene.

I want to have an analyzer for every language Lucene works with.

       IndexWriter write = new IndexWriter(index, new GermanAnalyzer(), true);

I use only the GermanAnalyzer.

cheers

anton feldmann



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Does the project really use C?</title>
<author><name>&quot;Garrett Rooney&quot; &lt;rooneg@electricjellyfish.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200602.mbox/%3c7edfeeef0602241033t4ee1d39dw397012d9e589e56b@mail.gmail.com%3e"/>
<id>urn:uuid:%3c7edfeeef0602241033t4ee1d39dw397012d9e589e56b@mail-gmail-com%3e</id>
<updated>2006-02-24T18:33:03Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On 2/24/06, yueyu lin &lt;popeyelin@gmail.com&gt; wrote:

&gt;   And the last question is whether the project is active. I've watched the
&gt; project since last year, but now it's almost the same as it was last year. What
&gt; kind of help do you need? Maybe I can take a shot at it.

Unfortunately, nobody has had time to work on Lucene4c in quite some
time.  The last time there was any activity the plan was to move from
a full C implementation to a wrapper around a GCJ compiled version of
Java Lucene.  This was basically working, but it wasn't overly
reliable, with most of the problems coming from either difficulty
getting it to build or difficulty integrating compiled Java code with
C code via GCJ's CNI.

If you're interested in helping, all the code is available in
Subversion at http://svn.apache.org/repos/asf/incubator/lucene4c/trunk,
or if you want the older version that was pure C there's a tag at
http://svn.apache.org/repos/asf/incubator/lucene4c/tags/pre-gcj-conversion/

-garrett


</pre>
</div>
</content>
</entry>
<entry>
<title>Does the project really use C?</title>
<author><name>&quot;yueyu lin&quot; &lt;popeyelin@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200602.mbox/%3c8e7e4d020602240128u8cf26e6oc92711270cc21967@mail.gmail.com%3e"/>
<id>urn:uuid:%3c8e7e4d020602240128u8cf26e6oc92711270cc21967@mail-gmail-com%3e</id>
<updated>2006-02-24T09:28:08Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,
  I'm a user of Lucene for Java, and I'm excited to see Lucene4C. As far as I
know, there's another project named CLucene. That project uses C++, which
I'm not good at, so Lucene4c is another good choice for me. But when I
reviewed the current svn, I found mostly .cxx and .hxx files. In these
files the main part is implemented inside an "extern "C"" block, but I also
see "using ...". Does ANSI C have a namespace concept now?
I'm afraid that the project will try to be compatible with both C and C++,
because many features will be implemented in a C++ style that looks strange
from my point of view.
  I don't think C grammar is a burden when building a high quality project,
do you?
  And the last question is whether the project is active. I've watched the
project since last year, but now it's almost the same as it was last year. What
kind of help do you need? Maybe I can take a shot at it.

--
--
Yueyu Lin


</pre>
</div>
</content>
</entry>
<entry>
<title>*** URGENT: Provide your Board Report *NOW* ***</title>
<author><name>&quot;Noel J. Bergman&quot; &lt;noel@devtech.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200510.mbox/%3cNBBBJGEAGJAKLIDBKJOPIEDMEBAC.noel@devtech.com%3e"/>
<id>urn:uuid:%3cNBBBJGEAGJAKLIDBKJOPIEDMEBAC-noel@devtech-com%3e</id>
<updated>2005-10-20T21:05:02Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Folks,

Your report is well overdue, and were it not for the fact that the Board
pushed back their meeting by a week this month, the Incubator report would
have already been submitted, noting that yours had not been submitted.
Please get your report posted IMMEDIATELY.

The report is being accumulated at:
  http://wiki.apache.org/incubator/IncubatorBoardReport2005Q4

This is a quarterly obligation for all ASF projects, Incubator or otherwise.
It was announced to everyone weeks ago.  Please do not be late again.

Also, a number of projects have not updated their STATUS files.  Please do
so immediately.  If you do not know how, ask your Mentor.  If that doesn't
work, please notify the Incubator PMC.

	--- Noel



</pre>
</div>
</content>
</entry>
<entry>
<title>Searching for text information in databases</title>
<author><name>Rene Carballo &lt;rene_carballo@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200510.mbox/%3c20051006160448.46164.qmail@web30711.mail.mud.yahoo.com%3e"/>
<id>urn:uuid:%3c20051006160448-46164-qmail@web30711-mail-mud-yahoo-com%3e</id>
<updated>2005-10-06T16:04:48Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I can't find a Lucene example anywhere for indexing and full-text searching of a simple database
table that returns table rows, not documents, ranked by relevance.  An example with word-stemming
would be best, although I gather this happens automatically in the analysis step. 

Someone has recommended dbsight.net.  It's not clear what this buys me as it looks like I'd
still have to build a Lucene index from my database table, using Java code.  (Is this right?)
 
Does anyone know of a Lucene-only example for querying tables?  Or is there an alternative
to DBSight that's easier to use or with more documentation?
 
Thanks, rene'
 




</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Ruby &amp; Lucene &amp; ApacheCon</title>
<author><name>Thomas Dudziak &lt;tomdzk@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200508.mbox/%3c224f3234050808124647f71305@mail.gmail.com%3e"/>
<id>urn:uuid:%3c224f3234050808124647f71305@mail-gmail-com%3e</id>
<updated>2005-08-08T19:46:35Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
On 8/8/05, Erik Hatcher &lt;erik@ehatchersolutions.com&gt; wrote:
&gt; Thomas - have you had a look at PyLucene and how they do the gcj/SWIG
&gt; wizardry?  What kinds of issues did you encounter with gcj?  Perhaps
&gt; Andi Vajda from PyLucene could offer some advice?
&gt; 
&gt; I'd rather see the gcj/SWIG approach moving forward so that SWIG
&gt; Lucene doesn't lag behind Java Lucene where all the innovation happens.

Yep, I tried to compile PyLucene on my Mac, but it failed because of
the Python version that comes with Mac OS 10.4 (which is 2.3). To be
fair to PyLucene, I only tried for a couple of hours as I don't really
have an interest in Python, I actually only wanted to see how they use
gcj.
But aside from that, I tried the PyLucene way first for a whole week.
First there was the issue of getting gcj to run on Mac OS X, which isn't
easy at all - I had to install DarwinPorts with a fresh gcc. Getting gcj to
run over Lucene is easy and works out of the box. But linking Ruby with
swig-wrapped, gcj-compiled Lucene is not; all I got was a gcj internal
compiler error (with both gcc/gcj 3.4.3 and 4.0.1). This bug is marked as
a regression in the gcc bug list.
On Windows I had a similar amount of trouble using both MinGW and
Cygwin; I wasn't able to compile &amp; link the stuff against Ruby.

So to summarize, while there is definitely a strong argument for using
gcj to create other-language bindings from the Java-version, there are
a few issues that IMO make a strong case for CLucene:

* gcj is difficult to use at the best of times, and on Windows &amp; Mac OS
it is especially involved. For me it was nearly impossible as I'm no
gcc/gcj expert

* it prevents or at least makes it extremely difficult to create
certain bindings such as COM and C# (perhaps except Mono) as MinGW is
not easily combined with Visual C++ AFAIK. And I don't think that there
is any chance of debugging such a combination when a problem arises.

* the amount of work necessary to swig-wrap the gcj-compiled Lucene to
a given target language is immense - just have a look at the swig file
of PyLucene and the Makefile to make the magic happen; I think this
must be a nightmare to maintain. I cannot really tell what amount of
work would be necessary for CLucene but since it is a straight C++
library and built with swig in mind, I would be surprised if it is not
a lot less

So from a technical point of view, it is my opinion that a pure C++
version is easier to maintain and evolve right now. I also think that
most of the innovation in Lucene is not Java-specific so while it
would be duplicated implementation work, the algorithms are the same
(or near enough). Also, a pure C++ version of Lucene gives it more
momentum IMO in both the Linux world (mbox_lucene or something similar
comes to mind) and the Microsoft world (.Net etc.)

&gt; As for Lucene4C versus CLucene and moving CLucene to Apache - I'll
&gt; let the c-dev@lucene list discuss it.  I'm happy to have CLucene at
&gt; Apache too, though it seems simpler for us to only house a single
&gt; implementation in C.  The gcj version would be ideal in my mind, but
&gt; I'm also not skilled in gcj (and haven't touched C in decades,
&gt; practically) - so it certainly is up to the actual coders where to go
&gt; with it.

I don't know whether it is a "Lucene4C vs. CLucene" anyway. From what
I understand Lucene4C tries to create a simpler API for Lucene, and
while they are building on top of a gcj-compiled version of Java
Lucene, that is likely not a requirement (I don't think that they want
to expose any of the gcj-generated classes).
Besides, CLucene is quite far along, so from a practical point of view it
would make sense to use and maintain it. Being the practical guy that I
am, I think that any issues between Lucene4C, PyLucene, CLucene can be
worked out if the developers work together. After all, for all I know
it might even be possible to use a mixture of the Lucene4C API (for
plain C) and the CLucene API (for C++) in front of a gcj-compiled Java
Lucene, and all SWIG wrappers could then be built on top of this API.
At least technically this is possible and perhaps even feasible.

regards,
Tom


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Ruby &amp; Lucene &amp; ApacheCon</title>
<author><name>Erik Hatcher &lt;erik@ehatchersolutions.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200508.mbox/%3c0ADF8217-40FE-4FB3-864B-14F9CB809ED5@ehatchersolutions.com%3e"/>
<id>urn:uuid:%3c0ADF8217-40FE-4FB3-864B-14F9CB809ED5@ehatchersolutions-com%3e</id>
<updated>2005-08-08T18:49:01Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Thomas - have you had a look at PyLucene and how they do the gcj/SWIG  
wizardry?  What kinds of issues did you encounter with gcj?  Perhaps  
Andi Vajda from PyLucene could offer some advice?

I'd rather see the gcj/SWIG approach moving forward so that SWIG  
Lucene doesn't lag behind Java Lucene where all the innovation happens.

As for Lucene4C versus CLucene and moving CLucene to Apache - I'll  
let the c-dev@lucene list discuss it.  I'm happy to have CLucene at  
Apache too, though it seems simpler for us to only house a single  
implementation in C.  The gcj version would be ideal in my mind, but  
I'm also not skilled in gcj (and haven't touched C in decades,  
practically) - so it certainly is up to the actual coders where to go  
with it.

     Erik


On Aug 8, 2005, at 8:36 AM, Brian McCallister wrote:

&gt; At ApacheCon EU I roped one of the most productive developers  
&gt; (Thomas Dudziak) I know (who also has SWIG experience =) into the  
&gt; Ruby/Lucene thing. Anyway, he's had little success with gcj and  
&gt; lucene4c thus far (lucene4c isn't quite complete enough and, as  
&gt; Garrett knows (and said he's working on), kind of tough to build).
&gt;
&gt; Anyway, Thomas went and in an afternoon put SWIG bindings around  
&gt; CLucene =)
&gt;
&gt; Now, the more fun part, Ben (whose email I don't have) of CLucene  
&gt; would like to move the project to Apache =)
&gt;
&gt; Thoughts?
&gt;
&gt; -Brian
&gt;
&gt; On Aug 8, 2005, at 6:34 AM, Thomas Dudziak wrote:
&gt;
&gt;
&gt;&gt; Hi,
&gt;&gt;
&gt;&gt; after much tinkering and installing/reinstalling gcc/gcj (3.4.3 and
&gt;&gt; 4.0.1) I finally got a combo of ruby+swig+gcj to compile, only to be
&gt;&gt; stopped dead by an internal compiler error of GCJ. I honestly don't
&gt;&gt; know why this works for PyLucene (which btw. I didn't get to compile
&gt;&gt; because the mac version of Python is 2.3 whereas PyLucene seems to
&gt;&gt; require 2.4).
&gt;&gt;
&gt;&gt; And then yesterday by chance I spotted a mail by Ben van Klinken on
&gt;&gt; the SWIG mailing list who is the lead developer of the CLucene  
&gt;&gt; project
&gt;&gt; (http://clucene.sourceforge.net/), a full C++ port of Lucene. So I
&gt;&gt; fired up an email to him and he told me that they've rewritten  
&gt;&gt; CLucene
&gt;&gt; to be easily usable with SWIG (currently he's doing a C# and COM
&gt;&gt; wrapper for CLucene) and they already have more or less the
&gt;&gt; functionality as Lucene 1.4.3.
&gt;&gt;
&gt;&gt; So I decided to give it a try, and after about half an hour I not  
&gt;&gt; only
&gt;&gt; had CLucene compiled and linked, but also a basic SWIG ruby wrapper
&gt;&gt; around one of the helper classes of CLucene (compared to about a week
&gt;&gt; for the same using gcj).
&gt;&gt;
&gt;&gt; The interesting thing now is that they'd like to move to Apache, they
&gt;&gt; even proposed incubation
&gt;&gt; (http://clucene.sourceforge.net/incubatorproposal.htm) though they
&gt;&gt; seem to be missing a sponsor (Erik didn't answer as far as I could  
&gt;&gt; see
&gt;&gt; on the Lucene dev mailing list).
&gt;&gt; I'd very much like to use CLucene as the basis for the ruby binding
&gt;&gt; (and Ben is quite willing to help with any SWIG wrappers and C++
&gt;&gt; issues), so my question is: could you talk to Erik as to whether it
&gt;&gt; would be possible to accept the incubation proposal (via  
&gt;&gt; sponsoring by
&gt;&gt; the Lucene PMC)? From what I saw so far of CLucene, I might be able
&gt;&gt; to manage to create a ruby binding of the querying in August, which  
&gt;&gt; would
&gt;&gt; be a good start for the RubyLucene repository.
&gt;



</pre>
</div>
</content>
</entry>
<entry>
<title>punt for now.</title>
<author><name>steve johnson &lt;aces4me@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200508.mbox/%3c20050804200313.47702.qmail@web30412.mail.mud.yahoo.com%3e"/>
<id>urn:uuid:%3c20050804200313-47702-qmail@web30412-mail-mud-yahoo-com%3e</id>
<updated>2005-08-04T20:03:13Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I did manage to build and install the pre-gcj-backend
version of lucene4c.  The SoC guy will have to run
with that for now and hopefully when the gcj-backend
version works the retrofit won't be too much work. 
Thanks for all your help guys.

Steve



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Starting from scratch with lucene4c/mod_mbox</title>
<author><name>steve johnson &lt;aces4me@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200508.mbox/%3c20050804123103.23642.qmail@web30407.mail.mud.yahoo.com%3e"/>
<id>urn:uuid:%3c20050804123103-23642-qmail@web30407-mail-mud-yahoo-com%3e</id>
<updated>2005-08-04T12:31:03Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Any chance the core dump is being caused by me linking
directly to the liblucene4c.so instead of the .la
file?

Anyone building lucene4c into a module using the .so
file?

Anyone have a good liblucene4c.la file I can try to
hack up to make work on my systems?

Steve




</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Starting from scratch with lucene4c/mod_mbox</title>
<author><name>Garrett Rooney &lt;rooneg@electricjellyfish.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200507.mbox/%3c42E8F73B.7090901@electricjellyfish.net%3e"/>
<id>urn:uuid:%3c42E8F73B-7090901@electricjellyfish-net%3e</id>
<updated>2005-07-28T15:18:19Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

&gt; Stack trace 
&gt; 
&gt; #0  0xb71a83b0 in _Jv_RegisterClassHookDefault () from
&gt; /usr/lib/libgcj.so.6
&gt; #1  0xb71a8958 in _Jv_RegisterClasses () from
&gt; /usr/lib/libgcj.so.6
&gt; #2  0xb7cac7a1 in frame_dummy () from
&gt; /usr/local/lucene4c/lib/liblucene4c.so
&gt; #3  0xb7cac428 in _init () from
&gt; /usr/local/lucene4c/lib/liblucene4c.so
&gt; #4  0xb7ff61ce in _dl_catch_error () from
&gt; /lib/ld-linux.so.2
&gt; #5  0xb7ff62ba in _dl_init () from /lib/ld-linux.so.2
&gt; #6  0xb7edd2b2 in _dl_open () from /lib/tls/libc.so.6
&gt; #7  0xb7ff6016 in _dl_catch_error () from
&gt; /lib/ld-linux.so.2
&gt; #8  0xb7edced6 in _dl_open () from /lib/tls/libc.so.6
&gt; #9  0xb7f0b038 in dlopen () from /lib/tls/libdl.so.2
&gt; #10 0xb7ff6016 in _dl_catch_error () from
&gt; /lib/ld-linux.so.2
&gt; #11 0xb7f0b4a6 in dlerror () from /lib/tls/libdl.so.2
&gt; #12 0xb7f0afe4 in dlopen () from /lib/tls/libdl.so.2
&gt; #13 0xb7fa32c3 in apr_dso_load (res_handle=0xbffff888,
&gt;     path=0x8115da0
&gt; "/usr/local/apache2/modules/mod_mbox.so",
&gt; pool=0x80bc0a8)
&gt;     at dso.c:138
&gt; #14 0x0807e010 in load_module (cmd=0xbffffa58,
&gt; dummy=0xbffff910,
&gt;     modname=0x8115d78 "mbox_module",
&gt; filename=0x8115d88 "modules/mod_mbox.so")
&gt;     at mod_so.c:240
&gt; #15 0x08080cb6 in invoke_cmd (cmd=0x80a77c0,
&gt; parms=0xbffffa58,
&gt;     mconfig=0xbffff910, args=0x80ff1da "") at
&gt; config.c:797
&gt; #16 0x0808155e in ap_build_config_sub (p=Variable "p"
&gt; is not available.
&gt; ) at config.c:1335
&gt; #17 0x080819ce in ap_build_config (parms=0xbffffa58,
&gt; p=0x80bc0a8,
&gt;     temp_pool=0x80fd1a8, conftree=0x80b2c48) at
&gt; config.c:1127
&gt; #18 0x080820d0 in process_resource_config_nofnmatch
&gt; (s=0x80c0800, fname=Variable "fname" is not available.
&gt; )
&gt;     at config.c:1513
&gt; #19 0x08082438 in ap_process_resource_config
&gt; (s=0x80c0800,
&gt;     fname=0x80f8d00
&gt; "/usr/local/apache2/conf/httpd.conf",
&gt; conftree=0x80b2c48,
&gt;     p=0x80bc0a8, ptemp=0x80fd1a8) at config.c:1549
&gt; #20 0x08082c1f in ap_read_config (process=0x80bc0a8,
&gt; ptemp=0x80fd1a8,
&gt;     filename=0x80a8416 "conf/httpd.conf",
&gt; conftree=0x80b2c48) at config.c:1892
&gt; #21 0x08084beb in main (argc=2, argv=0xbffffcf4) at
&gt; main.c:589
&gt; 

Well, if it's crashing inside _Jv_RegisterClassHookDefault, then perhaps 
looking at that function in the GCC source would give us a clue what is 
going wrong.  I don't actually have time to look myself, at least not 
right now, but if you find anything interesting there please let me 
know.  Other than that I'm running out of ideas, this may not be a 
problem that can be solved without me actually poking around at it 
myself, and if so I probably won't get to spend much time with it until 
this weekend.

-garrett


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Starting from scratch with lucene4c/mod_mbox</title>
<author><name>steve johnson &lt;aces4me@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200507.mbox/%3c20050727141403.15304.qmail@web30406.mail.mud.yahoo.com%3e"/>
<id>urn:uuid:%3c20050727141403-15304-qmail@web30406-mail-mud-yahoo-com%3e</id>
<updated>2005-07-27T14:14:03Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>


--- Garrett Rooney &lt;rooneg@electricjellyfish.net&gt;
wrote:
 
&gt; Well, the only reason you need the trunk version of
&gt; APR for Lucene4C is 
&gt; that Paul fixed some problems with jlibtool in the
&gt; trunk and those fixes 
&gt; aren't available in a released version yet.  You
&gt; could try copying the 
&gt; trunk's version of jlibtool.c over to the older APR,
&gt; build that APR with 
&gt;   --enable-experimental-libtool, then build
&gt; everything from scratch 
&gt; using that version of APR.  I can't recall if
&gt; anything in Lucene4C 
&gt; enforces the use of a current version of APR, but if
&gt; it does it probably 
&gt; isn't hard to work around.
&gt; 
&gt; -garrett
&gt; 

I built lucene4c with the apr that comes with the
httpd tarball (with the jlibtool.c from the trunk). 
No change in symptoms.  Stack trace looks very
similar.  here is the data:

ldd mod_mbox.so
        libexpat.so.1 =&gt; /usr/lib/libexpat.so.1 (0xb7fc7000)
        liblucene4c.so =&gt; /usr/local/lucene4c/lib/liblucene4c.so (0xb7e21000)
        libapr-0.so.0 =&gt; /usr/local/apache2/lib/libapr-0.so.0 (0xb7e00000)
        librt.so.1 =&gt; /lib/tls/librt.so.1 (0xb7dfa000)
        libm.so.6 =&gt; /lib/tls/libm.so.6 (0xb7dd8000)
        libcrypt.so.1 =&gt; /lib/tls/libcrypt.so.1 (0xb7dab000)
        libnsl.so.1 =&gt; /lib/tls/libnsl.so.1 (0xb7d97000)
        libpthread.so.0 =&gt; /lib/tls/libpthread.so.0 (0xb7d87000)
        libdl.so.2 =&gt; /lib/tls/libdl.so.2 (0xb7d84000)
        libaprutil-0.so.0 =&gt; /usr/local/apache2/lib/libaprutil-0.so.0 (0xb7d70000)
        libc.so.6 =&gt; /lib/tls/libc.so.6 (0xb7c3b000)
        libgcj.so.6 =&gt; /usr/lib/libgcj.so.6 (0xb6b70000)
        libgcc_s.so.1 =&gt; /lib/libgcc_s.so.1 (0xb6b64000)
        libz.so.1 =&gt; /usr/lib/libz.so.1 (0xb6b52000)
        /lib/ld-linux.so.2 =&gt; /lib/ld-linux.so.2 (0x80000000)


ldd liblucene4c.so
        libgcj.so.6 =&gt; /usr/lib/libgcj.so.6 (0xb6d82000)
        libapr-0.so =&gt; /usr/local/apr/.libs/libapr-0.so (0xb6d60000)
        librt.so.1 =&gt; /lib/tls/librt.so.1 (0xb6d5a000)
        libm.so.6 =&gt; /lib/tls/libm.so.6 (0xb6d38000)
        libcrypt.so.1 =&gt; /lib/tls/libcrypt.so.1 (0xb6d0b000)
        libnsl.so.1 =&gt; /lib/tls/libnsl.so.1 (0xb6cf7000)
        libpthread.so.0 =&gt; /lib/tls/libpthread.so.0 (0xb6ce8000)
        libdl.so.2 =&gt; /lib/tls/libdl.so.2 (0xb6ce4000)
        libgcc_s.so.1 =&gt; /lib/libgcc_s.so.1 (0xb6cd9000)
        libz.so.1 =&gt; /usr/lib/libz.so.1 (0xb6cc7000)
        libc.so.6 =&gt; /lib/tls/libc.so.6 (0xb6b92000)
        /lib/ld-linux.so.2 =&gt; /lib/ld-linux.so.2 (0x80000000)
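
The two listings above can be compared mechanically. A minimal sketch in
Python, assuming ldd output in the format shown (soname, arrow, resolved
path, load address); the helper names resolved_paths and conflicts are
made up for illustration. It reports libraries that both objects link
against but that resolve to a different soname or path:

```python
import re


def resolved_paths(ldd_output):
    """Map a library base name (e.g. 'libapr') to the (soname, path) ldd resolved."""
    tokens = ldd_output.split()
    found = {}
    # Each entry reads: soname, arrow, path, (load address). Splitting on
    # whitespace also copes with listings that a mail client has re-wrapped.
    for soname, arrow, path in zip(tokens, tokens[1:], tokens[2:]):
        if ".so" not in soname or soname.startswith("/"):
            continue  # not a bare library name (e.g. the dynamic loader path)
        if not arrow.startswith("=") or not path.startswith("/"):
            continue  # not a 'name arrow /resolved/path' triple
        match = re.match(r"[A-Za-z_+]+", soname)
        if match is None:
            continue
        # 'libapr-0.so.0' and 'libapr-1.so.0' both map to the key 'libapr'.
        found[match.group(0)] = (soname, path)
    return found


def conflicts(first, second):
    """Return base names present in both listings that resolve differently."""
    a = resolved_paths(first)
    b = resolved_paths(second)
    return sorted(name for name in a if name in b and a[name] != b[name])
```

Fed the two listings above, conflicts() should report only libapr:
mod_mbox.so picks up libapr-0.so.0 from /usr/local/apache2/lib, while
liblucene4c.so picks up libapr-0.so from /usr/local/apr/.libs. Whether
that difference has anything to do with the crash is not established here.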


Stack trace 

#0  0xb71a83b0 in _Jv_RegisterClassHookDefault () from /usr/lib/libgcj.so.6
#1  0xb71a8958 in _Jv_RegisterClasses () from /usr/lib/libgcj.so.6
#2  0xb7cac7a1 in frame_dummy () from /usr/local/lucene4c/lib/liblucene4c.so
#3  0xb7cac428 in _init () from /usr/local/lucene4c/lib/liblucene4c.so
#4  0xb7ff61ce in _dl_catch_error () from /lib/ld-linux.so.2
#5  0xb7ff62ba in _dl_init () from /lib/ld-linux.so.2
#6  0xb7edd2b2 in _dl_open () from /lib/tls/libc.so.6
#7  0xb7ff6016 in _dl_catch_error () from /lib/ld-linux.so.2
#8  0xb7edced6 in _dl_open () from /lib/tls/libc.so.6
#9  0xb7f0b038 in dlopen () from /lib/tls/libdl.so.2
#10 0xb7ff6016 in _dl_catch_error () from /lib/ld-linux.so.2
#11 0xb7f0b4a6 in dlerror () from /lib/tls/libdl.so.2
#12 0xb7f0afe4 in dlopen () from /lib/tls/libdl.so.2
#13 0xb7fa32c3 in apr_dso_load (res_handle=0xbffff888,
    path=0x8115da0 "/usr/local/apache2/modules/mod_mbox.so", pool=0x80bc0a8)
    at dso.c:138
#14 0x0807e010 in load_module (cmd=0xbffffa58, dummy=0xbffff910,
    modname=0x8115d78 "mbox_module", filename=0x8115d88 "modules/mod_mbox.so")
    at mod_so.c:240
#15 0x08080cb6 in invoke_cmd (cmd=0x80a77c0, parms=0xbffffa58,
    mconfig=0xbffff910, args=0x80ff1da "") at config.c:797
#16 0x0808155e in ap_build_config_sub (p=Variable "p" is not available.
) at config.c:1335
#17 0x080819ce in ap_build_config (parms=0xbffffa58, p=0x80bc0a8,
    temp_pool=0x80fd1a8, conftree=0x80b2c48) at config.c:1127
#18 0x080820d0 in process_resource_config_nofnmatch (s=0x80c0800, fname=Variable "fname" is not available.
)
    at config.c:1513
#19 0x08082438 in ap_process_resource_config (s=0x80c0800,
    fname=0x80f8d00 "/usr/local/apache2/conf/httpd.conf", conftree=0x80b2c48,
    p=0x80bc0a8, ptemp=0x80fd1a8) at config.c:1549
#20 0x08082c1f in ap_read_config (process=0x80bc0a8, ptemp=0x80fd1a8,
    filename=0x80a8416 "conf/httpd.conf", conftree=0x80b2c48) at config.c:1892
#21 0x08084beb in main (argc=2, argv=0xbffffcf4) at main.c:589





</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Starting from scratch with lucene4c/mod_mbox</title>
<author><name>Garrett Rooney &lt;rooneg@electricjellyfish.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200507.mbox/%3c42E6A9C8.6060206@electricjellyfish.net%3e"/>
<id>urn:uuid:%3c42E6A9C8-6060206@electricjellyfish-net%3e</id>
<updated>2005-07-26T21:23:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
steve johnson wrote:
&gt; 
&gt;&gt;Well, first off, one is linked against libapr-0.so
&gt;&gt;and one is linked 
&gt;&gt;with libapr-1.so, that's probably a bad thing... 
&gt;&gt;I'd suggest making 
&gt;&gt;sure everything is linked against one or the other,
&gt;&gt;and trying again.
&gt;&gt;
&gt;&gt;-garrett
&gt;&gt;
&gt; 
&gt; Ok, seems reasonable but I'm not sure how it can be
&gt; done.  httpd won't build with the version of apr that
&gt; lucene4c requires. And mod_mbox builds with apxs which
&gt; uses the same libraries as httpd. Even if I could
&gt; rebuild mod_mbox with the version lucene4c wants I
&gt; would probably end up in the same situation because of
&gt; the version difference between the httpd and mod_mbox.

Well, the only reason you need the trunk version of APR for Lucene4C is 
that Paul fixed some problems with jlibtool in the trunk and those fixes 
aren't available in a released version yet.  You could try copying the 
trunk's version of jlibtool.c over to the older APR, build that APR with 
  --enable-experimental-libtool, then build everything from scratch 
using that version of APR.  I can't recall if anything in Lucene4C 
enforces the use of a current version of APR, but if it does it probably 
isn't hard to work around.

-garrett


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Starting from scratch with lucene4c/mod_mbox</title>
<author><name>steve johnson &lt;aces4me@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/lucene-c-dev/200507.mbox/%3c20050726211110.73242.qmail@web30409.mail.mud.yahoo.com%3e"/>
<id>urn:uuid:%3c20050726211110-73242-qmail@web30409-mail-mud-yahoo-com%3e</id>
<updated>2005-07-26T21:11:09Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>


&gt; 
&gt; Well, first off, one is linked against libapr-0.so
&gt; and one is linked 
&gt; with libapr-1.so, that's probably a bad thing... 
&gt; I'd suggest making 
&gt; sure everything is linked against one or the other,
&gt; and trying again.
&gt; 
&gt; -garrett
&gt; 
Ok, seems reasonable but I'm not sure how it can be
done.  httpd won't build with the version of apr that
lucene4c requires. And mod_mbox builds with apxs which
uses the same libraries as httpd. Even if I could
rebuild mod_mbox with the version lucene4c wants I
would probably end up in the same situation because of
the version difference between the httpd and mod_mbox.




</pre>
</div>
</content>
</entry>
</feed>
