lucene-java-user mailing list archives

From Don Vaillancourt <d...@webimpact.com>
Subject Supported Languages
Date Thu, 10 Jun 2004 15:37:08 GMT
Hi all,

I've noticed from the documentation that the Russian and German languages are 
supported by Lucene, but does Lucene support the French language?

What does "support" for a language mean in Lucene?  Being able to index a 
document?  Being able to search a document?  Or is it simply being able to 
sort results?

Thanks

At 04:39 PM 09/06/2004, you wrote:
>Erik Hatcher wrote:
>
>>On Jun 9, 2004, at 12:21 PM, David Spencer wrote:
>>
>>>>show us that most folks query with 1 - 3 words and do not use any 
>>>>of the advanced features.
>>>
>>>
>>>
>>>But with automagic query expansion these things might be done behind the 
>>>scenes.  Nutch, for one, expands simple queries to check against 
>>>multiple fields, with different boosts, and even gives a bonus for terms 
>>>that are near each other.
>>
>>
>>Ah yes!  Don't worry, I hadn't forgotten about Nutch.  I'm tinkering with 
>>its query parsing and analysis as we speak in fact.  Very clever indeed.
>>
>>>>The elegance of the query syntax is quite important, and QueryParser 
>>>>has gotten a bit hairy.  I would enjoy discussions on creating new 
>>>>query parsers (one size doesn't fit all, I don't think) and what syntax
>>>
>>>
>>>
>>>I suggested in some email a while ago making the QueryParser extensible 
>>>at runtime or startup time, so you can add other types of queries that 
>>>it doesn't support - so you have a way of registering these other query 
>>>types (SpanQuery, SubstringQuery etc) and then some syntax like 
>>>"span:foo" to invoke the query expander registered w/ "span" on "foo"...
>>
>>
>>I would be curious to see how an implementation of this played out.
>>For example, could I add my own syntax such that
>>
>>     "some phrase" <-3-> "another phrase"
>>
>>could be parsed into a SpanNearQuery of two SpanNearQuery's?
>>
>>I like the idea of a flexible run-time grammar, but it sounds too good to 
>>be true in a general purpose kinda way.
>
>My idea isn't perfect for humans, but it at least lets you use query types 
>that aren't hard coded into the parser.
>
>You have something like
>
>[1] how you register, could be in existing QueryParser
>
>void register( String name,  SubQueryParser qp)
>
>[2] what you register
>
>interface SubQueryParser
>{
>Query parse( String s); // parses string user enters, forms a Query...
>}
>
>[3] example of registration
>
>// instead of prefix matches, allows term anywhere
>register( "substring", new SubstringQP());
>register( "span", new SurroundQP());
>// expands a word to include synonyms
>register( "syn", new SynonymExpanderQP());
>
>[4]  syntax
>
>normal query parser syntax but add something else like "NAME::TEXT" (note 
>2 colons) so
>
>this:          "black syn::bird"
>
>expands to calls in the new extensible query parser,  something like
>
>BooleanQuery bq = ...
>// plain term goes against the default field
>bq.add( new TermQuery( new Term( "contents", "black")));
>// "syn::bird" goes to the parser registered under "syn"
>bq.add( new SynonymExpanderQP().parse( "bird"));
>return bq;
>
>behind the scenes SynonymExpanderQP expanded "bird" to the query 
>equivalent of, um, "bird avian^.5 wingedanimal^.5" or whatnot.
>
>[5] the point
>
>Be backward compatible and "natural" for existing query syntax, but leave 
>a hook so that if you innovate and define new query expansion code there's 
>some hope of someone using it, as they can in theory drop it in and use it 
>w/o coding. Right now if you create some code in this area I suspect 
>there's little chance people will try it out, as there's too much friction 
>involved.
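
The registration idea in [1] through [4] can be sketched roughly as below. This
is only an illustration under stated assumptions, not an existing Lucene API:
ExtensibleQueryParser is a hypothetical name, SubQueryParser here returns a
String rather than a Lucene Query so the sketch has no dependencies, and
tokens are split on whitespace instead of going through the real QueryParser
grammar.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed SubQueryParser interface.
// Assumption: returns a query string instead of a Lucene Query object.
interface SubQueryParser {
    String parse(String text);
}

// Hypothetical parser with a registry of named sub-parsers.
class ExtensibleQueryParser {
    private final Map<String, SubQueryParser> registry = new HashMap<>();
    private final String defaultField;

    ExtensibleQueryParser(String defaultField) {
        this.defaultField = defaultField;
    }

    // [1] how you register
    void register(String name, SubQueryParser parser) {
        registry.put(name, parser);
    }

    // [4] tokens of the form NAME::TEXT are handed to the parser registered
    // under NAME; every other token becomes a plain term on the default field.
    String parse(String query) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int sep = token.indexOf("::");
            SubQueryParser sub = sep > 0 ? registry.get(token.substring(0, sep)) : null;
            if (sub != null) {
                clauses.add("(" + sub.parse(token.substring(sep + 2)) + ")");
            } else {
                // Unknown or missing prefix: fall back to an ordinary term clause.
                clauses.add(defaultField + ":" + token);
            }
        }
        return String.join(" ", clauses);
    }
}
```

With a made-up synonym expander registered as
register("syn", word -> word + " avian^.5 wingedanimal^.5"), parsing
"black syn::bird" against the default field "contents" gives
"contents:black (bird avian^.5 wingedanimal^.5)". A token whose prefix is not
registered falls through to the default field, so existing queries keep
their meaning.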
>
>>
>>     Erik
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>

Don Vaillancourt
Director of Software Development

WEB IMPACT INC.
416-815-2000 ext. 245
email: donv@web-impact.com
web: http://www.web-impact.com
