lucene-dev mailing list archives

From "Pete Lewis" <p...@uptima.co.uk>
Subject Re: Java TextCat 0.1
Date Sun, 09 Nov 2003 19:51:31 GMT
Hi Maurits

With the language guesser it doesn't matter whether the documents are in one
index or in language-specific indexes; it is more a question of how you want
to organise your data.  Even if you have separate language dictionaries, I
think it would be best to have a language field holding the guessed language
of each document.
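
As a rough sketch of the language-field idea, the plain-Java example below guesses a language for each document and records it next to the text. Everything in it is an assumption made for illustration: the class name LanguageFieldSketch, the field names "contents" and "language", and the tiny stopword lists are all hypothetical, and a Map stands in for a Lucene Document. Stopword counting is used here only because it is short; it is not the n-gram method that TextCat-style guessers use.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy sketch: guess a document's language and record it in a "language" field. */
final class LanguageFieldSketch {

    // Hypothetical, deliberately tiny stopword lists (assumption, not real profiles).
    private static final Map<String, Set<String>> STOPWORDS = new LinkedHashMap<>();
    static {
        STOPWORDS.put("en", new HashSet<>(Arrays.asList("the", "and", "is", "of", "to", "in")));
        STOPWORDS.put("nl", new HashSet<>(Arrays.asList("de", "het", "een", "en", "van", "is")));
        STOPWORDS.put("de", new HashSet<>(Arrays.asList("der", "die", "das", "und", "ist", "nicht")));
    }

    /** Returns the language whose stopwords occur most often, or "unknown" if none occur. */
    static String guessLanguage(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> entry : STOPWORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    hits++;
                }
            }
            // Strictly greater: on a tie the language listed first wins.
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }

    /** Builds a field map (a stand-in for an index document) with the guessed language. */
    static Map<String, String> toDocument(String text) {
        Map<String, String> doc = new HashMap<>();
        doc.put("contents", text);
        doc.put("language", guessLanguage(text));
        return doc;
    }
}
```

With the field in place, the same layout works whether the documents end up in one index or in several. A one-word query such as "lucene" contains no stopwords, so this guesser returns "unknown" for it, which is the short-query problem discussed further down the thread.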

An alternative would be language tagging, where you embed language tags in
the document and can thereby correctly handle documents that comprise more
than one language.  Unfortunately, I don't think there are any open-source
language taggers.
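
To make the tagging idea concrete, here is a minimal sketch that tags each paragraph of a document separately. It is not an existing tagger: the class name LanguageTaggingSketch, the [lang=xx]...[/lang] tag syntax, and the choice of blank-line-separated paragraphs as the tagging unit are all assumptions for illustration. The language guesser is passed in as a function, so any guesser can be plugged in.

```java
import java.util.function.Function;

/** Toy sketch: wrap each paragraph in a tag naming its guessed language. */
final class LanguageTaggingSketch {

    /**
     * Splits the document on blank lines and wraps every non-empty paragraph
     * in a hypothetical [lang=xx]...[/lang] tag, using the supplied guesser.
     */
    static String tag(String document, Function<String, String> guesser) {
        StringBuilder out = new StringBuilder();
        for (String paragraph : document.split("\\n\\s*\\n")) {
            String trimmed = paragraph.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty fragments from leading or trailing blank lines
            }
            if (out.length() > 0) {
                out.append("\n\n"); // keep paragraphs separated in the output
            }
            out.append("[lang=").append(guesser.apply(trimmed)).append("]")
               .append(trimmed)
               .append("[/lang]");
        }
        return out.toString();
    }
}
```

Tagging per paragraph is only one possible granularity; a mixed-language paragraph would still get a single tag under this scheme.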

Cheers

Pete

----- Original Message ----- 
From: "maurits van wijland" <m.vanwijland@quicknet.nl>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Saturday, November 08, 2003 7:30 AM
Subject: Re: Java TextCat 0.1


> Pete,
>
> It's because I think of search engines as guided search engines. They
> should offer the 'end-user' help when trying to find information. So a
> drop-down should not be included in the search interface.
>
> Of course a drop-down is a good method for choosing a query language. Are
> the different languages in different indexes or are they all combined into
> one?
>
> chrs,
>
> Maurits
>
> ----- Original Message ----- 
> From: "Pete Lewis" <pete@uptima.co.uk>
> To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
> Sent: Friday, November 07, 2003 8:58 PM
> Subject: Re: Java TextCat 0.1
>
>
> > Hi Maurits
> >
> > Language guessing is OK for documents where you have a fair amount of
> > text to play with; search clues, however, are much shorter, often just a
> > word or two.  So why not have a default query language and just have a
> > drop-down box to let the user select the query language if it is
> > different from the default?
> >
> > Cheers
> >
> > Pete
> >
> > ----- Original Message ----- 
> > From: "maurits van wijland" <m.vanwijland@quicknet.nl>
> > To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
> > Sent: Friday, November 07, 2003 7:12 PM
> > Subject: Re: Java TextCat 0.1
> >
> >
> > > Hi all,
> > >
> > > Incze, do you choose the analyzer when indexing and searching? How?
> > > Can you send some example code?
> > >
> > > I have tried this with a naive Bayes language guesser, but the problem
> > > I found is that when searching, the query words are too 'small' to
> > > accurately predict a language...
> > >
> > > So, how do you manage?
> > >
> > > kind regards,
> > >
> > > Maurits van Wijland
> > >
> > >
> > > ----- Original Message ----- 
> > > From: "Incze Lajos" <incze@mail.matav.hu>
> > > To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
> > > Sent: Friday, November 07, 2003 2:31 AM
> > > Subject: Re: Java TextCat 0.1
> > >
> > >
> > > > On Thu, Nov 06, 2003 at 02:14:11PM +0100, Patrick Debois wrote:
> > > > > Java interfacing with libtextcat. Might be of interest to you
> > > > > (according to the mailing lists)
> > > > >
> > > > > I've used it for choosing the correct analyzer in Lucene Snowball
> > > > >
> > > > > I will provide it on my website http://www.jedi.be/JTextCat/index.html
> > > > >
> > > > > Hope it does not violate any copyrights.
> > > > >
> > > > > > ---------------------------------------------------------------------
> > > >
> > > > Have you seen this project?
> > > >
> > > > http://ngramj.sourceforge.net/
> > > >
> > > > (Pure java N-Gram lib, with a sample servlet.)
> > > >
> > > > incze
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> > > > For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

