lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Itamar Syn-Hershko" <ita...@divrei-tora.com>
Subject RE: Rebuilding Document from index?
Date Tue, 26 Feb 2008 22:17:05 GMT
Not to ruin your party, but I'm not sure exactly what this Lexicon object is
for and how it should work. Plus, the requirements I have for analyzing
Hebrew (not only for the MoreLikeThis functionality) are far more demanding
than what is needed for French.

But I'm open to any suggestion on this matter (BTW, if I understand what
you're trying to do correctly, this post of mine should be related as well:
http://www.mail-archive.com/java-user@lucene.apache.org/msg18650.html).

Itamar. 

-----Original Message-----
From: Mathieu Lecarme [mailto:mathieu@garambrogne.net] 
Sent: Tuesday, February 26, 2008 11:18 PM
To: java-user@lucene.apache.org
Subject: Re: Rebuilding Document from index?

Yes, I've found a tester!
A patch was submited for this kind of job :
https://issues.apache.org/jira/browse/LUCENE-1190

And here is the svn work in progress :
https://admin.garambrogne.net/subversion/revuedepresse/trunk/src/java/lexico
n

And the web version :
https://admin.garambrogne.net/projets/revuedepresse/browser/trunk/src/java/l
exicon


Le 26 févr. 08 à 17:33, Itamar Syn-Hershko a écrit :

>
> Implementing something like MoreLikeThis for Hebrew. Non-Hebrew 
> implementations are relevant, but much less accurate since a word like 
> PURIM can show up in the actual document with initials (LPURIM, BPURIM
> etc.) or
> even with 1-4 letters after it which all reffer to the same term, and 
> then the score it will get upon analyzing using the current 
> MoreLikeThis implementation will not reflect its real importance.
>
> I'm still trying to engineer the best possible solution for Lucene 
> with Hebrew, right now my path is NOT using a stemmer by default, only 
> by explicit request of the user. MoreLikeThis would only return 
> relevant results if I will use a non-stemmed scoring and lookup.
>
> Itamar.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Tuesday, February 26, 2008 4:29 PM
> To: java-user@lucene.apache.org
> Subject: Re: Rebuilding Document from index?
>
> See TermDocs/TermEnum. Or perhaps TermFreqVector. I admit I haven't 
> used that last, but that family of methods ought to fix you up.
>
> What problem are you trying to solve? Perhaps there are better 
> solutions to suggest....
>
> Best
> Erick
>
> On Mon, Feb 25, 2008 at 6:04 PM, Itamar Syn-Hershko 
> <itamar@divrei-tora.com
> >
> wrote:
>
>> Hello again,
>>
>> If I wanted to do this programmatically, how would I do this 
>> (retrieve a list of all terms in a field for a specific document - 
>> better if it was in alphabettic order and with frequency data)?
>>
>> Thanks,
>>
>> Itamar.
>>
>> -----Original Message-----
>> From: spring@gmx.eu [mailto:spring@gmx.eu]
>> Sent: Friday, February 22, 2008 3:22 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: Rebuilding Document from index?
>>
>> You can use Luke to rebuild the document. It will show you the terms 
>> of the analyzed document, not the original content.
>> And this is what you want, if I understood you correctly.
>>
>>> -----Original Message-----
>>> From: Itamar Syn-Hershko [mailto:itamar@divrei-tora.com]
>>> Sent: Freitag, 22. Februar 2008 14:02
>>> To: java-user@lucene.apache.org
>>> Subject: Rebuilding Document from index?
>>>
>>> Hi,
>>>
>>> Is it possible to re-create a document from an index, if its not 
>>> stored?
>>> What I'm looking for is a way to have a text document with the text 
>>> AFTER it was analyzed, so I can see how my analyzer handles certain 
>>> cases. So that means I don't care if I will not get the original 
>>> document. I want to see the document as the index knows it.
>>>
>>> Thanks in advance,
>>>
>>> Itamar.
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message