lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mark harwood <markharw...@yahoo.co.uk>
Subject Re: Highligher Example
Date Fri, 08 Sep 2006 08:26:46 GMT
If you have a budget for this stuff then Stellent provide tools for parsing multiple document
types and also have a viewer that can display documents with their original formatting, plus
your highlights. See http://www.stellent.com/en/products/outside_in/viewer_tech/index.htm

I don't work for Stellent and haven't used it but I do know this stuff is hard to do and they
are the only ones I'm aware of trying to provide tools to cover all document types which is
why I mention it. If anyone has any other similar recommendations I would be interested to
hear them.


----- Original Message ----
From: Mark Miller <markrmiller@gmail.com>
To: java-user@lucene.apache.org
Sent: Friday, 8 September, 2006 2:02:47 AM
Subject: Re: Highligher Example

Highlighting a PDF document, last time I looked (quite a while ago), 
involves supplying an xml file that describes offsets for highlighting. 
You can specify the file in the URL. You can also do simple highlighting 
by passing in a list of words to be highlighted, but this does not even 
catch minor differences, like singular to plural.

If someone knows more about using to the lucene highlighter to highlight 
PDF's then please speak up. I think I will have to get into this soon.

- Mark

Mag Gam wrote:
> Thanks for the quick response Erik. I will be getting my LIA book back 
> very
> soon, I forgot it at a destination :-(
>
> Lets assume, there is a document called "hello.pdf" and it has the 
> content
> "this is hello.pdf. It uses Acrobat"
>
> When I perform a search for "Acrobat", i want hello.pdf to show up, 
> and also
> the 'It uses <highlight>Acrobat</highlight>'
>
> something like that.
>
> tia
>
>
>
> On 9/7/06, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>>
>> There are test cases in the Highlighter codebase that exercise it and
>> show its use, as well as a few examples of it in the "Lucene in
>> Action" codebase.
>>
>> These examples output plain text with some prefix and suffix
>> surrounding the highlighted terms.  Highlighting text in a PDF is
>> possible, I'm pretty sure, but I don't think the same would be easily
>> possible with Microsoft document formats.  I'm not sure if you are
>> asking for these document types to be highlighted or just a plain
>> text representation of them, though.
>>
>>         Erik
>>
>> On Sep 7, 2006, at 6:37 PM, Mag Gam wrote:
>>
>> > Hey
>> >
>> > Anyone have a search result highlighter example?
>> >
>> > I have various doc, PDFs, DOC, TXT, PPT, and I would like to show a
>> > highlight, similar to how google does it...
>> >
>> > tia
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message