lucene-general mailing list archives

From Brad Harper <brad.har...@epsiia.com>
Subject Re: Investigating Lucene for Applicability to [Unusual?] Use Case
Date Wed, 13 Jun 2007 19:25:42 GMT

Steve:

Thanks for the reply. I posted my inquiry here because it didn't seem to be
a java-only issue, as such, and I didn't want to cross-post.

Brad


Steven Rowe wrote:
> 
> Hi Brad,
> 
> Brad Harper wrote:
>> The use case involves so-called print streams. Imagine 20,000 statements
>> concatenated into one large file suitable for delivery to a print system.
>> The document formats vary, but include AFP (an IBM printer format), PCL
>> (an HP format), PostScript, PDF, and even "plain-text".
>> 
>> The indexing application must track the total page count of the embedded
>> statements. On a hit, the search application must extract and return the
>> [possibly multi-page] statement embedded within the larger print-stream
>> file.
>> 
>> How would the search application know (be informed by Lucene/the indexer)
>> the extent of the internal document(s)?
> 
> You'll get faster/better responses to questions like this if you direct
> them to the java-user list.
> 
> One solution is to use a Lucene stored field (call it "source")
> containing the name of the print stream file (stored, I assume,
> externally to the indexer), along with the document's extent within that
> file, maybe in a format like "filename:beg:end".  Of course, you could
> also use three separate fields, one for each piece of information.
> 
> Then when the search app gets a hit, the "source" field can be retrieved
> and consulted for the information you want.
> 
> Steve
> 
> -- 
> Steve Rowe
> Center for Natural Language Processing
> http://www.cnlp.org/tech/lucene.asp
> 
> 
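
Steve's "filename:beg:end" suggestion can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not a Lucene API: the class name StatementSource is hypothetical, beg/end are assumed to be byte offsets (end exclusive) that the indexer records while walking the print stream, and the Lucene calls themselves are left out because they differ between versions (in the 2.x line the encoded value would go into a Field built with Field.Store.YES; later versions also offer StoredField).

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical helper (not a Lucene class): the location of one statement inside
 * a print-stream file, encoded as "filename:beg:end" for a stored "source" field.
 * Assumption: beg and end are byte offsets into the file, end exclusive.
 */
public final class StatementSource {
    public final String filename;
    public final long beg;
    public final long end;

    public StatementSource(String filename, long beg, long end) {
        if (filename == null) {
            throw new IllegalArgumentException("filename is null");
        }
        if (beg < 0 || end < beg) {
            throw new IllegalArgumentException("invalid extent " + beg + ".." + end);
        }
        this.filename = filename;
        this.beg = beg;
        this.end = end;
    }

    /** Value to store at index time, e.g. "run1.afp:1024:4096". */
    public String encode() {
        return filename + ":" + beg + ":" + end;
    }

    /**
     * Parses a stored value back. Splits on the last two colons, so a filename
     * that itself contains colons (such as a Windows drive letter) survives.
     */
    public static StatementSource decode(String value) {
        int second = value.lastIndexOf(':');
        int first = second > 0 ? value.lastIndexOf(':', second - 1) : -1;
        if (first < 0) {
            throw new IllegalArgumentException("expected filename:beg:end, got " + value);
        }
        return new StatementSource(
                value.substring(0, first),
                Long.parseLong(value.substring(first + 1, second)),
                Long.parseLong(value.substring(second + 1)));
    }

    /** At search time: copies bytes [beg, end) of the print-stream file, i.e. the whole statement. */
    public byte[] extract() throws IOException {
        long length = end - beg;
        if (length > Integer.MAX_VALUE) {
            throw new IOException("statement too large for a single array: " + length + " bytes");
        }
        try (RandomAccessFile file = new RandomAccessFile(filename, "r")) {
            if (end > file.length()) {
                throw new IOException("extent " + beg + ".." + end + " runs past end of " + filename);
            }
            byte[] bytes = new byte[(int) length];
            file.seek(beg);
            file.readFully(bytes);
            return bytes;
        }
    }
}
```

On a hit, the search app would read the stored "source" value, call decode(), then extract() to get the possibly multi-page statement. Splitting on the last two colons rather than the first keeps filenames with colons intact; the three-separate-fields variant Steve mentions avoids the parsing step at the cost of reading three stored values per hit.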

-- 
View this message in context: http://www.nabble.com/Investigating-Lucene-for-Applicability-to--Unusual---Use-Case-tf3917031.html#a11107033
Sent from the Lucene - General mailing list archive at Nabble.com.

