Steve:
Thanks for the reply. I posted my inquiry here because it didn't seem to be
a Java-only issue, as such, and I didn't want to cross-post.
Brad
Steven Rowe wrote:
>
> Hi Brad,
>
> Brad Harper wrote:
>> The use case involves so-called print streams. Imagine 20,000 statements
>> concatenated into one large file suitable for delivery to a print system.
>> The document formats vary, but include AFP (an IBM printer format), PCL (an
>> HP format), PostScript, PDF, and even "plain-text".
>>
>> The indexing application must track the total page count of the embedded
>> statements. On a hit, the search application must extract and return the
>> [possibly multi-page] statement embedded within the larger print-stream
>> file.
>>
>> How would the search application know (be informed by the Lucene/indexer)
>> the extent of the internal document(s)?
>
> You'll get faster/better responses to questions like this if you direct
> them to the java-user list.
>
> One solution is to use a Lucene stored field (call it "source")
> containing the name of the print stream file (stored, I assume,
> externally to the indexer), along with the document's extent within that
> file, maybe in a format like "filename:beg:end". Of course, you could
> also use three separate fields, one for each piece of information.
>
> Then when the search app gets a hit, the "source" field can be retrieved
> and consulted for the information you want.
>
> Steve
>
> --
> Steve Rowe
> Center for Natural Language Processing
> http://www.cnlp.org/tech/lucene.asp
>
>
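For the archive, here is a rough sketch of the "filename:beg:end" value suggested above. It uses plain Java only and leaves out how the string is stored in or retrieved from the Lucene field. The class name SourceRef, its methods, and the reading of beg/end as byte offsets (end exclusive) are all assumptions made for illustration, not anything from Lucene itself.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: models the value of the stored "source" field.
final class SourceRef {
    final String fileName; // path of the print stream file
    final long begin;      // assumed: byte offset of the statement's first byte
    final long end;        // assumed: byte offset just past its last byte

    SourceRef(String fileName, long begin, long end) {
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("bad extent: " + begin + ".." + end);
        }
        this.fileName = fileName;
        this.begin = begin;
        this.end = end;
    }

    // Builds the string the indexer would put in the stored field.
    String encode() {
        return fileName + ":" + begin + ":" + end;
    }

    // Parses the field value on the search side. Splitting from the right
    // keeps file names that contain ':' (e.g. Windows drive letters) intact.
    static SourceRef parse(String value) {
        int lastColon = value.lastIndexOf(':');
        int prevColon = value.lastIndexOf(':', lastColon - 1);
        if (prevColon < 0) {
            throw new IllegalArgumentException("expected filename:beg:end, got: " + value);
        }
        String name = value.substring(0, prevColon);
        long beg = Long.parseLong(value.substring(prevColon + 1, lastColon));
        long end = Long.parseLong(value.substring(lastColon + 1));
        return new SourceRef(name, beg, end);
    }

    // Reads only the embedded statement out of the larger print stream file.
    byte[] extract() throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            byte[] buf = new byte[Math.toIntExact(end - begin)];
            raf.seek(begin);
            raf.readFully(buf);
            return buf;
        }
    }
}
```

On a hit, the search app would call SourceRef.parse() on the retrieved field value and then extract() to pull the statement bytes. A page count could go into a fourth component or into its own field, as with the three-field variant mentioned above.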
--
View this message in context: http://www.nabble.com/Investigating-Lucene-for-Applicability-to--Unusual---Use-Case-tf3917031.html#a11107033
Sent from the Lucene - General mailing list archive at Nabble.com.