lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yonik Seeley" <yo...@apache.org>
Subject Re: .fdt file
Date Thu, 10 Jul 2008 15:05:06 GMT
On Thu, Jul 10, 2008 at 1:42 AM, blazingwolf7 <blazingwolf7@gmail.com> wrote:
> Well, I am trying to extract the URL and contentLength from the ".fdt" file.
> I am planning to use both of these values in a filter to remove certain
> links to be display in the search result. The problem is, I am told not to
> use the IndexReader to retrieve these values for each document found
> matching with the query.
>
> So now, instead, I will have to retrieve the entire .fdt file, extract both
> the values and store it into an arraylist which will be use later.  I am
> having problem extracting the entire file without using all the seek()
> method to determine the position of the document.
>
> Any suggestion?

You're trying to do things at too low of a level (bypassing Lucene's
public APIs)
I suggested earlier that you index the URL untokenized, and then use
the FieldCache.  That will allow you to easily retrieve a String[] of
all the URLs.

-Yonik


> Yonik Seeley wrote:
>>
>> On Wed, Jul 9, 2008 at 11:13 PM, blazingwolf7 <blazingwolf7@gmail.com>
>> wrote:
>>> Sorry,but I am still quite new to Lucene. What exactly is "cp"?
>>
>> The unix command for copy (hence the smiley).
>>
>> Some of your recent questions seem to be suffering from an XY problem:
>> http://www.perlmonks.org/index.pl?node_id=542341
>> You may get more help by explaining what you are trying to do.
>>
>> -Yonik
>>
>>> Yonik Seeley wrote:
>>>>
>>>> On Wed, Jul 9, 2008 at 9:01 PM, blazingwolf7 <blazingwolf7@gmail.com>
>>>> wrote:
>>>>> I had recently found out that Lucene will retrieve the content of a
>>>>> document
>>>>> from a file ".fdt". I am trying to retrieve the entire file in one go
>>>>> instead of retrieving it based on document number. can it be done?
>>>>
>>>> "cp" can retrieve the file on one go ;-)
>>>>
>>>> Other than that, the format is documented here:
>>>> http://lucene.apache.org/java/docs/fileformats.html
>>>> But I'm not sure why retrieving by document number won't work for you.
>>>>
>>>> -Yonik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/.fdt-file-tp18373913p18376301.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message