lucene-java-user mailing list archives

From Ankit Murarka <ankit.mura...@rancoretech.com>
Subject Re: Possible location of word inside the file.
Date Thu, 04 Jul 2013 09:53:52 GMT
Thanks. Indeed, I am indexing each file as a whole. But how do I index
each line of a file?
This would essentially mean: first I need to index each file to know
whether the word exists or not, and then I need to index each line of
the file to know its location. That in itself does not seem to be a
problem.

The problem is this: if I specify a file name to index, that file is
indexed. If I specify a directory name, all the files inside that
directory are indexed. But how do I go about indexing each line of a
file?

Does this mean reading each line of the file and feeding it to Lucene so
that the index can be generated? That would be very resource-intensive
and would severely hurt performance.
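
To make the question concrete, here is a minimal sketch of what "one
document per line" would involve. It does not use Lucene at all: the
class LineIndexSketch, its LineDoc type, and the in-memory maps are
hypothetical stand-ins, and the whitespace/punctuation tokenizer is an
assumption, not Lucene's analyzer. In Lucene proper, each line would
presumably become its own Document with file name, line number and text
fields. The point of the sketch is only that the file is read in a single
sequential pass, one line at a time, and that a hit on a line can be
expanded to its neighbours with a line-number range.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical in-memory stand-in for an index with one "document" per log line. */
final class LineIndexSketch {

    /** One indexed line: file name, 1-based line number, and the line text. */
    static final class LineDoc {
        final String fileName;
        final int lineNo;
        final String text;

        LineDoc(String fileName, int lineNo, String text) {
            this.fileName = fileName;
            this.lineNo = lineNo;
            this.text = text;
        }
    }

    // term -> lines containing it (a toy inverted index)
    private final Map<String, List<LineDoc>> postings = new HashMap<>();
    // file name -> all of its lines, kept here only to serve context lookups
    private final Map<String, List<String>> linesByFile = new HashMap<>();

    /** Reads the input sequentially and indexes every line as its own document. */
    void indexFile(String fileName, BufferedReader reader) {
        List<String> lines = linesByFile.computeIfAbsent(fileName, k -> new ArrayList<>());
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
                LineDoc doc = new LineDoc(fileName, lines.size(), line);
                // Assumed tokenizer: lowercase, split on non-word characters, de-duplicate.
                for (String term : new HashSet<>(Arrays.asList(
                        line.toLowerCase(Locale.ROOT).split("\\W+")))) {
                    if (!term.isEmpty()) {
                        postings.computeIfAbsent(term, k -> new ArrayList<>()).add(doc);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns every indexed line that contains the given single term. */
    List<LineDoc> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    /** Returns lines lineNo - radius .. lineNo + radius of the file, clamped to its bounds. */
    List<String> context(String fileName, int lineNo, int radius) {
        List<String> lines = linesByFile.getOrDefault(fileName, Collections.emptyList());
        int from = Math.max(1, lineNo - radius);
        int to = Math.min(lines.size(), lineNo + radius);
        if (from > to) {
            return Collections.emptyList();
        }
        return new ArrayList<>(lines.subList(from - 1, to));
    }

    public static void main(String[] args) {
        LineIndexSketch index = new LineIndexSketch();
        String sample = "start\nwarn disk\nERROR disk full\nretry\nok";
        index.indexFile("logabc.txt", new BufferedReader(new StringReader(sample)));
        for (LineDoc hit : index.search("error")) {
            System.out.println(hit.fileName + ":" + hit.lineNo + " "
                    + index.context(hit.fileName, hit.lineNo, 1));
        }
    }
}
```

Unlike a real index, this sketch keeps every line in memory, so it says
nothing about how Lucene itself would perform on many 5 MB files. It only
shows the shape of the per-line approach and of the context lookup.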

On 7/4/2013 2:04 PM, Ian Lea wrote:
> Sounds like you're indexing each log file as one lucene document.
> Obvious answer is to index each line in each log file as a separate
> doc.  Searches would then match lines in files and you can display
> those lines, summarizing counts per file if you want that.
>
> If you wanted to be able to show surrounding lines, index the line
> number and the file name.  So if you got a hit on line 12345 of file
> logabc.txt you could execute a second search with
> logfilename:logabc.txt AND lineno:[12340 TO 12350] to get 5 lines
> either side.
> Use a NumericField and NumericRangeQuery for lineno if you are
> concerned about performance.  See recent thread on this list for more
> on that.
>
>
> --
> Ian.
>
>
> On Thu, Jul 4, 2013 at 8:10 AM, Ankit Murarka
> <ankit.murarka@rancoretech.com>  wrote:
>    
>> Dear Team,
>> I have a potential use case. I have a large number of log
>> files which are archived in a particular directory. Now the administrator
>> would like to view certain information which might or might not be present
>> in any of the files inside the directory.
>>
>> Using lucene, I was able to get whether the specific word he is searching
>> for is present in the files or not and in which files they are present.
>>
>> BUT, is it possible to find the location of that word inside the file. Each
>> file is about 5 MB and does not really make sense to parse the file to know
>> the location of a certain word which is present.
>>
>> Can Lucene help in this regard? Or give at least a close approximation of
>> its location in the file? I would like to show at least 256KB of data from
>> the point where that word is present in the file.
>>
>> Googled a lot but to no avail.
>>
>> --
>> Regards
>>
>> Ankit
>>
>> "Peace is found not in what surrounds us, but in what we hold within."
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>


-- 
Regards

Ankit Murarka

"Peace is found not in what surrounds us, but in what we hold within."



