lucene-solr-user mailing list archives

From "Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340" <peter.th...@navy.mil>
Subject Question on trying to Index an XML document...
Date Mon, 28 Sep 2009 09:12:09 GMT
With an essentially default install of the trunk version of Solr 1.4,
when I index an XML file, the XML tags appear to be stripped
from the indexed content.
 
If the tag names and their frequencies are important to me for search
purposes, could someone tell me what my options are for keeping Solr
from stripping out the XML tags?
 
For example, if I have an XML tag such as
<tag1> hello </tag1>
I'd like to see tag1 appear twice as a term, with a count of 2 in some
term frequency vector.
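
To make the desired result concrete, here is a rough sketch in Python of the term counts I have in mind. This is not Solr code and does not reflect how Solr analyzes text; the function name term_frequencies and the simple word-character tokenization are assumptions made purely for illustration.

```python
import re
from collections import Counter


def term_frequencies(xml_text):
    """Count terms in raw XML, keeping tag names as ordinary terms.

    Hypothetical helper for illustration only; not part of Solr.
    """
    # Split on anything that is not a letter, digit, or underscore, so the
    # markup punctuation (<, >, /) is dropped but the tag names remain.
    tokens = re.findall(r"[A-Za-z0-9_]+", xml_text.lower())
    return Counter(tokens)


freqs = term_frequencies("<tag1> hello </tag1>")
print(freqs["tag1"], freqs["hello"])
```

Here the opening and closing tags each contribute one occurrence of tag1, which is the behavior I am hoping to get from the index.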
 
I was trying out the examples from this link
http://wiki.apache.org/solr/ExtractingRequestHandler
 
and sending in an XML file.
 
Would I need to modify some existing code, or is it just a configuration
change to stop the XML tags from being stripped during processing?
 
-Peter
 
 
 
 
 
 

******************************************************************

Peter Thung

Software Developer

IBS Project Technical Lead -Web Developer

 

Code 56340  - Net-centric ISR Development Branch

Joint & National ISR Systems Division

Intelligence, Surveillance and Reconnaissance Department

US Navy Space & Naval Warfare Systems Center Pacific (SSC PAC)

Topside Campus, Bldg A33, room 0055

53560 Hull Street, San Diego, CA 92152

 

UNCLASS Email: peter.thung@navy.mil

SIPRNET Email: thungp@spawar.navy.smil.mil

COMM (Primary): (619) 553-6513

COMM (Secondary):(619) 553-0777

FAX: (619) 553-1586

******************************************************************

 

 
