lucene-solr-user mailing list archives

From "Lance Norskog" <>
Subject RE: Large Data Set Suggestions
Date Sat, 08 Nov 2008 07:52:00 GMT
In my DIH tests I ran a nested loop where the outer RSS feed gave a list of feeds, and the
inner loop walked each feed. Some of the feeds were bogus, and the DIH loop immediately failed.

It would be good to have at least an "ignoreerrors=true" option, the way 'ant' does. This would
be set inside each loop. Even better would be standard programming-language continue/break semantics. Example:

Outer ignoreerrors=continue
    Inner #1 ignoreerrors=ignore
        processing loop
    Inner #2 ignoreerrors=continue
        processing loop
    Inner #3 ignoreerrors=break
        processing loop
    Inner #4 ignoreerrors=break
        processing loop

After an error in an inner loop:
Inner #1 continues to its next item.
Inner #2 stops its loop but continues on to Inner #3.
Inner #3 stops its loop AND Inner #4 does not run.

Inner #3 has to succeed. If it fails, this iteration of Outer fails, but Outer continues to its
next item. Other cases: if Inner #2 fails, does Inner #3 get run? Perhaps instead of true
and false the values could be ignore/break/continue, where "ignoreerrors=break" means that an error
in Inner #2 would prevent Inner #3 from running.
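To make the three proposed policies concrete, here is a minimal Python sketch of the semantics described above. This is a hypothetical illustration, not actual DataImportHandler code; the function names, the `InnerError` exception, and the loop structure are all made up for the example.

```python
# Hypothetical simulation of the proposed ignoreerrors semantics for
# nested DIH-style loops: ignore / continue / break. Not real DIH code.

IGNORE, CONTINUE, BREAK = "ignore", "continue", "break"

class InnerError(Exception):
    """Stands in for a bogus feed or bad record inside an inner loop."""

def run_inner(policy, items, process):
    """Run one inner loop over its items under the given error policy.

    Returns True if later inner loops should still run, False if an
    error under BREAK should abort the rest of this outer iteration.
    """
    for item in items:
        try:
            process(item)
        except InnerError:
            if policy == IGNORE:
                continue      # skip the bad item, keep this loop going
            if policy == CONTINUE:
                return True   # stop this loop; later inner loops still run
            return False      # BREAK: stop this loop AND all later ones
    return True

def run_outer(inner_loops):
    """inner_loops: list of (policy, items, process) tuples, in order."""
    for policy, items, process in inner_loops:
        if not run_inner(policy, items, process):
            break  # an inner BREAK aborts the rest of this outer iteration
```

With four inner loops configured as in the example above, an error under IGNORE skips only the bad item, an error under CONTINUE ends that one loop, and an error under BREAK in Inner #3 keeps Inner #4 from running at all.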


-----Original Message-----
From: Noble Paul നോബിള്‍ नोब्ळ् [] 
Sent: Thursday, November 06, 2008 8:39 PM
Subject: Re: Large Data Set Suggestions

Hi Lance,
This is one area we left open in DIH. What is the best way to handle this? On error, should
it give up or continue with the next?

On Fri, Nov 7, 2008 at 12:44 AM, Lance Norskog <> wrote:
> You can also do streaming XML upload for the XML-based indexing. This 
> can feed, say, 100k records in one XML file from a separate machine.
> All of these options ignore the case where there is an error in your 
> input records vs. the schema.  DIH gives up on an error. Streaming 
> XML gives up on an error.
> Lance
> -----Original Message-----
> From: Steven Anderson []
> Sent: Thursday, November 06, 2008 5:57 AM
> To:
> Subject: RE: Large Data Set Suggestions
>> In that case you may put the file in a mounted NFS directory or you 
>> can serve it out with an apache server.
> That's one option although someone else on the list mentioned that 
> performance was 10x slower in their NFS experience.
> Another option is to serve up the files via Apache and pull them via 
> Thankfully, there are lots of options, but we need to determine which 
> one will perform best.
> Thanks,
> A. Steven Anderson
> 410-418-9908 VSTI
> 443-790-4269 cell

--Noble Paul
