incubator-couchdb-user mailing list archives

From Mike Miller <m...@cloudant.com>
Subject Re: Best Way to Handle Inserting HTML Data
Date Wed, 17 Nov 2010 01:07:50 GMT
Can you just save the HTML as an attachment?  -Mike
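
A minimal sketch of what the attachment route might look like, assuming CouchDB's inline-attachment form (an `_attachments` member whose entries carry a `content_type` and base64-encoded `data`). The helper name, the attachment name `body.html`, and the sample fields are hypothetical, not anything from the original script:

```python
import base64
import json


def doc_with_html_attachment(fields, html, name="body.html"):
    """Build a JSON document body that carries `html` as an inline attachment.

    `fields` holds the ordinary (non-HTML) fields of the record.
    `name` is an illustrative attachment name, not a required one.
    """
    doc = dict(fields)
    doc["_attachments"] = {
        name: {
            "content_type": "text/html",
            # Inline attachment data is base64 text, so the HTML's own
            # quotes never appear inside the JSON string.
            "data": base64.b64encode(html.encode("utf-8")).decode("ascii"),
        }
    }
    return json.dumps(doc)


html = '<a href="http://foo.com">Here\'s a hyperlink</a>'
body = doc_with_html_attachment({"foo": "bar"}, html)
```

The resulting `body` string would be what gets sent in the HTTP call that inserts the record; the HTML comes back byte-for-byte when the attachment is fetched.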

On Nov 16, 2010, at 7:46 PM, Matthew Woodward <matt@mattwoodward.com> wrote:

> I have a SQL Server database I'm converting to CouchDB. I wrote a script
> that pulls data from SQL Server, denormalizes it into the JSON structure
> I want to have in CouchDB, and then makes an HTTP call to insert each
> record.
>
> The issue I'm running into is that some of the fields in SQL Server have
> HTML in them, and I'm not sure of the best way to handle this. It's easy
> enough to brute-force escape all the HTML tags, quotes, etc., but then
> the data won't display as HTML when I want to display it on a web page.
>
> The more I thought about it, I think the only character that will really
> cause issues with the JSON (other than high-ASCII MS Word junk, which I'm
> stripping out/converting) is ". Here's an example:
>
> myJSONString = '{"foo":"bar", "baz":"<a href="http://foo.com">Here's a hyperlink</a>"}';
>
> So if I don't do any sort of escaping, the first " in the href will cause
> the JSON to be invalid. But whereas I can change " to &quot; in regular
> text (e.g. "Here's a quote" can become &quot;Here's a quote&quot; and
> still be fine), that won't really work with hyperlinks, since
> <a href=&quot;http://foo.com&quot;> won't work properly, I don't believe,
> or at any rate isn't all that desirable.
>
> If I escape the " with \ (using the example above, this becomes
> <a href=\"http://foo.com\">), that lets me get the document into Couch
> successfully, but I have a feeling that isn't going to work if I just
> want to display that field as HTML on a web page (though I have yet to
> experiment with that to see what happens when it's rendered).
>
> Anyway, I've been going around in circles about how best to handle this,
> and since I'm new to Couch, I figured I'd ask to see what the
> conventional wisdom/"best practice" is for something like this. I'm happy
> to go a different route completely (write JSON to a file first and load
> using curl, whatever works ...) if my current approach isn't the best way
> of going about this.
>
> Thanks!
>
> -- 
> Matthew Woodward
> matt@mattwoodward.com
> http://blog.mattwoodward.com
> identi.ca / Twitter: @mpwoodward
>
> Please do not send me proprietary file formats such as Word, PowerPoint,
> etc. as attachments.
> http://www.gnu.org/philosophy/no-word-attachments.html
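
On the worry that \" escaping might break the HTML when it is displayed: the backslash exists only in the serialized JSON text, and a JSON parser removes it again when the document is read back. A small sketch of that round trip, assuming the record is built as a native structure and serialized by a JSON library rather than concatenated by hand (the variable names here are illustrative):

```python
import json

# The HTML exactly as it would come out of the source database.
html = '<a href="http://foo.com">Here\'s a hyperlink</a>'

# Build the record as a native structure and let the serializer escape it.
record = {"foo": "bar", "baz": html}
payload = json.dumps(record)

# The serialized text contains \" inside the href attribute...
print(payload)

# ...but parsing yields the original, unescaped HTML.
restored = json.loads(payload)
print(restored["baz"])
```

The same holds in any language with a JSON serializer: escaping is a property of the wire format, not of the stored value, so the field can be written to a page as HTML after parsing.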
