incubator-couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matthew Woodward <m...@mattwoodward.com>
Subject Best Way to Handle Inserting HTML Data
Date Wed, 17 Nov 2010 00:46:33 GMT
I have a SQL Server database I'm converting to CouchDB. I wrote a script
that pulls data form SQL Server, denormalizes into the JSON structure I want
to have in CouchDB, and then makes an HTTP call to insert each record.

The issue I'm running into is some of the fields in SQL Server have HTML in
them, and I'm not sure the best way to handle this. It's easy enough to
brute-force escape all the HTML tags, quotes, etc. but then the data won't
display as HTML when I want to display it on a web page.

The more I thought about it, I think the only character that will really
cause issues with the JSON (other than high-ASCII MS Word junk, which I'm
stripping out/converting) is ". Here's an example:

myJSONString = '{"foo":"bar", "baz":"<a href="http://foo.com">Here's a
hyperlink</a>"}';

So if I don't do any sort of escaping the first " in the href will cause the
JSON to be invalid. But whereas I can change " to &quot; in regular text
(e.g. "Here's a quote" can become &quot;Here's a quote&quot; and still be
fine), that won't really work with hyperlinks since <a href=&quot;
http://foo.com&quot;> won't work properly I don't believe, or at any rate
isn't all that desirable.

If I escape the " with \ (using the example above, this becomes <a href=\"
http://foo.com\">) that lets me get the document into Couch successfully,
but I have a feeling that isn't going to work if I just want to display that
field as HTML on a web page (though I have yet to experiment with that to
see what happens when it's rendered).

Anyway, I've been going around in circles about how best to handle this and
since I'm new to Couch, I figured I'd ask to see what the conventional
wisdom/"best practice" is for something like this. I'm happy to go a
different route completely (write JSON to a file first and load using curl,
whatever works ...) if my current approach isn't the best way of going about
this.

Thanks!

-- 
Matthew Woodward
matt@mattwoodward.com
http://blog.mattwoodward.com
identi.ca / Twitter: @mpwoodward

Please do not send me proprietary file formats such as Word, PowerPoint,
etc. as attachments.
http://www.gnu.org/philosophy/no-word-attachments.html

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message