Playing with GoComics, Part 2


I’ve been helping one of the webcomic artists who has a strip on GoComics, and one of the things I found myself having to do was go back through several months’ worth of strips to find one specific comment posted by another user. It would be nice if GoComics provided a way to search comments directly, but it doesn’t. That finally pushed me to start another VBScript project I’d had in the back of my mind for a while now.

If you want the comments for a particular strip, they’re already embedded in the JavaScript sent to your browser. Taking the code from last week’s blog entry, just do InStr(objHttp.ResponseText, "<li class='load-comments'>"). The comments are in the section that follows.
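A minimal VBScript sketch of that InStr lookup, wrapped in a helper. The name TextAfterMarker is hypothetical, and it assumes the marker string appears verbatim in the page source. It only returns everything after the first match, so you still have to work out where the comments section ends.

```vbscript
' Return everything in html that follows the first occurrence of marker,
' or an empty string if the marker is not present.
' (TextAfterMarker is a hypothetical helper name, not part of any library.)
Function TextAfterMarker(html, marker)
    Dim pos
    ' vbBinaryCompare = exact, case-sensitive match
    pos = InStr(1, html, marker, vbBinaryCompare)
    If pos = 0 Then
        TextAfterMarker = ""
    Else
        ' Mid with no length argument returns the rest of the string
        TextAfterMarker = Mid(html, pos + Len(marker))
    End If
End Function
```

Usage would be something like commentsSection = TextAfterMarker(objHttp.ResponseText, "<li class='load-comments'>").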

Kind of. What GoComics did was embed only the first 15 comments as JavaScript. You can do InStr(objHttp.ResponseText, "Comments (") to locate the actual comment count. If there are more than 15 comments, you need to make a second request:

objHttp.Open "GET", CommentsURL, False
objHttp.Send
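Finding the "Comments (" label only gives you a position; the count itself still has to be read out of the text that follows. A minimal sketch, assuming the label is immediately followed by the number and a closing parenthesis (as in "Comments (42)"). GetCommentCount is a hypothetical name, and it returns -1 when the label can’t be parsed.

```vbscript
' Read the number out of a label like "Comments (42)".
' Returns -1 if the label is missing or isn't followed by a number.
' ASSUMPTION: the count sits between "Comments (" and the next ")".
Function GetCommentCount(html)
    Const LABEL = "Comments ("
    Dim startPos, endPos, digits
    GetCommentCount = -1
    startPos = InStr(1, html, LABEL, vbBinaryCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(LABEL)
    endPos = InStr(startPos, html, ")", vbBinaryCompare)
    If endPos = 0 Then Exit Function
    digits = Trim(Mid(html, startPos, endPos - startPos))
    If IsNumeric(digits) Then GetCommentCount = CLng(digits)
End Function
```

The second request is then only needed when GetCommentCount(objHttp.ResponseText) returns more than 15.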

This one’s a bit trickier, though. The directory containing all of the comments for a specific strip is named with a plain number that appears to be assigned arbitrarily. The path will look something like “/comments/1161211/page/1?show_all=true”. You want that “1161211” number (or whatever it is for the page you’re on). Once you have it, just build up CommentsURL as:

Directory = "1161211"
CommentsURL = "http://www.gocomics.com/comments/" & Directory & "/page/1?show_all=true"
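Rather than hard-coding the directory number, you could pull it out of the strip page with VBScript’s RegExp object. This is a sketch under assumptions: it assumes a link of the form /comments/<number>/page/ appears somewhere in the strip page’s source, and it uses MSXML2.XMLHTTP for the request, which may not be the object last week’s code used. FindCommentsDirectory, BuildCommentsURL and FetchPage are hypothetical helper names.

```vbscript
' Pull the comment-directory number out of a strip page, assuming the page
' contains a link like "/comments/1161211/page/1?show_all=true".
' Returns "" if no such link is found.
Function FindCommentsDirectory(html)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = "/comments/(\d+)/page/"
    re.Global = False
    Set matches = re.Execute(html)
    If matches.Count > 0 Then
        FindCommentsDirectory = matches(0).SubMatches(0)
    Else
        FindCommentsDirectory = ""
    End If
End Function

' Build the "show all comments" URL for a given directory number.
Function BuildCommentsURL(directory)
    BuildCommentsURL = "http://www.gocomics.com/comments/" & directory & _
                       "/page/1?show_all=true"
End Function

' Synchronous GET; returns the response body, or "" on a non-200 status.
' ASSUMPTION: MSXML2.XMLHTTP is available; last week's objHttp may differ.
Function FetchPage(url)
    Dim objHttp
    Set objHttp = CreateObject("MSXML2.XMLHTTP")
    objHttp.Open "GET", url, False
    objHttp.Send
    If objHttp.Status = 200 Then
        FetchPage = objHttp.ResponseText
    Else
        FetchPage = ""
    End If
End Function
```

Check that FindCommentsDirectory returned something non-empty before calling FetchPage(BuildCommentsURL(...)); otherwise you end up requesting a malformed URL.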

This gives you all of the other comments that aren’t embedded in the page with the strip. Unfortunately, the comments embedded with the strip are formatted for JavaScript, while the next set in objHttp.ResponseText will be in HTML. That means writing two separate deformatters to strip the code out of the comments and make them more human-friendly.
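The two deformatters might look roughly like this. StripHtml removes tags with VBScript’s RegExp object and decodes a handful of common entities; UnescapeJs undoes simple backslash escaping. Both names are hypothetical, and the set of escapes handled in UnescapeJs is an assumption about how the embedded comments are encoded, so check it against the actual page source.

```vbscript
' Remove HTML tags and decode a few common entities.
Function StripHtml(text)
    Dim re, s
    Set re = New RegExp
    re.Pattern = "<[^>]*>"
    re.Global = True
    s = re.Replace(text, "")
    s = Replace(s, "&quot;", """")
    s = Replace(s, "&#39;", "'")
    s = Replace(s, "&lt;", "<")
    s = Replace(s, "&gt;", ">")
    ' Decode &amp; last so "&amp;lt;" ends up as "&lt;", not "<"
    s = Replace(s, "&amp;", "&")
    StripHtml = s
End Function

' Undo simple JavaScript string escaping.
' ASSUMPTION: the embedded comments use \\ \" \' \/ and \n escapes.
Function UnescapeJs(text)
    Dim s
    ' Park escaped backslashes first so "\\n" isn't mistaken for "\n".
    ' Chr(1) is assumed never to appear in real comment text.
    s = Replace(text, "\\", Chr(1))
    s = Replace(s, "\""", """")
    s = Replace(s, "\'", "'")
    s = Replace(s, "\/", "/")
    s = Replace(s, "\n", vbCrLf)
    UnescapeJs = Replace(s, Chr(1), "\")
End Function
```

For the JavaScript-embedded comments you would run UnescapeJs first and then StripHtml, on the assumption that the escaped strings still carry HTML markup inside them.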

What’s a real pain, though, is that some comments include Unicode characters, which VBS will barf on. And the Microsoft VBScript Reference page fails to mention the AscW() function, which is the one you want for getting the numeric codes of Unicode characters. Use AscW() to test for Unicode characters and replace them with something safe, like “.”, before writing the comments to an output file.
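A minimal sketch of that AscW() check. One gotcha worth knowing: AscW() returns a signed Integer, so characters at U+8000 and above come back as negative numbers, which the code below corrects before comparing. The name SanitizeToAscii and the choice to treat anything above 127 as unsafe are assumptions for illustration.

```vbscript
' Replace every non-ASCII character with a safe stand-in (e.g. ".")
' so the text can be written to an ASCII-mode text file.
Function SanitizeToAscii(text, replacement)
    Dim i, ch, code, result
    result = ""
    For i = 1 To Len(text)
        ch = Mid(text, i, 1)
        code = AscW(ch)
        ' AscW returns a signed 16-bit value; map negatives to 32768-65535
        If code < 0 Then code = code + 65536
        If code > 127 Then
            result = result & replacement
        Else
            result = result & ch
        End If
    Next
    SanitizeToAscii = result
End Function
```

Characters outside the Basic Multilingual Plane (most emoji) are stored as two UTF-16 code units, so each one turns into two replacement characters. If you’d rather keep the characters, FileSystemObject’s CreateTextFile takes a third argument that creates the file as Unicode instead, at the cost of UTF-16 output files that some grep tools handle less gracefully.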

What I did was create a While loop that runs the script on every page of the comic strip from day 1 up to the present, strips out the JavaScript and HTML code so that only the commenters’ names and comments remain, and saves the results to separate .txt files, one file per strip (including the full URL of the strip in each file so I can go back to the original comments if needed). Now, if I have to locate someone’s comment again, I can grep one directory on my hard drive and be done with it in a couple of seconds. The entire project took about 4 hours to write and debug, and only because I was tired and kept making stupid coding mistakes. I probably could have done it in an hour if I’d been rested.
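A rough sketch of how such a loop could be structured. The strip slug, start date, output folder and the date-based URL pattern http://www.gocomics.com/<slug>/yyyy/mm/dd are all assumptions, and ProcessStrip is a hypothetical placeholder where the fetching, extraction, deformatting and AscW() cleanup steps would go.

```vbscript
' Zero-pad a month or day number to two digits.
Function Pad2(n)
    Pad2 = Right("0" & n, 2)
End Function

' ASSUMPTION: daily strip pages live at /<slug>/yyyy/mm/dd.
Function StripURL(slug, d)
    StripURL = "http://www.gocomics.com/" & slug & "/" & Year(d) & "/" & _
               Pad2(Month(d)) & "/" & Pad2(Day(d))
End Function

' One output file per strip, named by date, e.g. C:\comments\2013-03-07.txt
Function OutputFileName(folder, d)
    OutputFileName = folder & "\" & Year(d) & "-" & Pad2(Month(d)) & "-" & _
                     Pad2(Day(d)) & ".txt"
End Function

' HYPOTHETICAL placeholder: fetch the page and return the cleaned
' names and comments as plain ASCII text.
Function ProcessStrip(url)
    ProcessStrip = ""
End Function

' Walk every day from startDate to today and save one .txt file per strip,
' with the strip's URL on the first line for reference.
Sub ArchiveComments(slug, startDate, folder)
    Dim fso, d, url, ts
    Set fso = CreateObject("Scripting.FileSystemObject")
    d = startDate
    Do While d <= Date
        url = StripURL(slug, d)
        ' Overwrite any existing file; False = ASCII, not Unicode
        Set ts = fso.CreateTextFile(OutputFileName(folder, d), True, False)
        ts.WriteLine url
        ts.WriteLine
        ts.Write ProcessStrip(url)
        ts.Close
        d = DateAdd("d", 1, d)
    Loop
End Sub
```

A call would look like ArchiveComments "somestrip", DateSerial(2012, 1, 1), "C:\comments", where every argument is a made-up example value.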
