Tuesday, September 14, 2010

Woofy tutorial

Woofy: how to set up a download script

Tools used:

Notepad++ (you can use plain Notepad if you prefer)

Expresso

Woofy

This tutorial explains how to create the XML file that Woofy needs in order to download all the comic strips from a particular webcomic. The example webcomic I’m using is Nerf now.

To start, here’s a basic template for a webcomic definition:

<?xml version="1.0" encoding="utf-8" ?>
<comicInfo friendlyName="">
<startUrl><![CDATA[]]></startUrl>
<firstIssue><![CDATA[]]></firstIssue>
<comicRegex><![CDATA[]]></comicRegex>
<backButtonRegex><![CDATA[]]></backButtonRegex>
</comicInfo>

First, enter the name of the webcomic inside the quotes after friendlyName. Note that Woofy uses this name for the file name, and it can’t download to a file with a space in its name, so my suggestion is that you don’t use spaces in the name.

<comicInfo friendlyName="NerfNow">

After this, enter the start URL, which is the base URL. An example is shown below.

<startUrl><![CDATA[http://nerfnow.com/]]></startUrl>

Next up is the link to the first comic.

<firstIssue><![CDATA[http://nerfnow.com/comic/4]]></firstIssue>

After this it becomes a bit more complicated, as you have to define search terms using regular expressions. The tool I use is Expresso, the same one the creator of Woofy uses. We mostly need it to make sure that the search terms will work. A basic tutorial for how expressions work is here. The first thing we’re using it for is the search term that finds the comic image, which for Nerf now should look like what is shown below.

<comicRegex><![CDATA[http://nerfnow\.com/comic/image/[0-9]{1,}]]></comicRegex>

After this we need to set up a search term for finding the back button on the web page. One important thing to note is the parentheses around the link together with ?<content>. In regular expression terms, (?<content>...) is a named capturing group, and from my understanding it tells Woofy which part of the match is the link, or the final portion of the link, depending on where you place it. In this example it shows Woofy that it’s the entire link.

<backButtonRegex><![CDATA[<a\shref="(?<content>http://nerfnow\.com/comic/[0-9]{1,})">Previous]]></backButtonRegex>

Finally, for Nerf now we need to set a renaming parameter because the comic downloads without an extension. This is a fairly simple one that uses the original name and then adds the proper extension to it.

<renamePattern><![CDATA[${fileName}.png]]></renamePattern>
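
Putting the pieces above together, the finished definition might look roughly like the sketch below. It only combines the values from this tutorial. Placing renamePattern as the last element inside comicInfo is an assumption, since the basic template doesn’t include it. The dots in the comic regex are escaped here so that they match a literal "." rather than any character.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<comicInfo friendlyName="NerfNow">
  <!-- Base URL of the webcomic -->
  <startUrl><![CDATA[http://nerfnow.com/]]></startUrl>
  <!-- Link to the first comic -->
  <firstIssue><![CDATA[http://nerfnow.com/comic/4]]></firstIssue>
  <!-- Search term for the comic image; dots escaped to match literally -->
  <comicRegex><![CDATA[http://nerfnow\.com/comic/image/[0-9]{1,}]]></comicRegex>
  <!-- Search term for the back button; the "content" group marks the link -->
  <backButtonRegex><![CDATA[<a\shref="(?<content>http://nerfnow\.com/comic/[0-9]{1,})">Previous]]></backButtonRegex>
  <!-- Adds the missing extension; position inside comicInfo is assumed -->
  <renamePattern><![CDATA[${fileName}.png]]></renamePattern>
</comicInfo>
```

It’s worth pasting both search terms into Expresso against the page source before trying the file in Woofy, to confirm that they match what you expect.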

This comic wasn’t the best example, but it’s a pretty good one. If you want to look at the tutorial made by the program’s author, you can find it here.

Lastly, I would like to explain the contest a bit better. The contest will run for 4 weeks from last Saturday, and the winners will be emailed their prizes. A post will be made stating that the contest is over.
