Next: 5. Advanced Topics Up: 4.5.2 Pluck remote targets Previous: 4.5.2.1 Daily News   Contents   Index

4.5.2.2 Comics

We need some fun, too, so let's download a few strips of some well-known comics. To simplify things, we will use a tool called netcomics to fetch the comics and then use a local description file to build the database. Installing netcomics is beyond the scope of this tutorial, but it is a Perl script and should work on any platform that has Perl support (pre-built packages exist for Linux users). After you have installed netcomics, create a small shell script called netcomics.sh to be run by the parser:

#!/bin/sh

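# Download the strips into /tmp/Comics/
netcomics -D -d /tmp/Comics/ -c "ch dilbert dilbertcl uf"

# Rename the date-specific files to fixed names; stop if the directory is missing
( cd /tmp/Comics || exit 1 ; \
mv Dilbert-*.gif Dilbert.gif ; \
mv Dilbert_Classics-*.gif Dilbert_Classics.gif ; \
mv Calvin_and_Hobbes-*.gif Calvin_and_Hobbes.gif ; \
mv User_Friendly-*.gif User_Friendly.gif )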
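
The plain mv commands above assume that exactly one dated file exists per comic. As a minimal sketch of a more defensive rename, the function below reports a missing or ambiguous strip instead of failing silently. The name rename_strip is hypothetical and is not part of netcomics or Plucker; it only assumes the "<name>-*.gif" naming used in the script above.

```shell
#!/bin/sh
# Hypothetical helper (not part of netcomics or Plucker): rename the single
# dated strip "<prefix>-*.gif" in a directory to "<prefix>.gif".
rename_strip() {
    dir=$1
    prefix=$2
    # Expand the glob into the function's positional parameters.
    set -- "$dir/$prefix"-*.gif
    # An unmatched glob stays literal, so check that the first hit exists.
    if [ ! -e "$1" ]; then
        echo "no strip found for $prefix" >&2
        return 1
    fi
    # Refuse to guess when several dated files are present.
    if [ "$#" -gt 1 ]; then
        echo "more than one strip found for $prefix" >&2
        return 1
    fi
    mv -f "$1" "$dir/$prefix.gif"
}
```

Called as rename_strip /tmp/Comics Dilbert, it would stand in for the corresponding mv line; a non-zero return value signals that the strip was not renamed.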

On OS/2 and Windows the script looks like the following. Name it netcomics.cmd on OS/2 and netcomics.bat on Windows:

perl netcomics.pl -D -d \temp\Comics\ -c "ch dilbert dilbertcl uf"

cd \temp\Comics
move Dilbert-*.gif Dilbert.gif
move Dilbert_Classics-*.gif Dilbert_Classics.gif
move Calvin_and_Hobbes-*.gif Calvin_and_Hobbes.gif
move User_Friendly-*.gif User_Friendly.gif

This script downloads Calvin & Hobbes, Dilbert, Dilbert Classic and UserFriendly to a separate directory (/tmp/Comics/) and renames the date-specific files to fixed names that can be used in the local description file:

<HTML>
<BODY>

<H1>Comics Home Page</H1>

<A HREF="file:/tmp/Comics/Dilbert.gif">Dilbert</A><P>
<A HREF="file:/tmp/Comics/Dilbert_Classics.gif">Dilbert Classic</A><P>
<A HREF="file:/tmp/Comics/Calvin_and_Hobbes.gif">Calvin &amp; Hobbes</A><P>
<A HREF="file:/tmp/Comics/User_Friendly.gif">UserFriendly</A><P>

</BODY>
</HTML>
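
If the list of comics changes often, the description file could also be generated instead of edited by hand. The following is only a sketch: make_description is a hypothetical helper, not a Plucker tool, and it derives the link text from the file name by replacing underscores with spaces, so the labels differ slightly from the hand-written file above.

```shell
#!/bin/sh
# Hypothetical helper: print a description file that links to "<name>.gif"
# in the given directory for every name passed after the directory.
make_description() {
    dir=$1
    shift
    printf '%s\n' "<HTML>" "<BODY>" "" "<H1>Comics Home Page</H1>" ""
    for name in "$@"; do
        # Use the file name with underscores turned into spaces as link text.
        label=$(printf '%s' "$name" | tr '_' ' ')
        printf '<A HREF="file:%s/%s.gif">%s</A><P>\n' "$dir" "$name" "$label"
    done
    printf '%s\n' "" "</BODY>" "</HTML>"
}
```

The output is written to standard output, so it can be redirected to wherever the description file is kept, for example make_description /tmp/Comics Dilbert User_Friendly > comics.html.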

To simplify things even further, we will also add a new section for the comics to the configuration file:

[comics]
bpp = 4
home_url = plucker:/HTML/comics.html
maxwidth = 600
maxheight = 200
db_file = DB/Comics
before_command = "netcomics.sh"

NOTE: On OS/2 or Windows, set before_command to the name of your batch file instead.

As you can see, we have added the shell script as a command to be run before the description file is parsed. Every day (except Sunday, when the strips are too large for these options; a solution is shown later in this section) we now only have to run:

% Spider.py -v -s comics

Executing 'before_command': "netcomics.sh"
Working for pluckerdir /home/pilot/.plucker
Processing file:/home/pilot/.plucker/HTML/comics.html.
           0 collected, 0 still to do
  Retrieved ok
Processing file:/tmp/Comics/Dilbert.gif.
           1 collected, 3 still to do
  Retrieved ok
Processing file:/tmp/Comics/Dilbert_Classics.gif.
           2 collected, 2 still to do
  Retrieved ok
Processing file:/tmp/Comics/Calvin_and_Hobbes.gif.
           3 collected, 1 still to do
  Retrieved ok
Processing file:/tmp/Comics/User_Friendly.gif.
           4 collected, 0 still to do
  Retrieved ok

Writing out collected data...
Writing db 'Comics' to file /home/pilot/.plucker/DB/Comics.pdb
Converted file:/home/pilot/.plucker/HTML/comics.html
Wrote 1 <= plucker:/~special~/index
Wrote 2 <= file:/home/pilot/.plucker/HTML/comics.html
Wrote 3 <= plucker:/~special~/pluckerlinks
Wrote 11 <= file:/tmp/Comics/Calvin_and_Hobbes.gif
Wrote 12 <= file:/tmp/Comics/Dilbert.gif
Wrote 13 <= file:/tmp/Comics/Dilbert_Classics.gif
Wrote 14 <= file:/tmp/Comics/User_Friendly.gif
Wrote 15 <= plucker:/~special~/links1
Done!

To be able to use it on Sundays as well, we add yet another section to the configuration file:

[sunday]
bpp = 2
maxwidth = 550
maxheight = 400
db_file = DB/SundayComics

By using a lower bit depth for the images, we can include larger versions of the comics. Each Sunday we would run:

% Spider.py -s comics -s sunday

and since the parser applies the sections in the given order, the changed values in sunday will override the ones in comics.
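
To avoid remembering which command to use on which day, the choice of sections can be scripted. The sketch below is an assumption-laden example rather than part of Plucker: sections_for_day is a hypothetical function that takes the weekday number printed by "date +%u" (1 = Monday, 7 = Sunday) and prints the matching -s arguments.

```shell
#!/bin/sh
# Hypothetical wrapper logic: print the Spider.py section arguments for a
# given day of the week (1 = Monday ... 7 = Sunday, as printed by "date +%u").
sections_for_day() {
    if [ "$1" -eq 7 ]; then
        # Sunday: apply comics first, then let sunday override its values.
        echo "-s comics -s sunday"
    else
        echo "-s comics"
    fi
}

# Usage (the unquoted substitution is split into separate arguments on purpose):
#   Spider.py $(sections_for_day "$(date +%u)")
```

Keeping the decision in a function that takes the day as an argument makes it easy to check both branches without waiting for Sunday.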


The Plucker Team