
2.2.1 Specifying the pages you want to view - The Home Document

The home page is the first page you see when you start Plucker on your PDA, and it is also the page you see when you tap the home icon (the little house) in the viewer on the Palm OS handheld. By default, the home page is created from the description file at $HOME/.plucker/home.html.

When you installed Plucker, a default description file was put in $HOME/.plucker. You can change this file in any text editor, and it does not require deep knowledge of HTML. We will explain how to do this step by step in a minute.

Except for the MAXDEPTH and similar attributes, the description file is like any other HTML file. This also means that you can view home.html in your normal web browser (e.g. Netscape). In fact, you can even use a normal web page on some web server as your home document. See chapter 4 for details on how to do this.

Prior to performing a HotSync, you have to tell Plucker where to grab the pages that you want to view. Plucker starts by scanning the description file (also referred to as the home document) that you have defined. As stated above, if you did not define otherwise, this will be $HOME/.plucker/home.html.

The parser finds any links in that file and follows them. Each link (e.g. something like <A HREF="...">) is read from the Internet, parsed, stored on your hard disk, and included in the database that you will later sync to your Palm. Let us explore the description file in more detail.

A simple, typical home document will look like this (without the line numbers; they are only added for easier reference later on):

[01]	<H1>Plucker Home</H1>
[02]
[03]	<H2>Plucker Information</H2>
[04]	<P><A HREF="http://plucker.gnu-designs.com">
      Plucker home page</A><P>
[05]
[06]	<H2>Linux links</H2>
[07]	<A HREF="http://slashdot.org/index.pl?light=1&noboxes=1" 
      NOIMAGES MAXDEPTH=1>Slashdot.org</A><P>
[08]	<H2>News</H2>
[09]	<A HREF="http://channel.nytimes.com/partners/palm-pilot/summ.html"
      MAXDEPTH=2>New York Times</A><P>
[10]	<A HREF="http://www.news.com/Newsfeed/Avantgo/index.html" 
      MAXDEPTH=2>C-Net NEWS.COM</A><P>

Here you see several typical examples. First of all you may note that this document follows the general outline of normal HTML, as stated above. If you do not know HTML already, do not worry; what you need here is really very easy. Let us go through it line by line. (If you already know HTML, this will be somewhat boring to you.)

First, note that commands in HTML are enclosed in angle brackets. Each command has a beginning and an end using the same tag, the end marked by an additional slash in front of the tag's name. Lines 1 and 3, for example, create headlines; the numbers simply specify different font sizes.

Line 4 is more interesting. It first starts a new paragraph (<P>), and then you see the most important tag in HTML: a link. Links have the following form:

<A HREF="http://plucker.gnu-designs.com">Plucker home page</A>

Enclosed in quotes you find the page the link refers to (the URL); after the closing angle bracket comes the title of the link that is displayed to the user. This is exactly the text you will see within Plucker on your home document.

Now you know the basic procedure for telling Plucker to get a specific web page for you. What does Plucker do if it finds a tag like this in home.html? It simply follows the URL you specified and retrieves that page. Note that since the home document is plain HTML, you can also view it in your normal web browser and use the links the same way Plucker does. That way you can easily check that all links work as you expect, without having to run the parser and sync each time.
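
To make this concrete, a complete home document does not need to be any longer than this minimal sketch (the headline text is just an illustrative label; the URL is the Plucker home page from the listing above):

<H1>My Plucker Pages</H1>
<P><A HREF="http://plucker.gnu-designs.com">Plucker home page</A><P>

With no MAXDEPTH given, Plucker fetches just that one page (assuming the default depth has not been changed in the configuration file discussed below).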

Well, it is nice to grab a web page, but what do you do with a news ticker that lists only the headlines? You want to retrieve the articles as well. Let's have a look at line 9. You will note that this time the link is enhanced by an additional tag: MAXDEPTH=2. This is how you instruct Plucker to retrieve deeper levels from a web server. You can give MAXDEPTH any number as its parameter. MAXDEPTH=2 means load the target (linked-to) page and any targets within that page. This does exactly the job here: Plucker first grabs the headlines and then follows all linked pages to get the articles as well.

MAXDEPTH=3 will load the target page, its linked pages, and any pages linked from within those, and so on. You really do not want to set MAXDEPTH too high; that could be very bad. A MAXDEPTH of well under 50 would probably load the entire Internet, so you may run into some storage problems...
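
To see the effect of the different depths, here is the link from line 9 written with three alternative values (the link titles are just descriptive labels for this illustration; you would of course use only one of these lines):

<A HREF="http://channel.nytimes.com/partners/palm-pilot/summ.html" MAXDEPTH=1>New York Times, summary page only</A><P>
<A HREF="http://channel.nytimes.com/partners/palm-pilot/summ.html" MAXDEPTH=2>New York Times, summaries plus the linked articles</A><P>
<A HREF="http://channel.nytimes.com/partners/palm-pilot/summ.html" MAXDEPTH=3>New York Times, articles plus whatever they link to</A><P>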

MAXDEPTH is one of the most important tags used to customize the information to download. Another important tag is NOIMAGES. A sample of how to use it can be found in line 7. If you do not want to download any images, specify this tag; simply add it after the URL to pluck. As you can see in line 7, you can combine the various tags: line 7 instructs Plucker to download only the title page, without images.

You will also notice that you can explicitly specify a value of 1 for the MAXDEPTH argument. At first glance this does not seem to make much sense, as our first example showed that leaving out the MAXDEPTH statement gives the same result. However, the default depth Plucker uses can be changed in Plucker's configuration file, which we will talk about later on. So if you set the default depth to 2, for example, but this particular page should definitely be plucked only to a depth of 1, you can specify that explicitly here instead of defining a depth of 2 for all other pages. (It is not a bad idea to define the depth explicitly for all pages, so one can see at first glance how deep Plucker will go.)
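
Following that advice, the links from the example listing could each carry an explicit depth, so home.html documents its own behaviour (same URLs as in the listing; the depth of 1 on the first two links simply makes the default explicit):

<A HREF="http://plucker.gnu-designs.com" MAXDEPTH=1>Plucker home page</A><P>
<A HREF="http://slashdot.org/index.pl?light=1&noboxes=1" NOIMAGES MAXDEPTH=1>Slashdot.org</A><P>
<A HREF="http://channel.nytimes.com/partners/palm-pilot/summ.html" MAXDEPTH=2>New York Times</A><P>

A quick glance at this file now tells you how deep each site will be plucked, regardless of the default depth set in the configuration file.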

Line 7 shows another of Plucker's possibilities. You might have wondered about the odd characters that appear in the URL. With that definition you instruct Plucker to request the result of a so-called CGI and to pass it some parameters as well. A CGI is basically a script run by a web provider to gather specific information, e.g. from a database. You do not need to worry about the details; the easiest way to get the correct URL is always to point your web browser at the page where the information is located and copy the URL from the browser's address field into Plucker's home document. Here we just wanted to show that Plucker can handle such URLs as well.
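
For example, if your browser's address field shows a parameterized URL like the hypothetical one below, you can paste it into a link unchanged; everything after the question mark is simply passed along to the server's script when Plucker requests the page:

<A HREF="http://www.example.com/news.cgi?section=world&edition=latest" MAXDEPTH=2>World news</A><P>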

Hint: If you specify a CGI for a news ticker and you always get the same news, it is most likely that the date of the issue is passed to the script via the URL.

NOTE: MAXDEPTH will most likely be sufficient for web pages that were written for PDAs. If you are plucking normal web pages, be careful with MAXDEPTH, as many pages contain a menu to navigate the site and Plucker will follow those links as well; Plucker does not distinguish between a menu link and a normal link. That way a MAXDEPTH=2 which is meant to download an overview page and the articles (e.g. of your newspaper) could easily result in Plucker trying to retrieve the news archive of your favourite newspaper as well (since it is linked from within the menu). So it is wise to watch the first runs of Plucker when you add new pages to your description file, especially if you do not use PDA-optimized pages. There are very effective ways to prevent Plucker from running into this kind of problem: besides the MAXDEPTH and NOIMAGES tags there are various other tags that influence the gathering process, e.g. you can have Plucker exclude specific links. See chapter 4 for details; we will not go into too much detail here.
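
Until those options are introduced, the simplest remedy using only the tags described so far is to reduce the depth or to link a more specific index page. For instance, if a newspaper's front page drags in the whole site at MAXDEPTH=2, a link like the following (the URL is just a placeholder) limits the damage:

<A HREF="http://www.example.com/todays-headlines.html" NOIMAGES MAXDEPTH=1>Today's headlines only</A><P>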

Hint: Web pages created for handhelds have some advantages over normal web pages. Plucker can retrieve normal web pages just as well, but specially designed pages are usually smaller, contain fewer graphics, etc.

Hint: Plucker cannot handle frames. Usually, pages designed for handhelds do not contain frames...

Hint: A good collection of links to handheld-friendly sites is included with Plucker.

Our German-speaking users can find sites with mostly German content at http://www.palmtop-portal.de.

