ht://Check

© 1999-2004 - Comune di Prato - Italia
Some portions © 1995, 2000 - The ht://Dig Group
Some portions © 2008 - Devise.IT
Distributed under GNU GPL


General Info

ht://Check is more than a link checker. It is a console application written in C++ for GNU/Linux systems and derived from ht://Dig, which the authors regard as the best free search engine available on the Internet.

It retrieves information over HTTP/1.1 and stores it in a MySQL database, and it is particularly suitable for small Internet domains or intranets.

Its purpose is to help a webmaster manage one or more related sites: after a "crawl", ht://Check creates a powerful data source built from information about the retrieved documents. The information available to the ht://Check user includes:

  • attributes of single documents, such as content type, size and last modification time;
  • information about the retrieval of each resource, for instance whether it was successfully retrieved or not, together with the result returned (the HTTP status code, since ht://Check crawls the Web over this protocol);
  • information about the structure of each document, essentially its HTML link tags and the relationships they create across the whole crawl: ht://Check crawls a Web domain or set (in the algebraic sense), and links establish relationships between the documents in it. This feature lets the user get further information about the domain:
      • link results: whether each link is working, broken or redirected; in the current version ht://Check also detects whether a link is actually an anchor that does not work, a JavaScript link or an e-mail link;
      • the relationships between documents, in terms of incoming and outgoing links; future development will pay particular attention to Web structure mining.

The htcheck program itself gives only a brief report; at present most of the information is provided by the PHP interface that comes with the package, which queries the database built by htcheck during a previous crawl. To use it you need a Web server and PHP with the MySQL connectivity module.
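The link categories mentioned above (working, broken, redirected, non-working anchor, JavaScript, e-mail) can be illustrated with a small Python sketch. This is not ht://Check's own code, which is written in C++; the function name `classify_link`, its parameters and the category labels are assumptions made up for this example.

```python
from urllib.parse import urlsplit


def classify_link(href, status=None, anchor_found=True):
    """Return a coarse category for one link.

    href         -- the link target as written in the HTML document
    status       -- HTTP status code obtained for the target, if it was fetched
    anchor_found -- whether the '#fragment' exists in the target document
    All labels returned here are hypothetical, not ht://Check's own.
    """
    parts = urlsplit(href)
    scheme = parts.scheme.lower()

    # Links that are never retrieved over HTTP.
    if scheme == "mailto":
        return "email"
    if scheme == "javascript":
        return "javascript"

    if status is None:
        return "unchecked"
    if 300 <= status < 400:
        return "redirected"
    if status >= 400:
        return "broken"
    if 200 <= status < 300:
        # The document exists, but the anchor inside it may not.
        if parts.fragment and not anchor_found:
            return "anchor-not-found"
        return "working"
    return "unknown"  # 1xx or otherwise unexpected codes


if __name__ == "__main__":
    samples = [
        ("http://example.org/index.html", 200, True),
        ("http://example.org/old.html", 301, True),
        ("http://example.org/gone.html", 404, True),
        ("page.html#top", 200, False),
        ("mailto:webmaster@example.org", None, True),
    ]
    for href, status, found in samples:
        print(href, "->", classify_link(href, status, found))
```

The HTTP status code alone decides between working, redirected and broken; the anchor check only applies once the target document itself has been retrieved successfully.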

Since ht://Check leaves a database on a MySQL server after each crawl, any user can in principle build their own information retrieval interface to it; you only need to know its structure: its tables, their fields and the relationships among them. Possible approaches include independent scripts written in common scripting languages with MySQL connectivity modules (e.g. Perl and Python), faster programs written in C or C++ using the MySQL API or wrapper libraries (such as MySQL++ or dbconnect), and other Web-driven solutions such as JSP or ColdFusion. An interface to ht://Check for the Roxen Web server, written by Michael Stenitzer (stenitzer@eva.ac.at), also exists.
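As a rough illustration of such an independent script, the Python sketch below runs two typical queries: broken links and incoming link counts. Everything in it is an assumption: the tables `doc` and `link` and their columns are invented for the example and are not the real ht://Check schema, and the standard `sqlite3` module stands in for a MySQL connection so that the example runs on its own. Against a real crawl database you would connect with a MySQL module and adapt the SQL to the actual tables.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE doc  (id INTEGER PRIMARY KEY, url TEXT, status INTEGER);
        CREATE TABLE link (src INTEGER, dst INTEGER);
    """)
    conn.executemany("INSERT INTO doc VALUES (?, ?, ?)", [
        (1, "http://example.org/", 200),
        (2, "http://example.org/about.html", 200),
        (3, "http://example.org/missing.html", 404),
    ])
    conn.executemany("INSERT INTO link VALUES (?, ?)",
                     [(1, 2), (1, 3), (2, 1), (2, 3)])
    return conn


def broken_links(conn):
    """Return (source URL, target URL, status) for links whose target failed."""
    return conn.execute("""
        SELECT s.url, d.url, d.status
        FROM link
        JOIN doc AS s ON s.id = link.src
        JOIN doc AS d ON d.id = link.dst
        WHERE d.status >= 400
        ORDER BY s.url
    """).fetchall()


def incoming_counts(conn):
    """Return a dict mapping each document URL to its number of incoming links."""
    rows = conn.execute("""
        SELECT d.url, COUNT(link.src)
        FROM doc AS d
        LEFT JOIN link ON link.dst = d.id
        GROUP BY d.id
    """).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    for src, dst, status in broken_links(conn):
        print(f"{src} links to {dst} (HTTP {status})")
    print(incoming_counts(conn))
```

Keeping documents and links in separate tables is what makes the incoming and outgoing link relationships a matter of a simple join.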




    ht://Check - More than a link checker - http://htcheck.sourceforge.net/
    Maintainer: Gabriele Bartolini - angusgb@users.sourceforge.net
