PDF Books Hoover

This program automates downloading books from the web that can only be retrieved one page at a time. It works only when the sole thing that changes in the URL from one page to the next is the page number. The program first downloads all the pages and then merges them into a single PDF file.
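Concretely, the approach boils down to a loop like the one below. This is only a rough sketch: the variable names, file names and the PostScript-based merge are assumptions about how the script works, not its actual code, and the real script may send the data with wget's --post-data option instead of appending it to the address.

    # Fetch every page as a single-page PDF, convert each one to PostScript,
    # append it to a running file, then turn the result back into one PDF.
    rm -f book.ps
    i="$STARTPAGE"
    while [ "$i" -le "$ENDPAGE" ]; do
        wget -O "page_$i.pdf" "${URL}${POST_BEGIN}${i}${POST_END}"
        pdf2ps "page_$i.pdf" "page_$i.ps"
        cat "page_$i.ps" >> book.ps
        i=$((i + 1))
    done
    ps2pdf book.ps "$OUTPUTFILE"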

The program runs only on Linux or under Cygwin, and wget, pdf2ps and ps2pdf must already be installed.

Usage: ./pdfbookshoover startpage endpage websiteaddress postdata(beginning) postdata(end) outputfile

Example: suppose the following address gives you the first page of a book:
http://www.tototutu.com/cgi-bin/cul.math/docviewer?did=008534102&seq=1&view=pdf

To get the first 20 pages, you type:
./pdfbookshoover 1 20 http://www.tototutu.com/cgi-bin/cul.math/docviewer? 'did=008534102&seq=' '&view=pdf' result.pdf
The number after seq= is the page number, which is why the data to post is split at that point.
!!! Do not forget to enclose the two POST data arguments in single quotes !!!
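For reference, the requests issued for the first and last page of this example are equivalent to something like the following (again an assumption about the internals; the script might instead send the data with wget's --post-data option rather than appending it to the address):

    # Page 1  (prefix + page number + suffix appended to the address)
    wget -O page_1.pdf 'http://www.tototutu.com/cgi-bin/cul.math/docviewer?did=008534102&seq=1&view=pdf'
    # Page 20
    wget -O page_20.pdf 'http://www.tototutu.com/cgi-bin/cul.math/docviewer?did=008534102&seq=20&view=pdf'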


At the moment, it does not work behind proxies.
I don't take any responsibility for the use of this program. Run it at your own risk!
This program was developed by Géry Casiez (gery (dot) casiez (at) wanadoo (dot) fr).

[Download program] [Source code]
