Packages and Binaries:
waybackpy
waybackpy is a Python package and a CLI tool that interfaces with the Wayback Machine’s APIs.
The Internet Archive’s Wayback Machine has three useful public APIs:
- SavePageNow API (also known as the Save API)
- CDX Server API
- Availability API
All three APIs can be accessed through waybackpy, either by importing it in a Python file/module or from the command-line interface.
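As a rough illustration of the import route, the sketch below asks the CDX Server API for the oldest and newest captures of a URL. The class name `WaybackMachineCDXServerAPI`, its `oldest()` and `newest()` methods, and the `archive_url` attribute are taken from the upstream waybackpy 3.x README rather than from this page, so check them against the installed version. The function name, the user agent string, and the `api_factory` parameter are hypothetical and exist only for this example.

```python
from typing import Callable, Optional, Tuple

# Hypothetical user agent string, used only for this example.
DEFAULT_USER_AGENT = "example-archive-checker/0.1"


def oldest_and_newest(
    url: str,
    user_agent: str = DEFAULT_USER_AGENT,
    api_factory: Optional[Callable] = None,
) -> Tuple[str, str]:
    """Return (oldest_archive_url, newest_archive_url) for `url`.

    `api_factory` is a hypothetical hook: any callable taking
    (url, user_agent) and returning an object with oldest() and newest().
    When it is omitted, waybackpy's CDX class is used.
    """
    if api_factory is None:
        # Imported lazily so this module loads even where waybackpy is absent.
        # Assumption: class name as shown in the upstream 3.x README.
        from waybackpy import WaybackMachineCDXServerAPI

        api_factory = WaybackMachineCDXServerAPI

    cdx_api = api_factory(url, user_agent)
    # Each call returns a snapshot object; only its archive URL is kept here.
    return cdx_api.oldest().archive_url, cdx_api.newest().archive_url
```

Calling `oldest_and_newest("https://example.com")` with waybackpy installed performs live requests against the Wayback Machine, so the result depends on network access and on what has been archived.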
Installed size: 97 KB
How to install: sudo apt install waybackpy
Dependencies:
- python3
- python3-click
- python3-requests
- python3-urllib3
waybackpy
root@kali:~# waybackpy --help
Usage: waybackpy [OPTIONS]
                     _                _
                    | |              | |
__      ____ _ _   _| |__   __ _  ___| | ___ __  _   _
\ \ /\ / / _` | | | | '_ \ / _` |/ __| |/ / '_ \| | | |
 \ V  V / (_| | |_| | |_) | (_| | (__|   <| |_) | |_| |
  \_/\_/ \__,_|\__, |_.__/ \__,_|\___|_|\_\ .__/ \__, |
                __/ |                     | |     __/ |
               |___/                      |_|    |___/
Python package & CLI tool that interfaces the Wayback Machine APIs
Repository: https://github.com/akamhy/waybackpy
Documentation: https://github.com/akamhy/waybackpy/wiki/CLI-docs
waybackpy - CLI usage(Demo video): https://asciinema.org/a/469890
Released under the MIT License. Use the flag --license for license.
Options:
-u, --url TEXT                  URL on which Wayback machine operations are
                                to be performed.
-ua, --user-agent, --user_agent TEXT
                                User agent, default value is 'waybackpy
                                3.0.6 -
                                https://github.com/akamhy/waybackpy'.
-v, --version                   waybackpy version.
-l, --show-license, --show_license, --license
                                Show license of Waybackpy.
-n, -au, --newest, --archive_url, --archive-url
                                Retrieve the newest archive of URL.
-o, --oldest                    Retrieve the oldest archive of URL.
-N, --near                      Archive close to a specified time.
-Y, --year INTEGER RANGE        Year in integer.  [1994<=x<=9999]
-M, --month INTEGER RANGE       Month in integer.  [1<=x<=12]
-D, --day INTEGER RANGE         Day in integer.  [1<=x<=31]
-H, --hour INTEGER RANGE        Hour in integer.  [0<=x<=24]
-MIN, --minute INTEGER RANGE    Minute in integer.  [0<=x<=60]
-s, --save                      Save the specified URL's webpage and print
                                the archive URL.
-h, --headers                   Headers data of the SavePageNow API.
-ku, --known-urls, --known_urls
                                List known URLs. Uses CDX API.
-sub, --subdomain               Use with '--known_urls' to include known
                                URLs for subdomains.
-f, --file                      Use with '--known_urls' to save the URLs in
                                file at current directory.
--cdx                           Flag for using CDX API.
-st, --start-timestamp, --start_timestamp, --from TEXT
                                Start timestamp for CDX API in
                                yyyyMMddhhmmss format.
-et, --end-timestamp, --end_timestamp, --to TEXT
                                End timestamp for CDX API in yyyyMMddhhmmss
                                format.
-C, --closest TEXT              Archive that are closest the timestamp
                                passed as arguments to this parameter.
-f, --cdx-filter, --cdx_filter, --filter TEXT
                                Filter on a specific field or all the CDX
                                fields.
-mt, --match-type, --match_type TEXT
                                The default behavior is to return matches
                                for an exact URL. However, the CDX server
                                can also return results matching a certain
                                prefix, a certain host, or all sub-hosts by
                                using the match_type
-st, --sort TEXT                Choose one from default, closest or reverse.
                                It returns sorted CDX entries in the
                                response.
-up, --use-pagination, --use_pagination
                                Use the pagination API of the CDX server
                                instead of the default one.
-gz, --gzip TEXT                To disable gzip compression pass false as
                                argument to this parameter. The default
                                behavior is gzip compression enabled.
-c, --collapse TEXT             Filtering or 'collapse' results based on a
                                field, or a substring of a field.
-l, --limit TEXT                Number of maximum record that CDX API is
                                asked to return per API call, default value
                                is 25000 records.
-cp, --cdx-print, --cdx_print TEXT
                                Print only certain fields of the CDX API
                                response, if this parameter is not used then
                                the plain text response of the CDX API will
                                be printed.
--help                          Show this message and exit.
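The options above can be combined in scripts. The following is a minimal sketch, not part of waybackpy, that assembles argument lists for a `--near` lookup and for a CDX query bounded by `--from` and `--to`. It uses only long option names from the help text, because several short flags (`-l`, `-st`, `-f`) are listed twice there. The helper names `near_command`, `cdx_timestamp`, and `cdx_command` are hypothetical, and the range checks simply mirror the bounds printed in the help output.

```python
from datetime import datetime
from typing import List, Optional

# Inclusive ranges copied from the --help output above.
NEAR_RANGES = {
    "year": (1994, 9999),
    "month": (1, 12),
    "day": (1, 31),
    "hour": (0, 24),
    "minute": (0, 60),
}


def near_command(url: str, **when: Optional[int]) -> List[str]:
    """Build argv for `waybackpy --url URL --near` plus optional date parts."""
    unknown = set(when) - set(NEAR_RANGES)
    if unknown:
        raise TypeError(f"unsupported date parts: {sorted(unknown)}")

    argv = ["waybackpy", "--url", url, "--near"]
    # Iterating NEAR_RANGES keeps the flags in year-to-minute order.
    for name, (low, high) in NEAR_RANGES.items():
        value = when.get(name)
        if value is None:
            continue
        if not low <= value <= high:
            raise ValueError(f"{name} must be in [{low}, {high}], got {value}")
        argv += [f"--{name}", str(value)]
    return argv


def cdx_timestamp(moment: datetime) -> str:
    """Format a datetime as yyyyMMddhhmmss, the layout --from and --to expect."""
    return moment.strftime("%Y%m%d%H%M%S")


def cdx_command(url: str, start: datetime, end: datetime) -> List[str]:
    """Build argv for a CDX query limited to the window [start, end]."""
    return [
        "waybackpy", "--url", url, "--cdx",
        "--from", cdx_timestamp(start),
        "--to", cdx_timestamp(end),
    ]
```

The returned lists can be handed to `subprocess.run` on a system where waybackpy is installed; running them contacts the Wayback Machine, so results depend on network access.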
Updated on: 2024-May-23