Paper Scraper is an asynchronous Python tool that downloads images from Reddit, either from your saved posts or from any subreddits you specify.
It recognizes direct image links, Reddit image/gallery posts (i.redd.it, preview.redd.it), Imgur images/albums/galleries, and Flickr photos. Posts that don't resolve to a downloadable image are skipped.
- Python 3.13+
- uv
- A Reddit account with a registered script app
- An Imgur account with a registered app (used to resolve Imgur links)
- (optional) a Flickr API key, only needed to resolve Flickr links

- Clone this repository:

      git clone https://github.com/samlowe106/PaperScraper.git
      cd PaperScraper

- Ensure uv is installed, then create the environment:

      uv sync

- Create a Reddit app at your app preferences and choose script as the app type.
- Create an Imgur app at your application settings.
- Create a file named `.env` in the project root with your credentials:

      REDDIT_CLIENT_ID="..."
      REDDIT_CLIENT_SECRET="..."
      IMGUR_CLIENT_ID="..."
      IMGUR_CLIENT_SECRET="..."
      FLICKR_CLIENT_ID="..."  # optional, only needed to resolve Flickr links

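Before the first run it can help to confirm that the `.env` file contains every required key. The following is a minimal standalone sketch, not part of Paper Scraper itself: `parse_env` and `missing_keys` are hypothetical helper names, and the parser only handles simple `KEY="value"` lines like the ones shown above.

```python
from pathlib import Path

# Keys this README lists as required; FLICKR_CLIENT_ID is optional.
REQUIRED_KEYS = (
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "IMGUR_CLIENT_ID",
    "IMGUR_CLIENT_SECRET",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment, then any surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip("\"'")
        values[key.strip()] = value
    return values


def missing_keys(text: str) -> list[str]:
    """Return the required keys that are absent or empty."""
    env = parse_env(text)
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("missing keys:", missing_keys(text) or "none")
```

Run from the project root, this prints the names of any required credentials that still need to be filled in.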
After `uv sync`, run it via the `paperscraper` command (equivalently `uv run python -m src.main`):

    uv run paperscraper [options]

Some common options (run with `--help` for the full list):

| Flag | Description |
|---|---|
| `-r`, `--subreddit` | Include posts from a subreddit (repeatable) |
| `--sortby` | How to sort subreddit posts: `hot`, `new`, `controversial`, `gilded`, or `top_all` / `top_day` / `top_week` / `top_month` / `top_year` / `top_hour` |
| `--limit` | Max submissions to pull from each source (default: 10); an album submission may still yield several images |
| `-k`, `--karma` | Only download posts with at least this score |
| `--hours` / `--days` / `--years` | Only download posts at most this old (mutually exclusive) |
| `-d`, `--dir` | Output directory (default: `Output`) |
| `--organize` | Sort downloaded images into per-subreddit subfolders |
| `--nolog` | Disable the per-run JSON log (written into the output dir by default) |
| `-u`, `--saved` | Include your saved posts; prompts for Reddit login (see note) |
| `--unsave` | Un-save saved posts after a successful download (opt-in; requires login) |

    # Download images from r/wallpapers, sorted by top of all time, into ./pics
    uv run paperscraper -r wallpapers --sortby top_all -d pics

    # Top posts from the last week with >= 100 score, at most 5 per subreddit
    uv run paperscraper -r wallpapers -r art --sortby top_week --days 7 -k 100 --limit 5

Files are written to a timestamped directory (e.g. `Output/PaperScraper 2026-06-20 08:30/`).

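
The timestamped directory name can be reproduced with `datetime.strftime`. This is a sketch only: the format string is inferred from the single example path above, and `run_directory` is a hypothetical helper name rather than a function from the codebase.

```python
from datetime import datetime
from pathlib import Path


def run_directory(base: Path, now: datetime) -> Path:
    """Return base / 'PaperScraper YYYY-MM-DD HH:MM' (format inferred from the example)."""
    # Note: ':' is not a valid filename character on Windows.
    return base / now.strftime("PaperScraper %Y-%m-%d %H:%M")


if __name__ == "__main__":
    print(run_directory(Path("Output"), datetime.now()))
```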
Note: the saved-posts flow (`--saved` / `--unsave`) is implemented but has only been exercised against mocked Reddit responses; it needs a real login to verify end-to-end. `--saved` also uses `getpass`, so it needs a real terminal (not an IDE console).
The test suite uses pytest (with pytest-asyncio for the async code and vcrpy cassettes for recorded HTTP interactions).

    uv run pytest                             # run the suite
    uv run pytest --cov=src --block-network   # with coverage, no live network

As of the latest run, 139 tests pass with ~97% line coverage. CI runs the suite on the pinned Python version and fails the build if coverage drops below 80%.

This repo also ships a pre-commit config (ruff, black, mypy, and assorted file checks):

    uv run pre-commit install           # hooks will now run on every commit
    uv run pre-commit run --all-files

After argument parsing, `main()` runs a fully asynchronous pipeline:

- Stream building. `StreamBuilder` (see `src/reddit/submission_source.py`) signs into Reddit via asyncpraw and turns the requested subreddits into async listing generators (capped per source by `--limit`). These are interleaved with `merge()` and adapted with `amap()` / `afilter()` (see `src/core/functional.py`), yielding each submission as a `SubmissionWrapper`. A predicate built from `--karma` and the age flags (`--hours` / `--days` / `--years`) filters out submissions that don't qualify.
- URL finding. Each `SubmissionWrapper.find_urls()` runs every parser (`single_image`, `reddit`, `imgur`, `flickr` in `src/parsing/`) concurrently in a strategy pattern and collects the direct media links it can resolve.
- Downloading & saving. Resolved URLs are fetched with `httpx` and written to disk with `aiofiles` via `UniqueDirectoryFileManager`, which guarantees unique filenames and (with `--organize`) per-subreddit folders.

Concurrency is bounded by two asyncio.Semaphores — one for URL finding and a larger one for downloads — and the whole pipeline runs inside an asyncio.TaskGroup so submissions are processed as they stream in rather than in fixed batches. Each submission is handled independently: a failure is logged and skipped rather than aborting the run, and individual downloads retry transient errors with backoff. Unless --nolog is passed, a JSON record of each processed post is appended to a log in the output directory.
Paper Scraper is licensed under the MIT license.