Pip 25.0 has been out for a bit over a month now; and we now also have an official blog post about the release, as well as a 25.0.1 patch for a regression.
Pip 25.0 has what I consider a very serious security vulnerability. In the Python ecosystem, it's normal and expected that third-party packages provide their own, arbitrary "setup" code for installation (for example, to run C compilers in project-specific ways, when the code uses a C extension). But Pip will run such code in many more situations than you might naively expect. I think it's obvious that running arbitrary code when you aren't expecting it and prepared for it is a much bigger problem. The user should have a chance to decide whether to trust the code, first.
I believe that warnings are more important than baiting people to read the post, so here's the PSA up front:
- Never use Pip to download, test, "dry-run" etc. an untrusted source distribution (sdist). It will try to build the package, potentially running arbitrary code (as building an sdist always entails). Instead, use the PyPI website directly, or the API it provides.

- Never use `sudo` to run Pip (nor run it with administrative privileges on Windows). Aside from the potential problems caused by conflicting with the system package manager, Pip will not drop privileges when it runs as root and attempts to build an sdist - which, again, potentially runs arbitrary code.

- If you expect wheels to be available for the packages you want to install with Pip, strongly consider adding `--only-binary=:all:` to the Pip command to ensure that only wheels are used. If you really need to use sdists, it's wise to inspect them first, which by definition isn't possible with a fully automated installation.

- If you release Python packages, please try to provide wheels for them, even if - no, especially if your package includes only Python code and doesn't require explicitly "compiling" anything. An sdist is much slower to install than a wheel even in these cases, and making a wheel available allows your users to demand wheels from Pip - raising the overall baseline for trust and safety in the Python ecosystem.
Okay, I did clickbait a bit. This security issue isn't some new discovery. In fact, it has plagued Pip for its entire history.
Please enjoy my detailed analysis below.
Description and Demonstration
Let's say you want to install a package from PyPI that you aren't sure about. (For the purpose of this hypothetical, we'll use the `issue7325` package that was created specifically for one of the many bug reports made about the general problem.)
Sure, the PyPI team strives to keep malware off the system, and there are plenty of eyes on big-name projects all the time; but nothing is guaranteed in this life.
You might suppose that you could just inspect the Python code before you ever try running (or `import`ing) it, but that's only safe for a pure-Python wheel. If the project depends on a compiled C library, for example, then you won't be able to inspect it in a wheel (even if the wheel includes `.c` source code files, you can't verify that the compiled code actually corresponds to them). And if you install from an sdist, of course, Pip will try to build the package for you automatically, and you've heard (correctly) that this can run arbitrary code.

Since you don't want to allow unaudited, arbitrary code execution (to "get pwnt", as the kids probably still say), you hatch the plan of downloading the sdist first, so that you can manually unpack it (it's just an ordinary `.tar.gz` file, after all), inspect it, and only then try the installation (knowing that you can specify the `.tar.gz` filename instead of a PyPI package name when you `pip install`).
You've just learned that Pip has a `download` command, so you try:
```
$ pip download issue7325
```
... and promptly get pwnt anyway. The next thing you know, you're getting a message (because this is white-hat hacking; no actual harm was done - this time...) every time you start Python in that environment:
```
WARNING: use of "pip download --no-deps" allowed arbitrary code execution see https://github.com/pypa/pip/issues/7325
```
There's also a `--dry-run` option for `pip install` which has the same problem. The only thing "dry" about a Pip dry-run install is the actual copying of files into the Python environment. It will still attempt to build a wheel by the normal process.
And therein lies the bug: Pip built the package, even though it was explicitly asked only to download that package.
(Apologies to the impatient, but there's much more I need to say before I can disclose why Pip does this.)
Of course, you can avoid this risk by demanding wheels:
```
$ pip download --only-binary=:all: issue7325
```
... which would fail in this case, because a wheel was deliberately not provided for demonstration purposes. Again: if you release Python code, please provide wheels if you can. If your code is pure Python, you have no excuse. If you use PyPA's standard `build` front-end (which I highly recommend), it will already make the wheel by default - all you have to do is include it when you upload to PyPI. If you don't have a dedicated build front-end, well, first off you shouldn't be running `setup.py` directly (and if you don't use Setuptools for building then you almost certainly do have a build front-end), but `pip wheel` works in a pinch.
And, of course, there are other reasons why you might want to download an sdist. Maybe you know that there's some C code that needs a special patch for your system, or you want to edit some compiler options because you think you can optimize something. Or maybe you really need that package and the wheel just isn't available for your system. Or maybe the latest version isn't available as a wheel for your system yet, and you need that.
Whatever your situation, the safe way to get an sdist is from the PyPI website. Look up the package you want, click on "Download files", and select the file you want. You can get wheels this way, too. If you really need to use the command line, a JSON API is available:
```
$ curl -s https://pypi.org/pypi/issue7325/0.1/json | jq '.urls[0].url'
"https://files.pythonhosted.org/packages/c0/51/bd28cda650e3f0123ea82936f96b3fd28da90ec8b2af89a9029e25768647/issue7325-0.1.tar.gz"
```
(Here, `0.1` is the version; the `urls` array in the JSON data includes any sdists or wheels for that release in arbitrary order. Automation is left as an exercise.)
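That exercise might start something like this - a sketch of my own, not an official client, relying only on the documented `urls` array and its `packagetype` field:

```python
import json
import urllib.request

def sdist_urls(release_data):
    """Pick out the sdist download URLs from parsed JSON API release data."""
    return [f["url"] for f in release_data["urls"] if f["packagetype"] == "sdist"]

def fetch_release(name, version):
    """Fetch release data from PyPI's JSON API (requires network access)."""
    with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/{version}/json") as resp:
        return json.load(resp)

# Canned data in the documented shape, so the selection logic is visible
# without a network call:
sample = {"urls": [
    {"packagetype": "bdist_wheel",
     "url": "https://example.invalid/demo-0.1-py3-none-any.whl"},
    {"packagetype": "sdist",
     "url": "https://example.invalid/demo-0.1.tar.gz"},
]}
print(sdist_urls(sample))  # ['https://example.invalid/demo-0.1.tar.gz']
```

Nothing here downloads, unpacks, or builds anything; you get a URL that you can fetch and inspect on your own terms.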
It Can Happen to You
I first noticed this issue as a result of a friend showing me a blog post from 2022 titled Someone's Been Messing With My Subnormals!. Ostensibly, it's not about Pip at all. It's rather about what can happen to floating-point math in your Python program when C extensions are compiled a certain way and then included in the project, however indirectly. Specifically: there's a compiler option `-ffast-math`, supported by both GCC and Clang, which indirectly causes the compiled code to mess with global process state, changing the behaviour of "subnormal" floating-point values. I think I have a reasonably good understanding of the technical details involved - but they don't normally concern me, so I don't spend a lot of time thinking about them.

So why am I bringing this up, you ask? Well, it turns out that this story has a buried lede. Because the floating-point math problem involves global process state, you can trigger it simply by having specific dependencies - even transitively - in your project. In order to verify how widespread the problem is, and figure out which packages most commonly cause downstream problems, the author determined that it would be necessary to examine every package on PyPI. And because of how poorly the ecosystem handles package metadata (more on that another time), the natural way to do a properly thorough job of that would be to download every package, and then try to scrape metadata out of them (which might be represented a few different ways - depending on whether a wheel is available, and whether the package includes `pyproject.toml` and/or `PKG-INFO` per modern standards).
And, well, that's where all hell broke loose:
> I actually started down this path and set about running `pip install --dry-run --ignore-installed --report` on all 397,267 packages. This turned out to be a terrible idea. Unbeknownst to me, even with `--dry-run` pip will execute arbitrary code found in the package's `setup.py`. In fact, merely asking pip to download a package can execute arbitrary code (see pip issues 7325 and 1884 for more details)! So when I tried to dry-run install almost 400K Python packages, hilarity ensued. I spent a long time cleaning up the mess, and discovered some pretty poor `setup.py` practices along the way. But hey, at least I got two free pictures of anime catgirls, deposited directly into my home directory. Convenient!
>
> Once I had managed to clean up the mess (or hopefully, anyway—I never did find out what package tried to execute `sudo`), I decided I needed a different approach.
(Editor's note: Yes, that last link does include the catgirl pictures in question.)
So, to reiterate from the introduction: the arbitrary setup code included with an sdist can be run even for innocuous-sounding "download" commands.
And, again: I don't fault Python for relying on arbitrary code at install time in general. The requirements to set up a Python project are pretty well arbitrarily complex, and nobody has really put forward a system that reliably handles even the common cases in any secure manner - at least, aside from pure Python projects where there's nothing to build. The same problem is also seen in other packaging systems for other languages, like NPM. (Here's just one of many articles on that topic I found with a quick search.) And, of course, if you're going to use an installed library, it can run arbitrary code at `import` time, or when you call any of its functions. That's just how it is with third-party code: ultimately, trust has to come from somewhere.
But the entire point of having a command like `pip download` is so that Pip's resolver can figure out which package is appropriate for your system and then just download it for you, which you'd typically do specifically so that you can inspect it before doing anything with it. (After all, you can't just rely on reading the code on GitHub etc. in general - there's no guarantee that code actually matches what you downloaded. There's a new system to make that possible, but publishers have to opt in to it.) Or maybe you want to store it somewhere, perhaps as part of setting up your own index. But regardless, you aren't trying to install it yet.
The above quote uses the only red text in the entire article, and is also, as far as I know, the main reason it got as much attention as it did. True, not all of those packages were actually downloaded; and of course a lot of them would have been available as wheels. So no, our author did not exactly run 397,267 pieces of untrusted code unintentionally.
But still, I can't pass on the opportunity to make the reference.
There's something else I need to point out here. The author of that post, Brendan Dolan-Gavitt (@moyix) is not just some random C expert who read the Pip documentation (but not thoroughly enough). No, Brendan Dolan-Gavitt is a security researcher with an impressive publication history going back to at least 2006.
Yeah.
Again: do not use Pip to download sdists for examination. Instead, go to the actual PyPI website, find the page for the package you want, optionally choose a version from the "Release history" (manually determining what version you want), and choose the "Download files" option; or use the JSON API.
I don't know of any official, ready-made, secure automation for using the JSON API for this task. If you decide to implement a solution, please share and promote it.
Using the website interface is also, arguably, the best way to protect yourself against typo-squatters and other malware packages - on top of the PyPI maintenance team's own attempts to remove those projects.
Let's Make Things Silly
While I'm thankful to Wim Jeantine-Glenn for creating an example (for Pip issue 7325) that demonstrates the problem in a reasonably realistic (but minimal) way, in my opinion it really doesn't show off how absurd this all is.
With that in mind, I prepared the following Bash script you can use to reproduce the problem on Linux - quickly (less than a second on my 10-year-old machine), directly and without an Internet connection. All you need is for `pip` to refer to a working copy of Pip. It's also written to highlight many things that might otherwise not be obvious about the nature of the problem.
```bash
#!/bin/bash
# Copyright (c) 2025 Karl Knechtel.
# Permission is granted to reproduce this code locally for testing purposes,
# but please don't republish or redistribute it - instead, please direct
# interested readers to this blog post at
# https://zahlman.github.io/posts/2025/02/28/python-packaging-3/ .
mkdir demo-0.1.0 # [1]
cat << done_toml > demo-0.1.0/pyproject.toml # [2]
[project]
name = "demo"
version = "0.1.0"
dependencies = []

[build-system]
requires = [ ]
build-backend = "build"
backend-path = "."
done_toml
cat << done_info > demo-0.1.0/PKG-INFO # [3]
Metadata-Version: 2.4
Name: demo
Version: 0.1.0
done_info
cat << done_setup > demo-0.1.0/build.py # [4]
__import__('sys').exit("Arbitrary code could have been executed here.")
done_setup
tar czf demo-0.1.0.tar.gz demo-0.1.0/ # [5]
pip download --no-deps --no-build-isolation ./demo-0.1.0.tar.gz # [6]
rm -r demo-0.1.0/ demo-0.1.0.tar.gz
```
Footnotes from the code:
1. The general approach is to create a valid sdist - fully compliant with all up-to-date standards - locally, and then ask Pip to "download" the file. Yes, this is a perfectly valid (if pointless) use of `pip download`, as the output will make clear. It's actually pretty easy to create such an sdist - it's just a zipped (or should I say Zzzzzzzzzzzzzzzipped?) tar archive, containing "source" metadata in the form of `pyproject.toml` and "built" metadata in the form of `PKG-INFO`. Note in particular that the folder name includes the name and version for the project - that's part of the expected structure for the sdist.

2. Here we create a `pyproject.toml` file following the appropriate standards. We have a `[project]` table (originally defined by PEP 621) defining name, version and dependency information. Only the name and version are mandatory - and the version could be marked as "dynamic" instead - but that doesn't make sense for our use case. The dependencies would default to an empty list, but it's more amusing IMO to be explicit about this. We also have a `[build-system]` table (originally defined by PEP 518) explaining what tools to use to create a wheel from the sdist. (Normally, the same tool would be used to create the sdist from a source tree - but in this example, the "build system" we specify is a fake.)

3. Here we create the corresponding "core metadata" `PKG-INFO` file. Normally this would end up copied verbatim into any corresponding wheel (named `METADATA` inside wheels), unless the project uses dynamic metadata. In order to conform to modern standards, we need to implement at least version 2.2 of the metadata spec - but it turns out that we can trivially implement version 2.4, the most recent. Updates to the spec generally allow us to add information, but don't require it - for example, version 2.4 allows for using license files in an up-to-date way - but our example project, being ephemeral, doesn't have its own license. As with `pyproject.toml`, only the name and version (and the metadata version) need to be specified.

4. Here we define a fake "build system" for the sdist, which just immediately errors out. The name `build.py` corresponds to what was defined in `pyproject.toml`. The main thing I want to highlight here is that Setuptools has nothing to do with the problem. For projects that use Setuptools (the default if you don't include a `[build-system]` table), Pip would tell Setuptools to build the project, and Setuptools would (among other things) potentially run a top-level `setup.py` script in order to do so. But this is purely an implementation detail. Setuptools is, in these cases, only doing what it's told. It's entirely Pip's fault that Pip tells Setuptools to do this.

5. Finally we can create the sdist. Notice in particular that the only actual Python code included is the "build system". As described in part 1, it's perfectly valid for the sdist - as well as any resulting wheel - to define no installable code packages at all. The name `demo` applies to the distribution - not to anything that will be `import`ed by users. Anyway, we name the file with the distribution name and version, according to rules defined in PEP 625 - there's no wiggle room here.

6. Now that we have an sdist, we can tell Pip to "download" it, and then we'll clean up by deleting the sdist archive and the corresponding folder. Pip won't actually modify any `site-packages` contents, but it will try to build the sdist into a wheel. Note in particular the `--no-deps` and `--no-build-isolation` flags here, for later.
When you try this, you should get a result like:
```
Processing ./demo-0.1.0.tar.gz
  File was already downloaded /<absolute path omitted>/demo-0.1.0.tar.gz
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      Arbitrary code could have been executed here.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
```
The `Arbitrary code could have been executed here.` message, of course, comes from `build.py` - it's not a warning from Pip.
Perhaps the funniest part here is that there are two disclaimers of responsibility from Pip. These are standard messages, and normally make sense - when Pip tells the build system to build a wheel, it can't do anything about bugs in the build system itself, nor about errors in the project's build configuration. But, of course, it is a problem with Pip in this case that a build is attempted at all.
It does this even though the file is already right there in the current directory, and Pip knows that it's right there (`File was already downloaded`) and simply uses the existing file directly.
It does this even though we explicitly told it that we only want to "download" the code, not to build nor install it.
It does this even though we explicitly told it that we don't want to obtain any project dependencies (`--no-deps`). (The `--no-build-isolation` flag is more just for entertainment. It's perfectly valid to include this flag for `pip download`, and it does something meaningful - although not relevant to the demo. Normally, when Pip starts a build process, it would create a temporary venv for it, and install the build system's dependencies there - along with the build system itself, if not included. Since we include our fake "build system" and it has no dependencies of its own, we save some time by asking Pip to just build in the current environment. This build-system isolation, by the way, often results in needing an Internet connection to build projects even though you already have everything necessary installed - a topic for another day.)
It does this with every version of Pip that's compatible with currently supported versions of Python, and would do it with much older versions as well - going back for almost the entire history of Pip, adjusting for UI tweaks and changes to standards made along the way.
It does this even though we follow every modern packaging standard to the letter. Including some updates that were specifically intended to facilitate Pip in avoiding unnecessary builds of this sort.
The Big Reveal
Dear reader, can you guess why, exactly, Pip is starting this build process, and running arbitrary code without oversight? I've made some vague allusions to it already, but you might still never guess.
It's so that Pip can make sure that the name and version metadata that you'd get from building the project, match what you requested.
Yes, really.
The name and version already present in the filename per PEP 625.
The name and version already present, and matching, in the top-level folder name of the archive, per the sdist standard.
The name and version already present, and matching, in the `pyproject.toml` file, per PEP 621.
The name and version already present, and matching, in the `PKG-INFO` file, per the core metadata specification.
That name and version.
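For illustration, here's roughly what recovering that name and version from the filename alone looks like - a sketch of my own, though the third-party `packaging` library ships a `parse_sdist_filename` helper for exactly this:

```python
def parse_sdist_filename(filename):
    """Split a PEP 625-conforming sdist filename into (name, version).

    PEP 625 requires the name part to be normalized with underscores, so
    the first hyphen unambiguously separates name from version.
    """
    if not filename.endswith(".tar.gz"):
        raise ValueError(f"not a standard sdist filename: {filename!r}")
    stem = filename[:-len(".tar.gz")]
    name, sep, version = stem.partition("-")
    if not (name and sep and version):
        raise ValueError(f"no name-version separator in: {filename!r}")
    return name, version

print(parse_sdist_filename("demo-0.1.0.tar.gz"))  # ('demo', '0.1.0')
```

No archive is opened, no build system is invoked; the information Pip goes to such lengths to "verify" falls out of a string split.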
It's not about dependencies - we explicitly specify `--no-deps` in this test, and I've traced the code and verified that it passes information all the way along the chain to the effect of "we don't need to find out what the dependencies are when we build this project and get the metadata". (If we add a dependency specification, Pip won't try to download it, unless we remove the `--no-deps` flag.)
Now, it could be argued that, once upon a time, most of that information wasn't reliable. After all, it wasn't until fairly recently that we actually had PEP 625 fully implemented in Setuptools, even though it pretty much matches what Setuptools was already doing - well, aside from the handling of hyphens and underscores, I guess. (And Pip won't complain if you try to install a local file with a non-conforming filename, either. After all, there theoretically are a few old, pre-standard projects already up on PyPI that don't conform, even though Setuptools would have usually generated conforming names back then, and even though Setuptools used to be the only game in town.) And it's not like an installer really needs to care about the name of that top-level folder; it can just extract the archive.
And there's no guarantee that build systems actually implement PEP 621 (when creating the sdist), either. It's the official stance of the Pip development team, accordingly, that "tools should not read metadata from `pyproject.toml`" - which makes sense as long as the core metadata specification is still a thing. (Notably, Poetry didn't implement PEP 621 until last September.) Besides, Pip still supports legacy `setup.py`-based builds, so sdists aren't actually required to contain a `pyproject.toml` at all - even though this file is part of the official specification of the "source distribution format".
But a `PKG-INFO` file is - and, as far as I can tell, always has been - supposed to be canonical metadata. If the build system adds it to the sdist, anything that's actually in there is supposed to be authoritative, and not subject to change when creating a wheel (or an egg, to consider legacy processes). The specific purpose of the version 2.2 update to the core metadata format was to make sure that this file provides reliable name and version info. Those values must now be listed explicitly and cannot be marked as "dynamic" (i.e., to be calculated when creating a wheel) - they must be determined at sdist creation time (i.e., tools that compute a version number from source control history need to do that when making an sdist from the repository). But as it stands, Pip doesn't even check whether the file is present.
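To make that concrete, here's a sketch (my own, not Pip code) of the check an installer could do to decide whether a `PKG-INFO`'s name and version can be trusted without running a build:

```python
import email.parser

def pins_name_and_version(pkg_info_text):
    """Check whether PKG-INFO's metadata version promises static Name
    and Version fields (metadata 2.2+ forbids marking them Dynamic)."""
    headers = email.parser.Parser().parsestr(pkg_info_text)
    try:
        major, minor = map(int, headers["Metadata-Version"].split("."))
    except (AttributeError, ValueError):
        return False  # missing or malformed metadata version: no promises
    if (major, minor) < (2, 2):
        return False  # pre-2.2 metadata makes no such guarantee
    # Belt and braces: a conforming tool can't emit these, but check anyway.
    dynamic = {value.lower() for value in headers.get_all("Dynamic") or []}
    return "name" not in dynamic and "version" not in dynamic

print(pins_name_and_version(
    "Metadata-Version: 2.4\nName: demo\nVersion: 0.1.0\n"))  # True
```

That's the entire cost of honouring the version 2.2 guarantee: parse a few headers and compare a version number.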
And Pip is, in a meaningful sense, supposed to leverage PEP 625 - since it was written by a Pip developer specifically to avoid this headache, with the expectation that it won't cause a real problem. Quoting from the PEP:
> The filename contains the distribution name and version, to aid tools identifying a distribution without needing to download, unarchive the file, and perform costly metadata generation for introspection, if all the information they need is available in the filename.

...

> Currently, tools that consume sdists should, if they are to be fully correct, treat the name and version parsed from the filename as provisional, and verify them by downloading the file and generating the actual metadata (or reading it, if the sdist conforms to PEP 643). Tools supporting this specification can treat the name and version from the filename as definitive. In theory, this could risk mistakes if a legacy filename is assumed to conform to this PEP, but in practice the chance of this appears to be vanishingly small.
(Just to emphasize: PEP 625 standardizes a file naming convention. It took over two years to approve, and almost four years in total to see its final implementation in Setuptools - never mind any other build backends. And Pip still isn't able to take advantage of it, half a year after that.)
Besides, if you're "downloading" a local file, then surely you shouldn't need to check the name and version like this. If you're asking to get a file from PyPI (or another index), meanwhile, you're already requesting it by name and version - and it should be the index's responsibility to ensure that it gives you the right file. Even if you don't trust the index (why are you using it, then?), nothing in the Pip command we're using actually asks to verify that the download correctly represents the name and version requested.
In the End, it Doesn't Even Matter
Let's try this another way. Like I said, all of this name and version info is supposed to match up. But what if it doesn't? Of course, it's not very interesting if we intentionally break the build process like before, so to demonstrate, we'll need a setup that can actually create a valid wheel.
To do this, I'll specify Flit's build backend, `flit-core`, as the one to use. (It's available separately; you don't need the Flit tool suite installed to use it.)
Here's the modified script:
```bash
#!/bin/bash
# Copyright (c) 2025 Karl Knechtel.
# Permission is granted to reproduce this code locally for testing purposes,
# but please don't republish or redistribute it - instead, please direct
# interested readers to this blog post at
# https://zahlman.github.io/posts/2025/02/28/python-packaging-3/ .
mkdir demo_a-0.1.0 # [1]
cat << done_toml > demo_a-0.1.0/pyproject.toml # [2]
[project]
name = "demo-b"
version = "0.2.0"
dependencies = [ ]
description = ""

[build-system]
requires = [ "flit-core" ]
build-backend = "flit_core.buildapi"
done_toml
cat << done_info > demo_a-0.1.0/PKG-INFO # [3]
Metadata-Version: 2.4
Name: demo-c
Version: 0.3.0
done_info
touch demo_a-0.1.0/demo_b.py # [4]
tar czf demo_d-0.4.0.tar.gz demo_a-0.1.0/ # [5]
pip download ./demo_d-0.4.0.tar.gz # [6]
rm -r demo_a-0.1.0/ demo_d-0.4.0.tar.gz
```
Notes:
1. We'll still make the same kind of `pyproject.toml` as before, and we'll describe a name of `demo-a` and version `0.1.0` for the top-level folder. But our `pyproject.toml` dictates a name of `demo-b` and version `0.2.0`.

2. Aside from specifying the build system, the new `pyproject.toml` also provides an empty `project.description` (because Flit will insist on having one, even though the standards don't technically require it).

3. We make a `PKG-INFO` "built metadata" file as before, and here we'll specify a third conflicting name of `demo-c` and version `0.3.0`.

4. Although the standard (as far as I can tell) allows you to distribute wheels with no Python modules or packages in them, Flit insists on a top-level name (because it only does automatic discovery of whether or not you use src layout, and thus doesn't offer explicit configuration options for what packages are present). So we create an empty `demo_b.py` to avoid an error from Flit. (Note that the Python filename uses an underscore so that it would hypothetically work with `import`; but distribution names are allowed to contain hyphens - which are then normalized to underscores in filenames, so that it's clear where the name ends and the version begins.)

5. We set up the final conflicting name, `demo-d`, and version, `0.4.0`, in the filename for the sdist.

6. We "download" the sdist as before, and then clean up as before.
See how much you can guess about what will happen before proceeding.
On my system, the results look like:
```
Processing ./demo_d-0.4.0.tar.gz
  File was already downloaded /<absolute path omitted>/demo_d-0.4.0.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Successfully downloaded demo-b
```
Now, to some extent it's Flit's choice that we end up with `demo-b` as the reported "successfully downloaded" package. Normally, the sdist provided to `flit-core` would also have been created by `flit-core`, so the mismatch between the `pyproject.toml` and `PKG-INFO` couldn't happen. Flit's choice, apparently, is to ignore the `PKG-INFO` file completely and re-create `METADATA` based on the `pyproject.toml` contents.

Notably, though, the file names aren't used: `flit-core` can't see the name of the original tarball, nor the name of the top-level folder in that tarball - following the standard, the build system is given already-unpacked source, and runs within that directory. Although the information in these filenames is supposed to be reliable following PEP 625, not only does Pip not use it, but the build backend can't use it (because the build frontend doesn't provide it).
But the real reason I'm showing you this, of course, is because Pip doesn't report any error. Pip receives an sdist with all sorts of contradictory information, goes out of its way to invoke `flit-core` to build a wheel, gets a wheel for a `demo-b` package that doesn't match the original filename... and then it doesn't care. This result is, apparently, a "successful download". Oh, and that wheel is ultimately discarded - it doesn't show up in `pip cache list` afterward. Ultimately, the result of all that work was just to make the output say `Successfully downloaded demo-b` rather than `Successfully downloaded demo_d`. (Of course, `demo-d` and `demo_d` are both valid distribution names, so the filename doesn't unambiguously represent the distribution name.)
(But considering that we already had the file, would it be any less ridiculous for Pip to say the download was unsuccessful?)
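To underline how much of this is statically readable, here's a sketch (mine, not anything Pip does) that gathers every name/version claim from an sdist like the mismatched one above, without executing a byte of its code:

```python
import email.parser
import os.path
import tarfile

def name_version_claims(sdist_path):
    """Collect every name/version claim that can be read without running
    any build code: the archive filename, the top-level folder name, and
    PKG-INFO (if present). Returns a dict for the caller to compare."""
    claims = {}
    stem = os.path.basename(sdist_path)[:-len(".tar.gz")]
    name, _, version = stem.partition("-")
    claims["filename"] = (name, version)
    with tarfile.open(sdist_path) as tar:
        members = tar.getmembers()
        top = members[0].name.split("/")[0]
        name, _, version = top.partition("-")
        claims["folder"] = (name, version)
        for member in members:
            if member.name == f"{top}/PKG-INFO":
                headers = email.parser.Parser().parsestr(
                    tar.extractfile(member).read().decode("utf-8"))
                claims["PKG-INFO"] = (headers["Name"], headers["Version"])
    return claims
```

Run against the `demo_d-0.4.0.tar.gz` from the script above, this would surface `demo_d`/`0.4.0`, `demo_a`/`0.1.0` and `demo-c`/`0.3.0` side by side - three of the four contradictions caught with zero risk. (The fourth, in `pyproject.toml`, is just as easy to read statically.)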
History
When I started writing this post, I knew that this issue had existed for a long time, but I wasn't completely sure when it had been introduced. I did know that PyPI hosts versions of Pip going all the way back to 0.2, which is the first to bear the name "Pip" (having been originally named "pyinstall"). Since I was lagging behind on editing, I decided I might as well do a bit more research. With some difficulty (and perhaps a story for another time), I managed to set up Pip 0.2 in a separate environment for testing.
Of course, it requires Python 2.x, and the command syntax has changed over the years, and so have the packaging standards. And in those days, Pip explicitly had Setuptools as a dependency.
But I did eventually manage to create a compatible test project (with, actually, a surprisingly recent version of Setuptools), and try out a "download" using `pip install --no-install` (yes, that really was the syntax, to "Download and unpack all packages, but don't actually install them").

And just as I expected, the "download" and "unpack" process would still "build" the project separately, and therefore run arbitrary code from `setup.py`.
So there you have it - the problem has existed for Pip's entire history, over 16 years.
Just to round things off, I've assembled a timeline of relevant events, thus:
October 2009

Version 0.5 adds the `--no-deps` and `--download` flags for `pip install`. The `--download` flag actually specifies a directory where downloads should be put (the canonical name is `--download-dir` per the changelog), but it implies `--no-install` when set. (Given the wording in future bug reports, one would be forgiven for assuming that the feature of downloading without full installation was added here, but it wasn't.)
July 2011
It's reported in Pip issue 315 that pip install --download
won't download dependencies, even without specifying --no-deps
. (The --download
flag for pip install
was available since at least 0.6, and --no-deps
since 0.5; but the earliest versions are not dated in the changelog, and there's no clear indication of exactly when the download feature was added.)
January 2012
It's pointed out in Pip issue 425 that Pip runs arbitrary code from setup.py
, though in a different context: the risk of the connection to PyPI being spoofed. (In those days, SSL and HTTPS were not nearly so plug-and-play as they are now.)
February 2012
Pip v1.1 is released, fixing the issue reported in July 2011.
September 2012
Pip issue 661 is the first I can find that directly describes the problem. The issue was deemed a problem with the setup.py
of the to-be-downloaded package, reported to that package's issue tracker almost two years later, and finally closed in 2017 after no further commentary. A salient comment:
The root problem here is that Python packaging does not yet have static metadata, and thus requires running
setup.py
to even acquire metadata about a package; to make sure it is actually the right project name and version, to find out what dependencies it has, etc. And projects like psycopg2 or scipy that have build dependencies are not careful to make it possible to get that metadata out of setup.py
if you don't have all the build dependencies present on your system, because they primarily think of setup.py as intended for actual installation.
(This issue with build dependencies causing setup.py
to fail was a major part of the motivation for introducing pyproject.toml
.)
December 2013
Pip issue 1374 is another similar report. A distinction is drawn between building the package vs. running setup.py egg_info
(not meaningful with modern pyproject.toml
-based builds), which in principle is only supposed to figure out the metadata but in practical terms might invoke compilers for C code etc. anyway. The explanation is offered that "This is necessary to extract dependency information in order to download any dependencies."; everyone apparently overlooks that --no-deps
has no effect on this behaviour.
June 2014
Pip issue 1884, "Avoid generating metadata in pip download --no-deps ...
" (as it's currently titled), is opened. This seems to have become the canonical version of the bug report (others are closed as duplicates of it). The first reply offers a choice quote:
It's an unfortunate fact of the Python packaging ecosystem that anything related to packaging always involves arbitrary code execution (referring to
setup.py
).
(This was before wheels were widely adopted, of course, so that would have been even more true.)
There are several later duplicates of the issue - not all of them recognized and marked as such. For example, issue 2103 wasn't marked as a duplicate.
April 2015
The original proposal is made for the pip download
command that will be added in Pip 8.0.0, as issue 2643.
January 2016
Pip v8.0.0 is released, deprecating pip install --download
in favour of the newly added pip download
.
November 2016
Pip v9.0.0 is released, adding a --platform
flag for pip download
. This has a bug whereby specifying a platform only works with wheels and errors out unless wheels are demanded. But in a way, this accidentally, partially, temporarily fixes the main issue.
February 2017
Pip issue 4289, 'Issue with "pip download --platform" semantics', is opened, reporting the (undocumented) restriction mentioned above.
May 2017
Some comments on issue 4289 propose that it shouldn't be necessary to run setup.py
when using pip download --no-deps
.
June 2017
A proposal is made to close issue 1884 because the pip install --download
command syntax no longer exists. In fact, this syntax has only been deprecated, and it's pointed out that the problem still exists with pip download
, so the issue isn't closed.
March 2018
Pip v10.0.0b1 is released, fixing issue 4289 (and thereby revealing the main problem again for some users). The deprecated pip install --download
is also removed, along with the completely nonsensical ability to specify pip download --editable
.
November 2019
The title of issue 1884 is edited for the first time, to reflect the command syntax change from pip install --download
to pip download
. (This is almost four years after the new command was actually implemented.)
Pip issue 7325, 'Disallow execution of setup.py when "pip download --no-deps someproject"', is opened.
April 2020
Pip issue 7995, "pip download --no-deps --no-binary
does some unwanted build steps" is opened. A choice quote:
Is there any case where it is useful to collect dependencies when
--no-deps
is specified?

No, pip is just not smart enough to not do it. The “problem” here is that
pip download
simply reuses code from pip install
and just skips the actual install part.
Meanwhile, a workaround is offered on issue 1884, but it turns out not to work in current versions of Pip.
June 2020
Pip issue 8387, "Using pip download to fetch package sources seems to trigger building wheels for some packages.", is opened on the Pip bug tracker. This is another duplicate, but it notably reveals the fact that setup.py egg_info
is run when --no-use-pep517
is passed to pip download
. (Again, one may wonder why a download command is accepting options that control the build process; the obvious answer is that the UI explicitly provides for the expectation that the project will be built even though only a download is requested.)
July 2020
PEP 625, "Filename of a Source Distribution", is created. The proposal is supposed to standardize the filenames used for sdists, following existing common-but-not-guaranteed practices, such that Pip could reliably determine the project name and version from the filename. It will take over two years for this proposal to be accepted.
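To make the proposal concrete: under PEP 625, an sdist is named {name}-{version}.tar.gz, where the project name is normalized (per PEP 503, with hyphens then replaced by underscores), so the name part can never itself contain a hyphen. Here's a minimal sketch of how a tool could then recover the name and version from the filename alone; the function names are mine, not Pip's:

```python
import re


def normalize_name(name: str) -> str:
    """PEP 503 normalization, then '-' -> '_' as PEP 625 requires."""
    return re.sub(r"[-_.]+", "-", name).lower().replace("-", "_")


def parse_sdist_filename(filename: str) -> tuple[str, str]:
    """Split a PEP 625-compliant sdist filename into (name, version).

    Assumes the name is already normalized (so it contains no hyphens),
    leaving a single hyphen to separate it from the version.
    """
    if not filename.endswith(".tar.gz"):
        raise ValueError(f"not an sdist filename: {filename!r}")
    stem = filename[: -len(".tar.gz")]
    name, sep, version = stem.partition("-")
    if not sep or not name or not version:
        raise ValueError(f"cannot split name and version: {filename!r}")
    return name, version
```

Since the normalized name cannot contain a hyphen, splitting at the first hyphen is unambiguous - which is exactly the guarantee that pre-PEP-625 filenames lacked.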
August 2020
Relevant commentary on issue 1884, with links to additional discussion on the Python Discourse forum:
The root problem is that source distribution metadata is not trustworthy, and it’s difficult to avoid building metadata since pip needs to check for package integrity. The thing we really need to do before any of this can reasonably happen is to have standardisation on essential sdist metadata (namely package name and version) somehow. There have been efforts on this; feel free to contribute to them.
It's further noted that:
pip download foo-1.0
could find a filefoo-1.0.tar.gz
which contained a project called bar, version 2.0.

Pip has to get the package metadata (by building) to confirm that the filename matches the metadata.
(This is the problem that PEP 625 aims to solve.)
It is not really explained why such a hypothetical result should be a problem (or what to do with the already downloaded file) when --no-deps
is specified. Instead:
Honestly, why not just get the PyPI URL and download it directly? You seem to be going to a lot of effort (and expecting others to as well) to basically download a file whose name you know.
...the reason not to just get the PyPI URL and download directly is that I want to get the same file that
pip install
would have chosen. And I don't know the filename ahead of time, the input is not necessarily a project name + version (pinned) but a general requirement specifier....
So I figure the only way to reliably download the correct release file (correct meaning "same one that pip would choose") is to use pip itself. Since there is no public API here, that means using the command line interface in a subprocess.
[no response]
September 2020
Pip issue 8850, "pip download --no-deps
runs setup.py egg_info
unnecessarily and fails", is opened. Notably, the name of the egg_info
subcommand refers to the long-outdated "egg" format for packages; Pip still supports this (but at least it's deprecated... since 23.2... on Python 3.11 and up... and there may still be binary distribution formats that use the corresponding .egg-info
metadata format). This report was dismissed as a bug in the package (which was trying to use undocumented internals of Pip in its setup.py
logic).
October 2020
PEP 643, "Metadata for Package Source Distributions" describes version 2.2 of the core metadata (i.e. PKG-INFO
) standard. According to this standard, the Name
and Version
of the project MUST NOT be marked as dynamic
in an sdist, and consequently the values for these in the corresponding wheel MUST match.
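Since PKG-INFO uses RFC 822-style headers, the standard library's email parser can read it. Here's a rough sketch (my own code, not Pip's actual logic) of the check that PEP 643 enables: metadata version 2.2 or later, with Name and Version present and not listed as Dynamic:

```python
from email.parser import Parser


def sdist_metadata_is_static(pkg_info_text: str) -> bool:
    """Check whether an sdist's PKG-INFO provides a Name and Version
    that a tool may trust under PEP 643 (a sketch; a real tool would
    also validate the field contents themselves)."""
    msg = Parser().parsestr(pkg_info_text)
    metadata_version = msg.get("Metadata-Version", "0")
    major, _, minor = metadata_version.partition(".")
    try:
        if (int(major), int(minor or 0)) < (2, 2):
            return False  # pre-2.2 metadata makes no such guarantee
    except ValueError:
        return False
    dynamic = {value.lower() for value in msg.get_all("Dynamic", [])}
    # PEP 643: Name and Version MUST NOT be dynamic in an sdist
    return ("name" not in dynamic and "version" not in dynamic
            and msg.get("Name") is not None and msg.get("Version") is not None)


example = """Metadata-Version: 2.2
Name: example-project
Version: 1.0.0
Dynamic: Requires-Dist
"""
```

Only when a check like this passes may a tool trust the sdist's declared name and version without building the package.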
December 2020
It is discovered and reported on issue 1884 that the demand for unnecessary metadata may have to do with the resolver. Pull requests 9305 and 9311 are created accordingly, but ultimately go nowhere. (This appears to explain the problem with the workaround offered in April.)
March 2021
Pip issue 9701, "pip download --no-deps tries to use PEP517 so badly it is not usable to download stuff", is opened. We get this choice quote:
The problem is that the only way to be sure that a sdist actually provides the version you specify is to build it. Yes, we could rely on the sdist filename, but it's not technically reliable, and we'd need to special-case stuff to make it work.
If and when build backends start including PEP 643 style metadata in sdists, to a level where it's worth the effort to check it before trying a build, we could use that to avoid the build step where the data is available statically. But I'm not even sure if any tools have implemented PEP 643 yet...
To be honest, though, if you want to just download a sdist from PyPI, pip probably isn't the tool you want. It's not that hard to query the PyPI JSON interface for the sdist url, and wget it. If I were doing this often enough that manually downloading via the web interface was insufficient, that's what I'd do.
(Editor's note: the link to PEP 643, "Metadata for Package Source Distributions", is not present in the original. To be fair, at that point the PEP had only been accepted a few months earlier. Before that point, to my understanding, sdists were still expected to include a PKG-INFO
file, but it's unclear what if anything one could actually do with that information.)
Meanwhile, back on issue 1884, in response to someone ruminating about forking Pip or starting work on a replacement:
I'm genuinely not being sarcastic or passive-aggressive here, I agree with this - I think it could only be healthy for the ecosystem to have alternatives to pip, which can look at alternative approaches without all of the backward compatibility constraints that pip works under.
...
By the way, I assume you're aware that if all you actually want is to download a file from PyPI, the JSON API is pretty straightforward to use. You can even do it as a shell one-liner, if you like:
wget $(curl https://pypi.org/pypi/pip/json | jq -r '.releases[.info.version][] | select(.packagetype=="sdist") | .url')
Making that into a Python script with options, etc, is pretty straightforward.
(But it is still, to the best of my knowledge, not offered officially as a tool by PyPA. It also doesn't invoke any resolver logic.)
There's also a reference back to issue 7995, and an objection that validation can't just be opt-in:
The biggest roadblock (aside from coming up with a rule that makes sense) is implementation; validation should only be skipped on very specific subcommand-option combinations, and it’s not trivial to pass all the needed context all the way down to where the validation is done.
This "pass all the needed context" language appears to refer to the December 2020 discovery. There's also a reference to issue 6607, "Build Logic Refactor", from June 2019, which proposes some cleanups for that context chain.
(Again, the complaints made on issue 1884 in August 2020 are not addressed.)
Finally, the issue is (understandably) closed as a duplicate of issue 7995.
July 2021
Pip issues 7995 (originally specifically about pyproject.toml
based builds) and 1884 (originally specifically about setup.py
based builds) are consolidated.
October 2021
An interesting comment from issue 1884:
Apologies, I got confused between PEP 621 (Storing project metadata in pyproject.toml) and PEP 643 (Metadata for Package Source Distributions). PEP 621 is irrelevant here, as tools should not read metadata from pyproject.toml. Reading metadata from a sdist via PEP 643 would be useful, and is valid, though. While I guess it's tempting to assume that pip can read pyproject.toml and if it finds PEP 621 data, then use it, it would be wrong because there's no guarantee that the backend supports PEP 621, so there's no reason to believe that the metadata in the generated wheel/sdist would bear any relationship to the PEP 621 data.
August 2022
Issue 1884 is locked, with the comment:
...an easy way to restart discussion is if someone creates a PR with a suggested solution 🙂
September 2022
PEP 625 is accepted.
Setuptools issue 3593, "[FR] Implement PEP 625 - File Name of a Source Distribution", is opened.
April 2024
Setuptools issue 3593 is closed and then promptly reopened due to confusion over "trailing zeros" in version numbers (e.g. a version number like 1.0.0
being normalized to 1
).
June 2024
Setuptools issue 3593 is properly closed again, fixed by PR 4434 - although there is additional discussion in the meantime of other backwards compatibility issues which may not have been addressed.
Comments