Setting up a private Python package repository on S3
The Product Operations team is open sourcing a tool that lets you install Python packages with pip from your own private repository.
If you write software in Python, you have almost certainly used pip, Python’s package manager. Pip, a recursive acronym for “pip installs packages”, lets you install publicly available Python packages.
By default, pip searches for and downloads packages from PyPI, the official package repository managed by the Python Software Foundation. PyPI is great for publicly available open source packages, such as the one we’re open sourcing today, but it is not the right place to distribute private or closed source packages. For those, you should use a privately hosted package repository. There are a number of tools available on this thing called the ‘internet’, such as Gemfury (commercial) or pypiserver (open source). Commercial or open source, all of these options require you either to run your own server or to pay someone else to run one for you.
At November Five, we use many Amazon Web Services, such as EC2, RDS, ElastiCache, SQS, SNS, SES, Route 53, Lambda, DynamoDB, CloudFormation, CloudFront and, of course, S3. S3 (Simple Storage Service) is an online file storage service that can hold terabytes of data, and it can also host static websites. It has the double advantage of being both low-maintenance and cheap: you’d only pay a couple of cents for the storage and bandwidth your site uses.
The tool we’re open sourcing today, s3pypi, makes it easier to create your own package repository on S3.
There are a few prerequisites for setting up a Python package repository on S3:
- An AWS account. If you don’t have one already, go sign up.
- A domain or subdomain, e.g. pypi.example.com, for which you can create or modify DNS records.
- An SSL certificate for the domain you’re using.
In your AWS account, you need to set up an S3 bucket configured for static website hosting. You also need a CloudFront distribution that serves the bucket’s content over a secure (HTTPS) connection, which pip requires by default.
We’ve created a CloudFormation template that configures these resources for you.
- Download the s3pypi CloudFormation template.
- Upload your SSL certificate to your AWS account and make a note of its ServerCertificateId.
- Open the AWS console in your preferred AWS region, select CloudFormation, and click “Create stack”.
- Give the stack a meaningful name, and enter your subdomain and the ID of the server certificate.
- Skip through the next steps (unless you want to tag your resources) and create the stack.
- When the stack has been created, open the “Outputs” tab and copy the value of the “CNAMERecordValue” output parameter.
- Create a CNAME record (or an alias record, if you’re using Route 53) for your subdomain that points to this CloudFront distribution (the CNAMERecordValue).
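If you prefer to script the console steps above, the sketch below does the same with boto3’s CloudFormation client (`create_stack`, the `stack_create_complete` waiter and `describe_stacks`). It is a minimal sketch under stated assumptions. The parameter keys `DomainName` and `CertificateId` are hypothetical placeholders, so check the Parameters section of the downloaded template for the real names. The template is also assumed to create IAM resources (such as the publish policy described later in this post), which is why `CAPABILITY_NAMED_IAM` is passed. The client is injected so the helpers stay testable.

```python
from typing import Any, Dict, List


def build_parameters(subdomain: str, certificate_id: str) -> List[Dict[str, str]]:
    """Build the CloudFormation Parameters list for the stack.

    NOTE: "DomainName" and "CertificateId" are hypothetical keys; use the
    names declared in the Parameters section of the s3pypi template.
    """
    return [
        {"ParameterKey": "DomainName", "ParameterValue": subdomain},
        {"ParameterKey": "CertificateId", "ParameterValue": certificate_id},
    ]


def get_output(describe_response: Dict[str, Any], key: str) -> str:
    """Return one output value from a describe_stacks response."""
    for output in describe_response["Stacks"][0].get("Outputs", []):
        if output["OutputKey"] == key:
            return output["OutputValue"]
    raise KeyError(f"stack has no output named {key!r}")


def create_pypi_stack(cfn: Any, stack_name: str, template_path: str,
                      subdomain: str, certificate_id: str) -> str:
    """Create the stack, wait for it, and return its CNAMERecordValue output."""
    with open(template_path) as f:
        template_body = f.read()
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=build_parameters(subdomain, certificate_id),
        # Assumption: the template creates (possibly named) IAM resources.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # Block until the stack reaches CREATE_COMPLETE; the waiter raises on failure.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return get_output(cfn.describe_stacks(StackName=stack_name), "CNAMERecordValue")
```

In practice you would pass `boto3.client("cloudformation", region_name=...)` as `cfn`. The returned string is the target for your CNAME or alias record.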
Install the s3pypi command line tool by running the following in your console (prefix it with sudo if you are installing into the system-wide Python):
$ pip install -U s3pypi
If everything went well, you should now be able to run the s3pypi command line tool:
$ s3pypi -v
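When several Python installations are present, it can help to check which interpreter actually received the package. A small sketch using the standard library’s `importlib.metadata` (Python 3.8+) follows. The helper name `installed_version` is ours. The check only covers the interpreter that runs the script, which is not necessarily the one behind the `pip` command you used.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version("s3pypi")
    print(f"s3pypi {v}" if v else "s3pypi is not installed for this interpreter")
```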
Using s3pypi to publish a package
Now you’re ready to publish your first Python package to your private repository. Make sure you have your AWS credentials set up in your environment, and that you have permission to upload files to the S3 bucket that you created in the previous step.
To upload your package to your repository, cd to the root directory of your project and run
$ s3pypi --bucket pypi.example.com
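pip can install from a plain static website because it speaks the “simple repository” protocol (PEP 503). Each project gets its own HTML page of download links under a normalized project name. The sketch below renders such a per-project page and the S3 key it would live under. It illustrates the layout pip expects and is not s3pypi’s actual code; we have not checked how s3pypi formats its pages. It also assumes the bucket’s website index document is `index.html`, so that a request for `/my-project/` serves `my-project/index.html`. The file names in the usage line are hypothetical.

```python
import html
import re
from typing import List


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' become one '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def object_key(project: str) -> str:
    """S3 key for the project page, assuming 'index.html' is the index document."""
    return f"{normalize(project)}/index.html"


def project_page(project: str, filenames: List[str]) -> str:
    """Render the page served at /<normalized-name>/.

    Links are relative, so the package files are expected to sit next to
    index.html under the same prefix.
    """
    links = "\n".join(
        f'    <a href="{html.escape(fn)}">{html.escape(fn)}</a><br>'
        for fn in sorted(filenames)
    )
    return (
        "<!DOCTYPE html>\n<html>\n  <body>\n"
        f"    <h1>Links for {html.escape(project)}</h1>\n"
        f"{links}\n"
        "  </body>\n</html>\n"
    )
```

For example, `object_key("My_Project")` gives `my-project/index.html`, which is where pip looks when you run `pip install My_Project`.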
Install your packages with pip by pointing --extra-index-url at your subdomain:
$ pip install my-project --extra-index-url https://pypi.example.com/
Alternatively, you can configure the index URL in your pip configuration file:
[global]
extra-index-url = https://pypi.example.com/
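pip itself can write this setting (`pip config set global.extra-index-url https://pypi.example.com/`), and `pip config debug` lists the configuration files pip reads. That command replaces any existing value, though. If you already use other extra indexes and want to add yours from a script, here is a sketch using the standard `configparser` module. The helper name is ours. It relies on pip splitting this option on whitespace, so several URLs can share one value.

```python
import configparser
from pathlib import Path


def add_extra_index_url(config_path: Path, url: str) -> None:
    """Add url to [global] extra-index-url in a pip config file, keeping other settings."""
    # interpolation=None so '%' characters in URLs are taken literally.
    config = configparser.ConfigParser(interpolation=None)
    config.read(config_path)  # a missing file is silently treated as empty
    if not config.has_section("global"):
        config.add_section("global")
    # pip splits this value on whitespace, so keep one URL per line.
    urls = config.get("global", "extra-index-url", fallback="").split()
    if url not in urls:
        urls.append(url)
    config.set("global", "extra-index-url", "\n".join(urls))
    config_path.parent.mkdir(parents=True, exist_ok=True)
    with open(config_path, "w") as f:
        config.write(f)
```

One trade-off: rewriting the file through `configparser` drops any comments it contained.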
Access control for publishing files to your brand-new package repository is handled entirely through AWS Identity and Access Management (IAM). To give an IAM user publish rights, attach the managed IAM policy “PublishS3PyPIPackages”, which CloudFormation creates, to that user.
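The same assignment can be done from a script with boto3’s STS and IAM clients (`get_caller_identity`, `attach_user_policy`). This is a minimal sketch that assumes the policy is really named `PublishS3PyPIPackages` and lives at the default path `/`. If the template does not fix the name, CloudFormation may generate a different one, so check the IAM console. The user name in the test is hypothetical.

```python
from typing import Any


def policy_arn(account_id: str, policy_name: str, path: str = "/") -> str:
    """Build the ARN of a customer-managed IAM policy."""
    return f"arn:aws:iam::{account_id}:policy{path}{policy_name}"


def grant_publish_rights(iam: Any, sts: Any, user_name: str,
                         policy_name: str = "PublishS3PyPIPackages") -> str:
    """Attach the publish policy to an IAM user and return the policy ARN."""
    # Assumption: the caller's account is the one holding the s3pypi stack.
    account_id = sts.get_caller_identity()["Account"]
    arn = policy_arn(account_id, policy_name)
    iam.attach_user_policy(UserName=user_name, PolicyArn=arn)
    return arn
```

With real clients this would be `grant_publish_rights(boto3.client("iam"), boto3.client("sts"), "<user name>")`.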
Pip supports basic authentication against a private package server. Unfortunately, S3 does not, so anyone who knows your repository’s URL can download your packages. This increases the risk of your private packages leaking. To reduce this risk, you can take some additional measures:
- s3pypi supports a --secret SECRET parameter when publishing a package. It adds SECRET, a random string of whatever length you choose, to the URL to obscure the location of your private packages. When you use this option, don’t forget to update the extra-index-url in your pip configuration file to https://pypi.example.com/SECRET/ as well.
- If your pip clients connect from one or more static IP addresses, you can add IP whitelisting to the web application firewall (WAF) of your CloudFront distribution, so that only clients from those addresses can download packages. If you use the WAF-enabled version of the s3pypi template instead of the one above, the WAF is configured auto-magically together with the CloudFront distribution and S3 bucket.
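A hard-to-guess value for s3pypi’s --secret parameter can be generated with Python’s standard `secrets` module. The sketch below creates one and builds the matching extra-index-url. The helper names are ours, not part of s3pypi. The secret only obscures the URL: anyone who obtains it, for instance from a CI log or a committed config file, can still download your packages.

```python
import secrets


def make_secret(nbytes: int = 24) -> str:
    """Return a random URL-safe string built from nbytes of randomness."""
    # token_urlsafe only uses A-Z, a-z, 0-9, '-' and '_', which are safe in URLs and S3 keys.
    return secrets.token_urlsafe(nbytes)


def secret_index_url(base_url: str, secret: str) -> str:
    """Build the extra-index-url for packages published with --secret."""
    return f"{base_url.rstrip('/')}/{secret}/"


if __name__ == "__main__":
    secret = make_secret()
    print("s3pypi --bucket pypi.example.com --secret", secret)
    print("extra-index-url =", secret_index_url("https://pypi.example.com/", secret))
```

With the default of 24 bytes, the secret is a 32-character string.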
If these security measures are insufficient for your needs, you could take a look at the open source project s3auth.com, but you should also consider hosting your Python repository elsewhere.
All done? We hope private packages will never make you say ni again…