Downloading using command line tools
COSMIC provides a simple interface for downloading data files. Downloading is a three-stage process:
- generate an HTTP Basic Auth credential string
- make an authenticated request to obtain a download link
- make a request to retrieve the data file from the returned link
1. Generate an authentication string
When you make the first request you must supply the email address that you used to register for COSMIC, along with your COSMIC password. The email address and password must be supplied in the Authorization header of the request. We use the HTTP Basic Auth protocol, which encodes the string using Base64 encoding, in order to avoid problems with non-word characters in passwords.
To generate your HTTP Basic Auth string, concatenate the email and password with a colon (:) and base64 encode the resulting string. For example, using standard Unix command line tools:
echo 'email@example.com:mycosmicpassword' | base64
ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZAo=
Using that authentication string you can now make a request to obtain the download URL for the file. Note that it's important to wrap the email and password in single quotes, as in the example, if your password contains any non-word characters; otherwise the shell may interpret them.
You can use the same authentication string for all of your downloads. You only need to re-generate the string if you change your COSMIC password.
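The same header value can be produced in Python with the standard library. This is a minimal sketch; the function name basic_auth_header is hypothetical and not part of any COSMIC tooling. Unlike the echo command, it does not append a trailing newline before encoding, so the result ends in ZA== rather than ZAo=. This is the form that HTTP client libraries such as requests send for Basic Auth.

```python
import base64


def basic_auth_header(email: str, password: str) -> str:
    """Return the value for an HTTP Basic Auth Authorization header."""
    # Join email and password with a colon, then Base64-encode the bytes.
    credentials = f"{email}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    print(basic_auth_header("email@example.com", "mycosmicpassword"))
```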
2. Make an authenticated request to get the download URL
You can now request the URL for the download. Your request must include the authentication string that you just generated as a header:
Authorization: Basic ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZAo=
Using the command line tool cURL, you could make the request like this:
curl -H "Authorization: Basic ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZAo=" \
  https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v85/classification.csv
Alternatively, the path to the required file may be URL-encoded and supplied using the data parameter instead of as part of the URL path:
curl -H "Authorization: Basic ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZAo=" \
  'https://cancer.sanger.ac.uk/cosmic/file_download?data=GRCh38%2Fcosmic%2Fv85%2Fclassification.csv'
You can find the path for a file on our download page.
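If you prefer to build the data-parameter form of the URL in code rather than encoding the path by hand, the following sketch shows one way to do it with the Python standard library. The helper name data_parameter_url is hypothetical; the base URL is the one used in the examples above.

```python
from urllib.parse import urlencode

BASE_URL = "https://cancer.sanger.ac.uk/cosmic/file_download"


def data_parameter_url(filepath: str) -> str:
    """Build a download-link request URL using the data query parameter."""
    # urlencode percent-encodes "/" as %2F, matching the encoded example.
    return f"{BASE_URL}?{urlencode({'data': filepath})}"


if __name__ == "__main__":
    print(data_parameter_url("GRCh38/cosmic/v85/classification.csv"))
```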
If you have supplied valid COSMIC credentials, the server will return a small snippet of JSON containing a URL from which you can download your requested file:
{ "url" : "https://cog.sanger.ac.uk/cosmic/GRCh38/cosmic/v85/classification.csv?AWSAccessKeyId=KFGH85D9KLWKC34GSl88&Expires=1521726406&Signature=Jf834Ck0%8GSkwd87S7xkvqkdfUV8%3D" }
If your credentials were not valid, you will receive a response with status code 401 Unauthorized and a JSON snippet with an error message:
{ "error" : "not authorised" }
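A script that consumes these responses may want to distinguish the two cases explicitly. The sketch below is one possible approach, not part of the COSMIC service: the function extract_download_url and the exception DownloadLinkError are hypothetical names, and the function takes the status code and body as plain values so that it can be used with any HTTP client.

```python
import json


class DownloadLinkError(Exception):
    """Raised when the server did not return a download URL."""


def extract_download_url(status_code: int, body: str) -> str:
    """Return the download URL from a link response, or raise on failure."""
    try:
        payload = json.loads(body)
    except ValueError:
        raise DownloadLinkError(f"unexpected non-JSON response ({status_code})")
    if status_code == 401:
        # Invalid credentials: the body carries an "error" message.
        raise DownloadLinkError(payload.get("error", "not authorised"))
    if "url" not in payload:
        raise DownloadLinkError(f"no url in response ({status_code})")
    return payload["url"]
```

With the requests library this could be called as extract_download_url(r.status_code, r.text).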
3. Download the data file
You can now extract the URL from the JSON snippet and make a request to that URL to download the data file:
curl -o classification.csv 'https://cog.sanger.ac.uk/cosmic/GRCh38/cosmic/v85/classification.csv?AWSAccessKeyId=KFGH85D9KLWKC34GSl88&Expires=1521726406&Signature=Jf834Ck0%8GSkwd87S7xkvqkdfUV8%3D'
You do not need to provide the HTTP Basic Auth header for this request. The download URL is valid for one hour.
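Because the link expires, a long-running script may want to check how much time remains before starting a download. The sketch below assumes that the Expires query parameter in the returned URL is a Unix timestamp in seconds, as it appears to be in the example; this is an assumption and is not stated by the service. The function name seconds_until_expiry is hypothetical.

```python
import time
from urllib.parse import parse_qs, urlsplit


def seconds_until_expiry(download_url: str, now=None):
    """Return the number of seconds before the download URL expires.

    Assumes the Expires query parameter is a Unix timestamp in seconds.
    """
    query = parse_qs(urlsplit(download_url).query)
    expires = int(query["Expires"][0])
    current = time.time() if now is None else now
    return expires - current
```

A negative result means the link has already expired and a new one should be requested.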
Note that the download server does not check that the file that you request actually exists when you make your first request, and will always generate a download link. If the file doesn't exist, your second request will return status code 403 Forbidden and an unhelpful snippet of XML:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <BucketName>cosmic</BucketName>
  <RequestId>tx0000000000000026f493d-005ab8add9-5197c8c-default</RequestId>
  <HostId>5197c8c-default-default</HostId>
</Error>
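A script can turn that XML into a clearer message by reading the Code element. This is a small sketch using the Python standard library; the function name storage_error_code is hypothetical, and it assumes error bodies have the same shape as the example above.

```python
import xml.etree.ElementTree as ET


def storage_error_code(body):
    """Return the text of <Code> from an XML error body, or None."""
    if isinstance(body, str):
        body = body.encode("utf-8")
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not XML at all, so probably the data file itself.
        return None
    if root.tag != "Error":
        return None
    return root.findtext("Code")
```

For the example above this returns NoSuchKey, which a script could report as "requested file does not exist".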
Download programmatically
Downloading files can be easily automated using a simple script. Here, for example, is a Python script that retrieves a single CSV file from the download endpoint:
#!python3
import requests

email = "email@example.com"
password = "mycosmicpassword"
url = "https://cancer.sanger.ac.uk/cosmic/file_download/"
filepath = "GRCh38/cosmic/v85/classification.csv"
filename = "classification.csv"

# get the download URL
r = requests.get(url+filepath, auth=(email, password))
download_url = r.json()["url"]

# get the file itself
r = requests.get(download_url)

# write the file to disk
with open(filename, "wb") as f:
    f.write(r.content)
Substitute your registered COSMIC email address and password and make sure that you have installed the requests library.
Using the file manifest endpoint you can also retrieve a list of all files for a specific COSMIC release, and use that to provide a list of download URLs. Here's an example script that does just that:
#!python3
import requests

email = "email@example.com"
password = "mycosmicpassword"
url = "https://cancer.sanger.ac.uk/cosmic/file_download/"

# get a list of all files in COSMIC v85
filelist = requests.get("https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v85")

# extract the download URLs from the JSON response
for filepath in filelist.json():
    # extract the filename from the download file path
    filename = filepath.rsplit("/", 1)[1]

    # skip the Oracle database dump; it's around 50 GB
    if "ORACLE_EXPORT" in filename:
        print("skipping oracle export file")
        continue

    print(f"downloading {filepath} and saving as {filename}...", end="")

    # get the URL for downloading the file
    r = requests.get(url+filepath, auth=(email, password))
    download_url = r.json()["url"]

    # get the file itself
    r = requests.get(download_url, stream=True)

    # read the file and write it to disk chunk by chunk
    with open(filename, "wb") as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

    print(" done")
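The filtering step in the script above can also be separated from the network calls, which makes it easier to check what will be downloaded before starting. The following is a sketch of that idea; the function name plan_downloads and its skip_substrings parameter are hypothetical.

```python
def plan_downloads(manifest, skip_substrings=("ORACLE_EXPORT",)):
    """Return (filepath, filename) pairs for the files to download.

    manifest is the list of file paths returned by the manifest endpoint.
    Files whose name contains any of skip_substrings are left out.
    """
    plan = []
    for filepath in manifest:
        # The local filename is the last component of the file path.
        filename = filepath.rsplit("/", 1)[-1]
        if any(s in filename for s in skip_substrings):
            continue
        plan.append((filepath, filename))
    return plan
```

Each filepath in the result can then be passed to the download endpoint exactly as in the loop above.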
Here is a script for downloading a single file implemented in Perl:
#!perl
use strict;
use warnings;
use LWP::UserAgent;
use MIME::Base64;
use URI;
use JSON;
my $ua = LWP::UserAgent->new;
my $email = 'email@example.com';
my $password = 'mycosmicpassword';
my $url = 'https://cancer.sanger.ac.uk/cosmic/file_download/';
my $filepath = 'GRCh38/cosmic/v85/classification.csv';
my $filename = 'classification.csv';
# build URL with the parameter specifying the file to be downloaded
my $uri = URI->new("$url$filepath");
# get the download URL; the empty second argument stops encode_base64
# from appending a newline to the header value
my $r = $ua->get( $uri, Authorization => 'Basic ' . encode_base64("$email:$password", '') );
# decode the JSON string in the response and extract the download URL
my $json = decode_json $r->content;
my $download_url = $json->{url};
# get the file itself and save it to disk
$r = $ua->get($download_url, ':content_file' => $filename);
Available download files
You can get a list of the available download files using a RESTful manifest endpoint:
curl https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v85
[
  "GRCh38/cosmic/v85/All_COSMIC_Genes.fasta.gz",
  "GRCh38/cosmic/v85/COSMIC_ORACLE_EXPORT.dmp.gz.tar",
  "GRCh38/cosmic/v85/CosmicBreakpointsExport.tsv.gz",
  "GRCh38/cosmic/v85/CosmicCompleteCNA.tsv.gz",
  ...
The endpoint can also be interrogated to find the available genome releases, datasets and COSMIC releases:
curl https://cancer.sanger.ac.uk/cosmic/file_download
[ "GRCh37", "GRCh38" ]
curl https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38
[ "GRCh38/cell_lines", "GRCh38/cosmic" ]
curl https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic
[ "GRCh38/cosmic/v83", "GRCh38/cosmic/v84", "GRCh38/cosmic/v85" ]
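When a script works with the release list directly, releases should be compared by number rather than as strings, since "v9" sorts after "v10" alphabetically. The sketch below assumes that release labels always have the form "v" followed by an integer, as in the listing above; the function name latest_release is hypothetical.

```python
import re


def latest_release(release_paths):
    """Return the path with the highest release number.

    Assumes each path ends in a label of the form v<integer>.
    """
    def version_number(path):
        label = path.rsplit("/", 1)[-1]
        match = re.fullmatch(r"v(\d+)", label)
        if match is None:
            raise ValueError(f"unexpected release label: {label}")
        return int(match.group(1))

    return max(release_paths, key=version_number)
```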
If you simply want to retrieve files for the most recent release, you can also substitute latest for the release number:
curl https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/latest
[
  "GRCh38/cosmic/v85/All_COSMIC_Genes.fasta.gz",
  "GRCh38/cosmic/v85/COSMIC_ORACLE_EXPORT.dmp.gz.tar",
  ...
You can also use latest when downloading files:
curl -H "Authorization: Basic ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZAo=" \
  https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/latest/classification.csv
{ "url" : "https://cog.sanger.ac.uk/cosmic/GRCh38/cosmic/v85/classification.csv?AWSAccessKeyId=KFGH85D9KLWKC34GSl88&Expires=1521726406&Signature=Jf834Ck0%8GSkwd87S7xkvqkdfUV8%3D" }