Downloading using command line tools

COSMIC provides a simple interface for downloading data files. Downloading is a three-stage process:

  1. generate an HTTP Basic Auth credential string
  2. make an authenticated request to obtain a download link
  3. make a request to retrieve the data file from the returned link

1. Generate an authentication string

When you make the first request you must supply the email address that you used to register for COSMIC, along with your COSMIC password. The email address and password must be supplied in the Authorization header of the request. We use the HTTP Basic Auth protocol, which encodes the string using Base64 encoding, in order to avoid problems with non-word characters in passwords.

To generate your HTTP Basic Auth string, concatenate the email and password with a colon (:) and base64 encode the resulting string. For example, using standard Unix command line tools:

echo 'email@example.com:mycosmicpassword' | base64
ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZAo=

Note that it's important to wrap the string in single quotes if your password contains any non-word characters; otherwise the shell may interpret them. Using that authentication string, you can now make a request to obtain the download URL for the file.

You can use the same authentication string for all of your downloads. You only need to re-generate the string if you change your COSMIC password.
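
The same string can be built in a script. The following is a minimal Python sketch using only the standard library; the function name basic_auth_string is illustrative and not part of any COSMIC tooling. Note that echo appends a newline to its output, so the example string above also encodes that trailing newline. The sketch encodes only email:password, so its result ends in "ZA==" rather than "ZAo=".

```python
import base64


def basic_auth_string(email: str, password: str) -> str:
    """Return the Base64 form of 'email:password' for an HTTP Basic Auth header."""
    # Join the two parts with a colon, then Base64-encode the UTF-8 bytes.
    raw = f"{email}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    token = basic_auth_string("email@example.com", "mycosmicpassword")
    print(f"Authorization: Basic {token}")
```

Because the password is passed as a Python string, no shell quoting is involved.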

2. Make an authenticated request to get the download URL

You can now request the URL for the download. Your request must include the authentication string that you just generated as a header:

Authorization: Basic ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZAo=

Using the command line tool cURL, you could make the request like this:

curl -H "Authorization: Basic ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZAo=" \
  https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v85/classification.csv

Alternatively, the path to the required file may be URL-encoded and supplied using the data parameter instead of as part of the URL path. Quote the URL so that the shell does not interpret the question mark:

curl -H "Authorization: Basic ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZAo=" \
  'https://cancer.sanger.ac.uk/cosmic/file_download?data=GRCh38%2Fcosmic%2Fv85%2Fclassification.csv'

You can find the path for a file on our download page.
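
If you build the data parameter form of the URL in a script, the file path has to be percent-encoded, including its slashes. A minimal Python sketch, assuming the endpoint shown in the cURL example; the helper name data_parameter_url is hypothetical:

```python
from urllib.parse import quote

# Endpoint taken from the cURL example above.
BASE_URL = "https://cancer.sanger.ac.uk/cosmic/file_download"


def data_parameter_url(filepath: str) -> str:
    """Build the request URL that passes the file path in the data parameter."""
    # safe="" makes quote() encode "/" as %2F too; by default it leaves "/" alone.
    return f"{BASE_URL}?data={quote(filepath, safe='')}"


if __name__ == "__main__":
    print(data_parameter_url("GRCh38/cosmic/v85/classification.csv"))
```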

If you have supplied valid COSMIC credentials, the server will return a small snippet of JSON containing a URL from which you can download your requested file:

{
  "url" : "https://cog.sanger.ac.uk/cosmic/GRCh38/cosmic/v85/classification.csv?AWSAccessKeyId=KFGH85D9KLWKC34GSl88&Expires=1521726406&Signature=Jf834Ck0%8GSkwd87S7xkvqkdfUV8%3D"
} 

If your credentials were not valid, you will receive a response with status code 401 Unauthorized and a JSON snippet with an error message:

{
  "error" : "not authorised"
}
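
A script should tell these two responses apart before it tries to use the URL. The following Python sketch shows one way to do that from the status code and the response body; the function and exception names are illustrative, and it assumes the body is always valid JSON, as in the two examples above.

```python
import json


class DownloadLinkError(Exception):
    """Raised when the server does not return a download URL."""


def extract_download_url(status_code: int, body: str) -> str:
    """Return the "url" value from the response, or raise DownloadLinkError."""
    payload = json.loads(body)
    if status_code == 401:
        # Invalid credentials: pass on the server's own error message if present.
        raise DownloadLinkError(payload.get("error", "authentication failed"))
    if "url" not in payload:
        raise DownloadLinkError(f"unexpected response: {payload!r}")
    return payload["url"]
```

With the requests library, this could be called as extract_download_url(r.status_code, r.text).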

3. Download the data file

You can now extract the URL from the JSON snippet and make a request to that URL to download the data file:

curl -o classification.csv 'https://cog.sanger.ac.uk/cosmic/GRCh38/cosmic/v85/classification.csv?AWSAccessKeyId=KFGH85D9KLWKC34GSl88&Expires=1521726406&Signature=Jf834Ck0%8GSkwd87S7xkvqkdfUV8%3D' 

You do not need to provide the HTTP Basic Auth header for this request. The download URL is valid for one hour.
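
Since the link expires, a script that stores download URLs may want to check whether one is still usable. The sketch below reads the Expires query parameter from the URL. It assumes that this value is a Unix timestamp in seconds, which is how it appears in the example URL but is not stated by the server; the function names are hypothetical.

```python
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def expiry_timestamp(download_url: str) -> int:
    """Return the value of the Expires query parameter as an integer."""
    query = parse_qs(urlsplit(download_url).query)
    return int(query["Expires"][0])


def is_expired(download_url: str, now: Optional[float] = None) -> bool:
    """Return True if the link's expiry time has passed (assumes Unix seconds)."""
    if now is None:
        now = time.time()
    return now >= expiry_timestamp(download_url)
```

If is_expired returns True, repeat the authenticated request from step 2 to obtain a fresh link.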

Note that the download server does not check that the file that you request actually exists when you make your first request, and will always generate a download link. If the file doesn't exist, your second request will return status code 403 Forbidden and an unhelpful snippet of XML:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <BucketName>cosmic</BucketName>
  <RequestId>tx0000000000000026f493d-005ab8add9-5197c8c-default</RequestId>
  <HostId>5197c8c-default-default</HostId>
</Error>
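
To give a clearer message in a script, the error code can be pulled out of that XML. A minimal Python sketch using the standard library XML parser; the function name is illustrative, and it assumes the error body has the shape shown above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def storage_error_code(body: bytes) -> Optional[str]:
    """Return the text of <Code> if body is an <Error> document, else None."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not XML at all, so presumably a real data file.
        return None
    if root.tag != "Error":
        return None
    return root.findtext("Code")
```

For the response above this returns "NoSuchKey", which a script can report as a missing file. For large files, only apply this check when the status code indicates a failure, to avoid parsing the whole download.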

Download programmatically

Downloading files can be easily automated using a simple script. Here, for example, is a Python script that retrieves a single CSV file from the download endpoint:

#!python3

import requests

email    = "email@example.com"
password = "mycosmicpassword"
url      = "https://cancer.sanger.ac.uk/cosmic/file_download/"
filepath = "GRCh38/cosmic/v85/classification.csv"
filename = "classification.csv"

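# get the download URL; stop with an HTTP error if the request failed,
# for example on a 401 Unauthorized response
r = requests.get(url+filepath, auth=(email, password))
r.raise_for_status()
download_url = r.json()["url"]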

# get the file itself
r = requests.get(download_url)

# write the file to disk
with open(filename, "wb") as f:
    f.write(r.content) 

Substitute your registered COSMIC email address and password and make sure that you have installed the requests library.

Using the file manifest endpoint you can also retrieve a list of all files for a specific COSMIC release, and use that to provide a list of download URLs. Here's an example script that does just that:

#!python3

import requests

email    = "email@example.com"
password = "mycosmicpassword"
url      = "https://cancer.sanger.ac.uk/cosmic/file_download/"

# get a list of all files in COSMIC v85
filelist = requests.get("https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/v85")

# extract the download URLs from the JSON response
for filepath in filelist.json():

    # extract the filename from the download file path
    filename = filepath.rsplit("/", 1)[1]

    # skip the Oracle database dump; it's around 50 GB
    if "ORACLE_EXPORT" in filename:
        print("skipping oracle export file")
        continue

    print(f"downloading {filepath} and saving as {filename}...", end="")

    # get the URL for downloading the file; stop with an HTTP error
    # if the request failed
    r = requests.get(url+filepath, auth=(email, password))
    r.raise_for_status()
    download_url = r.json()["url"]

    # get the file itself
    r = requests.get(download_url, stream=True)

    # read the file and write it to disk chunk by chunk
    with open(filename, "wb") as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

    print(" done")

Here is a script for downloading a single file implemented in Perl:

#!perl

use strict;
use warnings;

use LWP::UserAgent;
use MIME::Base64;
use URI;
use JSON;

my $ua = LWP::UserAgent->new;

my $email    = 'email@example.com';
my $password = 'mycosmicpassword';
my $url      = 'https://cancer.sanger.ac.uk/cosmic/file_download/';
my $filepath = 'GRCh38/cosmic/v85/classification.csv';
my $filename = 'classification.csv';

# build URL with the parameter specifying the file to be downloaded
my $uri = URI->new("$url$filepath");

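# get the download URL; the empty second argument stops encode_base64
# from appending a newline to the encoded string
my $r = $ua->get( $uri, Authorization => 'Basic ' . encode_base64("$email:$password", '') );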

# decode the JSON string in the response and extract the download URL
my $json = decode_json $r->content;
my $download_url = $json->{url};

# get the file itself and save it to disk
$r = $ua->get($download_url, ':content_file' => $filename); 

Available download files

You can get a list of the available download files using a RESTful manifest endpoint:

curl https://cancer.sanger.ac.uk/cell_lines/file_download/GRCh38/cosmic/v85
[
   "GRCh38/cosmic/v85/All_COSMIC_Genes.fasta.gz",
   "GRCh38/cosmic/v85/COSMIC_ORACLE_EXPORT.dmp.gz.tar",
   "GRCh38/cosmic/v85/CosmicBreakpointsExport.tsv.gz",
   "GRCh38/cosmic/v85/CosmicCompleteCNA.tsv.gz",
...

The endpoint can also be interrogated to find the available genome releases, datasets and COSMIC releases:

curl https://cancer.sanger.ac.uk/cell_lines/file_download
[
   "GRCh37",
   "GRCh38"
]
curl https://cancer.sanger.ac.uk/cell_lines/file_download/GRCh38
[
   "GRCh38/cell_lines",
   "GRCh38/cosmic"
]
curl https://cancer.sanger.ac.uk/cell_lines/file_download/GRCh38/cosmic
[
   "GRCh38/cosmic/v83",
   "GRCh38/cosmic/v84",
   "GRCh38/cosmic/v85"
]
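
If a script needs to pick the newest release from such a list itself, the release numbers should be compared numerically rather than as text, since "v100" sorts before "v85" as a string. A minimal Python sketch, assuming every entry ends in a release name of the form v followed by digits, as in the listing above; the function name is hypothetical.

```python
import re
from typing import Iterable


def newest_release(paths: Iterable[str]) -> str:
    """Return the path whose trailing v<number> component is largest."""

    def release_number(path: str) -> int:
        match = re.search(r"/v(\d+)$", path)
        if match is None:
            raise ValueError(f"no release number in {path!r}")
        return int(match.group(1))

    # Compare on the integer release number, not on the string.
    return max(paths, key=release_number)
```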

If you simply want to retrieve files for the most recent release, you can also substitute latest for the release number:

curl https://cancer.sanger.ac.uk/cell_lines/file_download/GRCh38/cosmic/latest
[
   "GRCh38/cosmic/v85/All_COSMIC_Genes.fasta.gz",
   "GRCh38/cosmic/v85/COSMIC_ORACLE_EXPORT.dmp.gz.tar",
...

You can also use latest when downloading files:

curl -H "Authorization: Basic ZW1haWxAZXhhbXBsZS5jb206bXljb3NtaWNwYXNzd29yZAo=" \
  https://cancer.sanger.ac.uk/cosmic/file_download/GRCh38/cosmic/latest/classification.csv
{
    "url" : "https://cog.sanger.ac.uk/cosmic/GRCh38/cosmic/v85/classification.csv?AWSAccessKeyId=KFGH85D9KLWKC34GSl88&Expires=1521726406&Signature=Jf834Ck0%8GSkwd87S7xkvqkdfUV8%3D"
}
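
As the response shows, the returned URL names the concrete release that latest resolved to. A script that wants to record which release it actually downloaded could read it from the URL path. This is only a sketch: it assumes the release always appears as its own path segment of the form v followed by digits, as in the example, and the function name is illustrative.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def release_from_download_url(download_url: str) -> Optional[str]:
    """Return the first path segment that looks like v<number>, or None."""
    for segment in urlsplit(download_url).path.split("/"):
        if re.fullmatch(r"v\d+", segment):
            return segment
    return None
```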
