This tutorial shows how to download 10-K filings from the SEC's EDGAR system; it can easily be adapted to download other filing types. The example uses the SAS dataset in the repository that holds all SEC filings.

Steps

The following steps are involved:
- select 10-K filings to download (in SAS)
- export list of urls of filings to a text file (in SAS)
- download and write each of the filings to disk (in Perl)

There are 200K+ 10-K (and equivalent) filings, which take considerable hard disk space and time to download. The SEC prefers that bulk downloads are done during 'quiet time', i.e., outside regular trading hours.
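Besides picking a quiet time, a simple way to go easy on the SEC servers is to pause between requests. The following is only a minimal sketch of that idea: the helper name make_throttle and the 0.2 second pause are assumptions for illustration, not values taken from SEC guidance, so check the SEC's current access rules before choosing a pause.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);   # core module: sub-second time() and sleep()

# Hypothetical helper: returns a closure that sleeps just long enough to keep
# at least $min_gap seconds between two successive calls.
sub make_throttle {
    my ($min_gap) = @_;
    my $last = 0;                 # time of the previous call (0 = never called)
    return sub {
        my $wait = $last + $min_gap - time();
        sleep($wait) if $wait > 0;
        $last = time();
        return;
    };
}

# Assumed pause of 0.2 seconds between requests.
my $throttle = make_throttle(0.2);
for my $i (1 .. 3) {
    $throttle->();
    print "request $i would be sent here\n";
}
```

In a download loop, the call $throttle->() would go right before each request.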

It is highly recommended to download these filings only once. Keep the SAS dataset that contains the 'counter' (the downloadId variable): each downloaded file is named after it, so it can be used to match a file back to cik.
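The SAS dataset remains the authoritative link between a file and its cik. As a cross-check, cik can usually also be read from the url itself, assuming the urls in the download list follow the layout edgar/data/<cik>/<accession>.txt. The sketch below makes that assumption explicit; the helper name parse_line and the sample line are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse one "downloadId,url" line and return
# (downloadId, cik, accession), or an empty list if the line does not match.
sub parse_line {
    my ($line) = @_;
    $line =~ s/\s+$//;                          # strip LF or CRLF
    my ($id, $url) = split /,/, $line, 2;
    return unless defined $url;
    # Assumed url layout: edgar/data/<cik>/<accession>.txt
    if ($url =~ m{^edgar/data/(\d+)/([0-9-]+)\.txt$}) {
        return ($id, $1, $2);
    }
    return;
}

my @sample = (
    "downloadId,url",                                # header row, skipped
    "1,edgar/data/1234/0000000000-00-000001.txt",    # made-up example line
);
for my $line (@sample) {
    my @fields = parse_line($line) or next;
    my ($id, $cik, $accession) = @fields;
    print "$id -> cik $cik, accession $accession\n";
}
```

Lines that do not fit the assumed layout are skipped rather than guessed at, so any downloadId missing from the output still has to be looked up in the SAS dataset.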

Another issue is that it may not be possible to match all SEC filings with Compustat, as the SEC uses CIK as its main identifier and not every CIK can be linked to a Compustat firm. In other words, do not perform manual work on 10-Ks before making sure that any collected data can actually be used. Obviously, this is much less of an issue when using a script to collect data.

SAS code

 
/* select 10-K filings from edgar.filings which holds all SEC filings */
proc sql;

	create table a_10k as
	select 
		distinct cik, coname as edgarConame, filename as url, 
		date as filingdate10K, formtype
	from
		edgar.filings b
	where
		formtype IN ("10-K", "10-K/A", "10-K405", "10-K405/A", 
		"10-KSB", "10-KSB/A", "10-KT", "10-KT/A", "10KSB", 
		"10KSB/A", "10KSB40", "10KSB40/A", "10KT405", 
		"10KT405/A");

quit;

/* add a running counter (downloadId); the downloaded files are named after it */
data a_10k ;
set a_10k;
downloadId = _N_;
run;

/* relevant variables to export: the id and the url */

proc sql;

	create table b_downloadList as
	select downloadId, url from a_10k;

quit;

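/* DBMS=CSV writes a comma-separated file; by default the first line is a header row (downloadId,url) */
PROC EXPORT DATA=  b_downloadList 
            OUTFILE= "C:\temp\downloadlist.txt" 
            DBMS=CSV REPLACE;
RUN;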

Perl code

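Copy/paste the following code to a text file and save it as "download.pl". Make sure the exported text file (downloadlist.txt) is in the same directory. Run the Perl script from the command line (perl download.pl). The filings and the log file (download_log.txt) are written to the directory the script is run from.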

Thanks to David Veenman for beautifying the code.

 
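#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
# the SEC may reject requests without a descriptive User-Agent;
# if needed, set one with $ua->agent(...)

# failed downloads are recorded in this log file
open my $log, '>', 'download_log.txt' or die "Cannot open download_log.txt: $!";
######## make sure the file with the ids/urls is in the 
######## same folder as the perl script
open my $dlist, '<', 'downloadlist.txt' or die "Cannot open downloadlist.txt: $!";
my @file = <$dlist>;
close $dlist;

foreach my $line (@file) {
    $line =~ s/\s+$//;    # strip the line ending (LF or CRLF)
    my ($nr, $get_file) = split /,/, $line;
    # only urls ending in <digits-and-dashes>.txt are filings;
    # this also skips the header row written by PROC EXPORT
    next unless defined $get_file && $get_file =~ m/[0-9-]+\.txt$/;
    # https urls require the LWP::Protocol::https module
    $get_file = "https://www.sec.gov/Archives/" . $get_file;
    my $filename = $nr . ".txt";
    print "file $nr \n";
    my $response = $ua->get($get_file);
    if ($response->is_success) {
        # open the output file only after a successful download,
        # so that failures do not leave empty files behind
        open my $out, '>', $filename or die "Cannot write $filename: $!";
        print $out $response->content;
        close $out;
    } else {
        print $log "Error in $filename - $nr \n";
    }
}
close $log;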
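
Since the filings should only be downloaded once, it can help to make the script safe to rerun after an interruption. The sketch below shows one possible check, under the assumption that a non-empty file named after the downloadId means the filing is already on disk. The helper name already_downloaded is hypothetical and the demonstration uses a temporary directory with made-up ids.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: true if "<dir>/<id>.txt" exists and is not empty.
sub already_downloaded {
    my ($dir, $id) = @_;
    my $size = -s "$dir/$id.txt";   # undef if missing, false if empty
    return $size ? 1 : 0;
}

# Small demonstration in a temporary directory (ids are made up).
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/1.txt" or die "Cannot write $dir/1.txt: $!";
print $fh "dummy filing text\n";
close $fh;

for my $id (1, 2) {
    if (already_downloaded($dir, $id)) {
        print "skip $id (already on disk)\n";
    } else {
        print "fetch $id\n";
    }
}
```

In the download loop, a line such as next if already_downloaded('.', $nr); placed before the request would skip filings that are already present.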

All rights reserved. © 2010-2014 wrds.us