SEC Filings on EDGAR SAS File
Posted: 03 March 2012 12:47 PM
Newbie
Total Posts:  10
Joined  2012-03-03

Hi,
I’m new to SAS and happened to come across this website. My thesis involves downloading 10-K filings from EDGAR, so I’m quite happy to have found the SAS code here.
I have already tested the code with the SAS files (datasets from 1993-2010) from http://www.wrds.us/index.php/repository/view/25, but I don’t know how you download the files from the SEC website and turn them into SAS datasets like these. Could anyone help me with this step? Thanks a bunch.

Posted: 03 March 2012 03:13 PM   [ # 1 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi SamLe,

First, download the SEC master index files (the URL is given on the page linked in your post).
The attached code will import these files into SAS.
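
For anyone who wants to look at the index files before importing them, here is a minimal Perl sketch (not the attached SAS code) that reads a master index file and lists the 10-K rows. It assumes the pipe-delimited layout CIK|Company Name|Form Type|Date Filed|Filename; the sub name and the hash keys are made up for illustration.

```perl
use strict;
use warnings;

# Parse one data row of a master index file.
# Returns a hash reference, or nothing for header and separator lines.
sub parse_master_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                       # drop the line ending
    my @fields = split /\|/, $line, -1;
    # Data rows have five fields and start with a numeric CIK.
    return unless @fields == 5 && $fields[0] =~ /^\d+$/;
    my %row;
    @row{qw(cik company form_type date_filed filename)} = @fields;
    return \%row;
}

# Usage: perl list_10k.pl master.idx
if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or die "cannot read $ARGV[0]: $!\n";
    while (my $line = <$in>) {
        my $row = parse_master_line($line) or next;
        next unless $row->{form_type} eq '10-K';
        print join("\t", @{$row}{qw(cik date_filed filename)}), "\n";
    }
    close($in);
}
```

The filename column is a path relative to the EDGAR archive, so it can be used later to build the download URL for each filing.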

best regards,

Joost

File Attachments
edgar_quarterly_files_-_wrds.us_upload.txt  (File Size: 2KB - Downloads: 2023)
 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Posted: 06 March 2012 07:20 PM   [ # 2 ]
Newbie

Thank you very much, Joost ^^.

Posted: 07 March 2012 09:42 PM   [ # 3 ]
Newbie

Hi Joost,
I have a question about your database.
The text files of the SEC filings from 2000-2012 seem impossible to read.
For example, the GE 10-K text file

http://www.sec.gov/Archives/edgar/data/40545/000004054512000016/0000040545-12-000016.txt

has errors.
So I would like to ask whether your SEC filings database is good.

Posted: 07 March 2012 09:46 PM   [ # 4 ]
Administrator

hi SamLe,

What kind of errors does this GE filing have? (It looks like a valid HTML filing to me.)

best regards,

Joost

Posted: 08 March 2012 12:35 AM   [ # 5 ]
Newbie

I just want to ask: do you have any HTML parser code to standardize the files?

Posted: 08 March 2012 08:26 AM   [ # 6 ]
Administrator

hi,

Try this:

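use HTML::Entities;

# read filing ($filename must hold the path of the downloaded filing)

local( $/, *FH );   # undefine $/ so that <FH> reads the whole file at once
open( FH, '<', $filename ) or die "fatal error reading $filename\n";
$filing_raw = <FH>;
close( FH );

# remove all HTML tags
$filing_raw =~ s/<[^>]*>//sg;

# translate HTML entities (&nbsp; &amp; etc.) to regular text
$filing = decode_entities($filing_raw);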
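
If HTML::Entities is not installed, the same two steps can be tried with core Perl only. The following is a rough sketch, not a replacement for the module: the sub name and the small entity table are made up for illustration, only a handful of named entities are handled, and &nbsp; is mapped to a plain space for simplicity. It also replaces each tag with a space rather than nothing, so that neighbouring table cells do not run together.

```perl
use strict;
use warnings;

# Stand-in table for a few common entities; HTML::Entities covers the full set.
my %ENTITY = (nbsp => ' ', amp => '&', lt => '<', gt => '>', quot => '"');

sub html_to_text {
    my ($html) = @_;
    $html =~ s/<[^>]*>/ /sg;            # replace every tag with a space
    $html =~ s/&#(\d+);/chr($1)/ge;     # numeric entities such as &#38;
    # named entities: decode the ones in the table, leave the rest untouched
    $html =~ s/&(\w+);/exists $ENTITY{$1} ? $ENTITY{$1} : "&$1;"/ge;
    $html =~ s/[ \t]+/ /g;              # collapse runs of blanks
    $html =~ s/^ | $//mg;               # trim blanks at line starts and ends
    return $html;
}

print html_to_text("<td>Income&nbsp;taxes</td><td>R&amp;D</td>"), "\n";
```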

best regards,

Joost

 

Posted: 09 March 2012 04:12 PM   [ # 7 ]
Newbie

Thank you very much, Joost

Posted: 13 March 2012 09:54 PM   [ # 8 ]
Newbie

Hi Joost,
May I ask something? Where can I get the HTML-Format and HTML-Tree packages? I tried to run the Perl code, but it says that I do not have the HTML-Format package.
I got this error:

Can't locate HTML/Formatter.pm in @INC (@INC contains: /Library/Perl/Updates/5.10.0 /System/Library/Perl/5.10.0/darwin-thread-multi-2level /System/Library/Perl/5.10.0 /Library/Perl/5.10.0/darwin-thread-multi-2level /Library/Perl/5.10.0 /Network/Library/Perl/5.10.0/darwin-thread-multi-2level /Network/Library/Perl/5.10.0 /Network/Library/Perl /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level /System/Library/Perl/Extras/5.10.0 .)

Posted: 14 March 2012 08:41 AM   [ # 9 ]
Administrator

hi SamLe,

Are you using ActiveState Perl on Windows? If so, there is a package manager (click the Windows button, “All Programs”, “ActivePerl 5.xx”, “Perl Package Manager”). You can search (for HTML) and install the packages you need.
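
The @INC paths in the error message contain "darwin", which suggests a Mac rather than Windows. In that case the cpan client that ships with Perl is the usual route (for example `cpan HTML::Formatter`), although that is an assumption about your setup. Either way, a small check like the sketch below shows which modules Perl can actually find; the sub name is made up for illustration.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded from @INC, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # HTML::Formatter -> HTML/Formatter.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(HTML::Formatter HTML::TreeBuilder HTML::Entities)) {
    printf "%-20s %s\n", $module,
        module_available($module) ? "installed" : "MISSING";
}
```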

best regards,

Joost

Posted: 09 April 2012 11:33 AM   [ # 10 ]
Newbie

Hi Joost,
I’m moving on to the extraction part.
I have to extract the tax footnote from the 10-K form, so I’m using a beginning string (“Note 9. Provision for income taxes”) and an ending string, but the ending string seems to differ for each company. For example, for GE, note 10 is GES, while for another company, note 10 is Information .... In other words, the 10-K structure is not uniform; there is no global structure.
So I would like to ask whether you have a way to extract the footnote, or whether I have to adjust the ending string manually. Thanks.

Posted: 09 April 2012 11:53 AM   [ # 11 ]
Administrator

hi Samle,

Yes, this is tricky. Basically, you don’t want the extract to be too short (early cutoff), but also not too long (including the next notes).
One thing that may work (not sure, of course) is to use pattern matching to find the number of the tax note. Something like ~note (\d+).? provision for income taxes~i will capture the number (in your example, the number 9). (I suppose it is not always 9; if it is always 9, you won’t need such a pattern.) Then take all text until you run into the next note (note 10 in your example). (This will not work if the tax footnote contains a reference to the next note, which could be the case if the two are related.)
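
A rough sketch of this first idea follows. The sub name is made up, and one extra assumption is added that is not part of the description above: the heading of the next note is expected at the start of a line, so that an in-text reference such as "see note 10" does not end the extract early.

```perl
use strict;
use warnings;

# Find "Note N. Provision for income taxes" and return the text from that
# heading up to the heading of note N+1 (or to the end if there is none).
sub extract_tax_note {
    my ($text) = @_;
    return unless $text =~ /note\s+(\d+).?\s+provision\s+for\s+income\s+taxes/i;
    my $number    = $1;
    my $start     = $-[0];      # offset where the tax note heading starts
    my $body_from = $+[0];      # offset just after the heading
    my $next      = $number + 1;

    my $end  = length $text;
    my $rest = substr($text, $body_from);
    # Assumption: the next note heading begins a line.
    if ($rest =~ /^[ \t]*note\s+$next\b/im) {
        $end = $body_from + $-[0];
    }
    return substr($text, $start, $end - $start);
}
```

Filings that number their notes differently, or that print the heading without the word "Note", will still need their own pattern.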

Another solution may be to keep collecting text until 1000 characters after the last occurrence of ‘tax’; that is, as long as ‘tax’ appears every now and then, you assume you are still in the tax footnote. This will not work if the tax footnote goes ‘too long’ without using ‘tax’, or if the following footnotes also use ‘tax’. Finding the right cutoff will take some experimenting.
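
The second idea could look roughly like this. It is a sketch under assumptions: the sub name and its arguments are made up, $start is the offset of the tax note heading, and the gap defaults to the 1000 characters mentioned above.

```perl
use strict;
use warnings;

# Starting at $start, keep extending the extract as long as the next
# occurrence of 'tax' is at most $gap characters after the previous one,
# then return everything up to $gap characters after the last such hit.
sub extend_while_tax {
    my ($text, $start, $gap) = @_;
    $gap = 1000 unless defined $gap;
    my $last = $start;                  # end offset of the most recent 'tax'
    pos($text) = $start;                # begin searching at the heading
    while ($text =~ /tax/gi) {
        last if $-[0] - $last > $gap;   # too long without 'tax': stop here
        $last = $+[0];
    }
    my $end = $last + $gap;
    $end = length $text if $end > length $text;
    return substr($text, $start, $end - $start);
}
```

Trying a few values for the gap on a handful of filings should show quickly whether the extract tends to stop too early or run into the next notes.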

hope this helps,

Joost

Posted: 14 April 2012 09:54 AM   [ # 12 ]
Newbie

Thank you, Joost.
It seems that I need to adjust my code manually.
By the way, may I ask one more thing?
When I download filings with Perl, is there a Perl script that can distinguish HTML files from plain text files? I want to separate the two.

Posted: 14 April 2012 12:26 PM   [ # 13 ]
Administrator

hi Samle,

Compare several HTML filings with txt filings and see whether some tags appear in one but not the other.
I would expect that only HTML filings have a <BODY tag and/or an <HTML tag.
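
As a starting point, a check along these lines might do. The sub name is made up, and the rule itself (an <HTML or <BODY tag means HTML) is the expectation stated above, not something verified against a large sample of filings.

```perl
use strict;
use warnings;

# Return 1 if the filing text contains an <HTML or <BODY tag, 0 otherwise.
sub is_html_filing {
    my ($text) = @_;
    return $text =~ /<\s*(?:html|body)\b/i ? 1 : 0;
}

# Usage: perl sort_filings.pl filing1.txt filing2.txt ...
for my $file (@ARGV) {
    open(my $in, '<', $file) or die "cannot read $file: $!\n";
    my $text = do { local $/; <$in> };   # slurp the whole file
    close($in);
    print is_html_filing($text) ? "html" : "text", "\t$file\n";
}
```

Plain text filings still carry SGML wrapper tags such as <DOCUMENT> and <TEXT>, so checking for any "<" character would not be enough.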

best regards,

Joost

Posted: 29 December 2012 06:57 AM   [ # 14 ]
Newbie
Total Posts:  6
Joined  2012-12-29

Hi,

First of all, thank you, Joost, for the great help with extracting SEC filings so far. I got to the point where I run the Perl code from http://www.wrds.us/index.php/tutorial/view/26

However, executing the code gives me the following error message:

No such file or directory at download.pl line 45, <dlthis> line 227324.

I tried to understand the message but couldn’t find any explanation. As far as I understand, the problem is somewhere here:

$filename = $write_dir . "/" . $CIK . ".txt";

It is probably something easy to see, but I have been struggling with this all day. Could anyone point me in the right direction? TIA

smilebey

PS: Great forum that I found here!
PPS: I wish you all a happy new year :)

Posted: 29 December 2012 02:08 PM   [ # 15 ]
Administrator

hi Smilebey,

I am sorry to hear about your struggles over the holidays :)

I have attached a very similar version to this post (rename .txt to .pl before running).
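
If the error persists, one plausible cause (an assumption, I have not checked your setup) is that the directory in $write_dir does not exist, so the output file cannot be opened. The sketch below creates the directory first; the sub name is made up, and $write_dir and $CIK simply mirror the variable names in your post.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Open the output file for one filing, creating the directory if needed.
sub open_filing_for_write {
    my ($write_dir, $CIK) = @_;
    make_path($write_dir) unless -d $write_dir;
    my $filename = File::Spec->catfile($write_dir, "$CIK.txt");
    open(my $fh, '>', $filename)
        or die "cannot write $filename: $!\n";   # $! holds the system error text
    return ($fh, $filename);
}
```

Including the file name in the die message, as above, also makes it much easier to see which path Perl was actually trying to open.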

Hope this one will work for you,

best regards,

Joost

File Attachments
batchdownload.txt  (File Size: 2KB - Downloads: 771)