SEC Filings on EDGAR SAS File
Posted: 20 January 2014 07:42 PM   [ # 46 ]
Newbie
Total Posts:  9
Joined  2014-01-11

Hi Joost,

Sorry for my slow reply (and thank you for your fast reply!). There have been lots of things to juggle getting this project going so the coding is coming more slowly than I anticipated.

I don’t believe WordStat can automatically remove the HTML markup, so I guess Perl is a better choice in that regard. I’m assuming Perl can remove HTML markup? WordStat is quite easy to do content analysis with, but I’ve been told that a major pro of using Perl is that you don’t actually have to download each individual 10-K to analyze the content if you write the code properly. That, and of course it’s all done automatically by a single program.

I haven’t yet Googled for help with the pattern matching code for Perl. I have a dictionary of about 20 words that I’ll need to check for in every 10-K, so I imagine that once I figure out the code it will be fairly straightforward (obviously a big upfront investment to learn this that will hopefully pay off later).

Thanks for your advice once again - this is helpful! (and motivating to keep me moving).

Mark

Posted: 22 January 2014 07:18 PM   [ # 47 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi Mark,

You’re welcome :)

If you need to match individual words then removing HTML probably doesn’t matter.
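A minimal sketch of this kind of word matching, assuming a small dictionary of whole words. The word list and the sample text are made up for illustration; only core Perl is used. The `\b` word boundaries keep "risk" from matching inside "brisk", and the HTML tags are simply left in place.

```perl
use strict;
use warnings;

# Hypothetical dictionary; replace with your own list of words.
my @dictionary = qw(risk litigation impairment);

# Count whole-word, case-insensitive matches of each word in a text.
sub count_words {
    my ($text, @words) = @_;
    my %count;
    for my $word (@words) {
        # \b keeps "risk" from matching inside "brisk";
        # \Q...\E treats the word literally, not as a pattern.
        my @hits = $text =~ /\b\Q$word\E\b/gi;
        $count{$word} = scalar @hits;
    }
    return %count;
}

# Made-up sample text with some HTML left in.
my $sample = '<p>Litigation risk is a brisk topic. Risk factors follow.</p>';
my %count  = count_words($sample, @dictionary);
print "$_: $count{$_}\n" for @dictionary;
```

Note that this counts exact words only: "risks" would not be counted as "risk" unless it is added to the dictionary.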

Drop another post (maybe in a new thread) if you would like to have my (unpolished) code as a starting point. Diving into Perl should be plenty of fun :) Here’s a quick guide:
http://wso.williams.edu/wiki/images/d/d7/Perl-crash-course.pdf

best regards,

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Posted: 09 February 2014 10:43 PM   [ # 48 ]
Newbie
Total Posts:  16
Joined  2013-03-17

Hello Joost:

Thank you very much for the website and helpful suggestions! So far I have downloaded the 10-K files and used Perl code to extract text from them.

Can I ask one more question? I run the Perl code in the Command Prompt and can see the extracted text there. However, is it possible to write the extracted text, along with firm name, CIK, SIC and file data, into an Excel file so that I can do additional data cleaning? Thank you very much!!!

Best, StupidStudent.

Posted: 11 February 2014 08:42 PM   [ # 49 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi,

A neat thing is that you can easily send the output that perl prints to the screen into a text file instead, by redirecting it at the command prompt.

Instead of:
perl yourfile.pl

Do:
perl yourfile.pl > output.csv

If you make sure the output is clean (comma separated, no ‘fluff’ or extra/debugging output), you can import the file straight into SAS.
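A minimal sketch of what "clean" output could look like, assuming each result is printed as one comma-separated line. The column names and the sample row are made up. Fields that contain commas (firm names often do) are quoted so the columns stay aligned, and progress messages go to STDERR so they are not captured by the `> output.csv` redirect.

```perl
use strict;
use warnings;

# Build one CSV line; quote fields that contain commas, quotes or newlines.
sub csv_line {
    my @fields = @_;
    for my $f (@fields) {
        $f = '' unless defined $f;
        if ($f =~ /[",\r\n]/) {
            $f =~ s/"/""/g;      # double any embedded quotes
            $f = qq{"$f"};
        }
    }
    return join(',', @fields) . "\n";
}

# Results go to STDOUT (captured by "> output.csv");
# progress and debugging messages go to STDERR (still shown on screen).
print csv_line(qw(firm cik sic));
print csv_line('Example Corp, Inc.', '0000000000', '1234');   # made-up row
print STDERR "done\n";
```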

Hope this helps,

Joost

Posted: 12 February 2014 09:58 PM   [ # 50 ]
Newbie
Total Posts:  16
Joined  2013-03-17

Thank you very much Joost! This works! Now I can read the output in the CSV file. Can I ask one more question? Is there a good book or website for learning Perl programming? I got the code from a professor’s website and I have some difficulty reading it; I hope to understand it well enough to write my own Perl code.

For example, part of the code is:  ” ((^\s*?)((XX\s*(xx|xx)\s*(xx\s*xx\s*....”, where the xx’s represent some words. I have a hard time reading this, so I am not sure whether I can change the code and apply it to my project. Thank you very much!!!

Posted: 13 February 2014 07:38 AM   [ # 51 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi,

You’re welcome :)

There are a lot of resources available online for perl; the hard-to-read code you mention consists of regular expressions (so, google something like: perl regular expressions tutorial).
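To give a feel for how such a pattern reads, here is a hypothetical one in the same style, written with Perl's `/x` flag so that each piece can carry a comment. It is not the professor's pattern (which is elided above); the words "management" and "quantitative" are placeholders for whatever the xx's stand for.

```perl
use strict;
use warnings;

# Hypothetical pattern: a line that starts with optional whitespace,
# then "Item 7" or "Item 7A", then one of two alternative words.
my $heading = qr/
    ^ \s*?            # start of a line, then optional whitespace
    ( item \s* 7a? )  # group 1: "item", optional spaces, "7" or "7a"
    \s* [.:]? \s*     # optional period or colon, with optional spaces
    ( management | quantitative )   # group 2: one of two alternatives
/xim;                 # x: allow comments, i: ignore case, m: ^ matches at each line

my $text = "Table of contents\n  ITEM 7. Management's Discussion and Analysis\n";
if ($text =~ $heading) {
    print "matched heading: '$1', next word: '$2'\n";
}
```

In the fragment quoted above, `^` anchors the match to the start of a line, `\s*` allows any amount of whitespace, parentheses group parts of the pattern, and `|` separates alternatives, just as in this sketch.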

best regards,

Joost

Posted: 18 February 2014 05:37 PM   [ # 52 ]
Newbie
Total Posts:  2
Joined  2014-02-18

Hi,

I’m about to embark upon the dreadful journey of going through thousands of 10-Ks as well. Just so I can plan my time accordingly for my research, approximately how long will it take to download all 10-Ks from 2001-2006? I think I’ve seen about 10,000 10-Ks per year, with a file size of about 200KB each, so about 2GB per year, or roughly 12GB for the six years (unless my estimates are way off). Shouldn’t take more than a day or two, right?

Posted: 18 February 2014 08:14 PM   [ # 53 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi JW,

I have downloaded ‘all’ 10-Ks (including 10-KSB etc.) over 1993-2012. That is 228,000 10-Ks, 275 GB in total. There is wide variation in size; in recent years many exceed 1MB (there is a lot of HTML ‘fluff’).

Many of these are not needed if you require a match with CRSP or impose size constraints (like at least $x mln in sales). If you have such restrictions, I would download only the 10-Ks of firms that will be in your sample.
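A minimal sketch of restricting the download list to sample firms. It assumes index lines in the pipe-delimited layout `CIK|Company Name|Form Type|Date Filed|Filename` (check this against your own index files) and a list of sample CIKs exported from SAS. The firms and file names below are made up.

```perl
use strict;
use warnings;

# Keep only 10-K filings of firms in the sample.
# Assumes pipe-delimited index lines: CIK|Company|Form|Date|Filename
sub select_filings {
    my ($lines, $sample_ciks) = @_;
    my %in_sample;
    $in_sample{ $_ + 0 } = 1 for @$sample_ciks;   # numeric key: drops leading zeros
    my @keep;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        my ($cik, $name, $form, $date, $file) = split /\|/, $copy;
        next unless defined $file && $cik =~ /^\d+$/;   # skip header lines
        next unless $form eq '10-K';   # variants such as 10-KSB need extra tests
        push @keep, $file if $in_sample{ $cik + 0 };
    }
    return @keep;
}

# Made-up index lines and a made-up sample CIK.
my @index = (
    "CIK|Company Name|Form Type|Date Filed|Filename\n",
    "1234|EXAMPLE CORP|10-K|2005-03-01|edgar/data/1234/example-a.txt\n",
    "1234|EXAMPLE CORP|8-K|2005-04-01|edgar/data/1234/example-b.txt\n",
    "5678|OTHER CO|10-K|2005-03-15|edgar/data/5678/example-c.txt\n",
);
print "$_\n" for select_filings(\@index, ['0000001234']);
```

Only the first made-up filing is printed: the 8-K is skipped because of its form type, and the other firm is not in the sample.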

Good luck on your journey :)

Joost

Posted: 10 April 2014 10:14 AM   [ # 54 ]
Newbie
Total Posts:  2
Joined  2014-02-18

Dear Joost,

I have managed to download the 10-K filings. I am going to produce two output files for each 10-K that I select based on certain criteria. The first will contain the full 10-K without any weird stuff like HTML tags and random codes. The second will be Item 7 (MD&A). The latter seems quite straightforward with Perl: find Item 7, find the next item and print the text in between to a file. However, I’m struggling with the first. HTML tags are quite easy to remove, since they start with “<” and end with “>”, but I’ve noticed many 10-K filings have random gibberish included in the top/bottom parts of the file. For instance, in one file I encountered this piece:

YAEQ6’==Z5`_,=IJ=/BA0>J’\8!O]M5B*=5Z]=
MH’TN>)O]D_1J#8HBN\+3-72+28J-T]NN-8W3$`>CJ=4U/A%9?“QHH;B”/$0$
M`BH*\I`WK/)^O\65Y_#T,2+@H*#(,.+:#(Z/4=‘5U$9%8:.FUK64B!5R>KQL
MS&TTD50L;*[7Y<6_5Z7/NN7WO.?><[[N6K?/+<30W4OO^OJ;OTXKZQ].“J,2%
M)/HB;R9R0`OEB.RS>YFDP’MWC-?<T/C3/!1KH)^7/<17#\>-=!V]\03<9QY2
MI3’[HZ+EGWWA[.RE..Q!TM:‘3K75G&5;6S+36C@Z*ZWY6%H3V]]=T]W#Q9PC
M=VP/C?-F(R/**R(X>G[Y[MWE,>R.G1G)H6+#,)R[91B2J[./9A]1P`QJOX:D
MIZ/YU)[.#%VW’‘X]J’_V\AN’F5EB];&LG;/<4L2SZ!$D6HT_]*P)4@1@Q8K

I would like to filter out these parts, but can’t figure out how to achieve this. Can this be done with Perl?

Posted: 19 May 2014 07:41 PM   [ # 55 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi JW,

Sorry for my late reply - It looks like I missed a few notifications of new posts.

A 10-K filing can have additional files attached to it (PDFs, images, etc.). These are stored as encoded text, which is probably the gibberish you are seeing. They are usually at the end, so if you try to extract the MD&A it shouldn’t matter. Nevertheless, finding the ‘start’ and ‘end’ points of the MD&A is challenging and requires a lot of ‘fiddling’ (there are usually many variations of the start/endpoints). Also, you might ‘capture’ the table of contents entry (as opposed to the ‘real’ MD&A).
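A rough sketch of both clean-up steps, under stated assumptions: attachments are taken to be uuencoded blocks that start with a `begin 644 name` line and stop at an `end` line, and the MD&A is taken to run from a line starting with "Item 7" to the next line starting with "Item 8". Real filings vary a lot, so treat the patterns as a starting point. To avoid the table of contents, the sketch keeps the longest candidate section.

```perl
use strict;
use warnings;

# Remove uuencoded attachments: everything from a "begin 644 name" line
# up to and including the following "end" line.
sub strip_uuencoded {
    my ($text) = @_;
    $text =~ s/^begin \d{3} \S+.*?^end[ \t\r]*$//msg;
    return $text;
}

# Replace HTML tags by a space and tidy up the whitespace.
sub strip_html {
    my ($text) = @_;
    $text =~ s/<[^>]*>/ /g;          # tags such as <p> or <td align="left">
    $text =~ s/&nbsp;|&#160;/ /gi;   # common non-breaking-space entities
    $text =~ s/[ \t]+/ /g;           # collapse runs of spaces
    return $text;
}

# Every "Item 7" line up to the next "Item 8" line is a candidate;
# the table of contents entry is short, so keep the longest one.
# "Item 7A" is not treated as a start, so its text stays in the result.
sub extract_mdna {
    my ($text) = @_;
    my $longest = '';
    while ($text =~ /^\s*item\s+7\b(.*?)^\s*item\s+8\b/gims) {
        $longest = $1 if length($1) > length($longest);
    }
    return $longest;
}

# Made-up miniature filing: table of contents, body, one attachment.
my $raw = <<'END_FILING';
Item 7. Management's Discussion ..... 25
Item 8. Financial Statements ..... 40
<p>Item 7. Management's Discussion</p>
<p>Sales increased because of higher volumes.</p>
<p>Item 8. Financial Statements</p>
begin 644 chart.pdf
M)/HB;R9R0
end
END_FILING

my $clean = strip_html(strip_uuencoded($raw));
print extract_mdna($clean);
```

Stripping the tags before searching matters here, because the headings usually sit inside HTML markup rather than at the start of a line.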

best regards,

Joost

Posted: 20 June 2014 07:20 AM   [ # 56 ]
Newbie
Total Posts:  3
Joined  2014-06-16

Hello Joost

I am also new to SAS and Perl; I have read all the posts related to 10-K filings in SAS. I ran the SAS code that you suggested on this web page: http://www.wrds.us/index.php/tutorial/view/26 and now I am trying to figure out how to run the Perl code.

I copied the code into a text file and changed the extension to .pl (I also downloaded ActiveState Perl for Windows), but I don’t have a clue how to run the Perl script from SAS. I have tried different ways.

Like:

Data Filings;
Filename myfth pipe ” c:\temp\batchdownload.pl” lrecl=32767;
RUN;

Data Filings;
infile myfth;
input;
run;

But it didn’t work. I would really appreciate it if someone could help me.

Best

Andres

Posted: 20 June 2014 08:21 AM   [ # 57 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi Andres,

What you are trying to do makes sense - I am not sure if it can be done this way. In general terms, this is how I combine SAS and perl:

1. Create a dataset (for example with file names) in SAS and export it as a .txt or .csv file
2. Run the perl script (either manually outside SAS, or from SAS with an operating system command such as the X statement, which is like executing a batch file). If you launch it from SAS, SAS needs to wait for it to finish
3. The perl script generates a .txt or .csv file
4. SAS imports the .txt or .csv file that perl has created

When the perl job takes a long time, I usually do step 2 manually, so that I can print progress messages to the screen. For example, when scanning many files, I may output a counter every 10,000 files. When you execute perl scripts (or any other jobs) from SAS, you won’t see anything that is output to the screen.

I know of some researchers that like to use MySQL as the means of communication between perl and SAS. (Perl would write the results to a MySQL table, and SAS would connect to that table with ODBC.)
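Steps 2 and 3 of the list above could look roughly like this. The sketch assumes that SAS exported a plain text file with one file name per line; the script name `scan.pl`, the file names and the "found" column are made up, and the real analysis would replace the placeholder line. Progress goes to STDERR, so it shows on screen even when the results are redirected to a file.

```perl
use strict;
use warnings;

# Read file names (one per line, as exported from SAS), write one result
# line per file, and report progress every $every files.
sub process_list {
    my ($in, $out, $every) = @_;
    my $n = 0;
    while (my $file = <$in>) {
        chomp $file;
        next if $file eq '';
        my $found = -e $file ? 1 : 0;   # placeholder for the real analysis
        print {$out} "$file,$found\n";
        $n++;
        print STDERR "processed $n files\n" if $n % $every == 0;
    }
    return $n;
}

# Usage (hypothetical names): perl scan.pl filelist.txt > results.csv
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "file,found\n";
    my $n = process_list($in, \*STDOUT, 10_000);
    close $in;
    print STDERR "finished: $n files\n";
}
```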

best regards,

Joost

Posted: 27 June 2014 04:34 AM   [ # 58 ]
Newbie
Total Posts:  3
Joined  2014-06-16

Thanks so much Joost, I will follow your suggestions!!

Posted: 30 June 2014 10:43 AM   [ # 59 ]
Newbie
Total Posts:  3
Joined  2014-06-16

Hello Joost

I followed your suggestions and finally downloaded the 10-K filings to my computer. I did it by running Perl manually outside SAS; the script downloads the 10-K filings as txt files. Now I am wondering how to import those files into SAS. Of course I know how to do it manually, one by one, but I guess there is another way, considering that there are thousands of files. Do you have any suggestions for code that would let me import these files? (By the way, excuse any mistakes in my writing; I am not a native English speaker.)

Best

Andres

Posted: 30 June 2014 10:51 AM   [ # 60 ]
Administrator
Total Posts:  901
Joined  2011-09-19

hi Andres,

After you have downloaded the 10-K text files, run your perl program on them. Let it output one line per file, for example like this:

id,var1,var2,var3,var4,var5,var6,var7,var8,var9,var10,var11,var12,var13,var14
1,0,1,0,0,0,0,0,0,0,0,0,0,0,0
10,1,1,1,1,0,0,0,0,0,0,0,0,0,0
100,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1000,0,1,0,0,0,0,0,0,0,0,0,0,0,0
10000,0,0,0,0,0,0,0,0,0,0,0,0,0,0
100000,1,0,0,0,1,1,0,0,0,0,0,0,0,0
100001,0,1,0,0,0,0,0,0,0,0,0,0,0,0

Then, import the file in SAS (use id to match it back to your sample).
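A minimal sketch of a script that produces output in this shape. It assumes that the downloaded filings are stored as `<id>.txt` in a folder called `filings` and that each variable flags whether one dictionary word occurs in the filing. The folder name, the naming scheme and the word list are all assumptions for illustration.

```perl
use strict;
use warnings;

my @words = qw(risk litigation impairment);   # hypothetical dictionary

# Return 1/0 flags: does the text contain each word (whole word, any case)?
sub flags_for_text {
    my ($text, @dict) = @_;
    return map { $text =~ /\b\Q$_\E\b/i ? 1 : 0 } @dict;
}

# Header line, then one line per file. Files are assumed to be named
# <id>.txt in a folder called "filings" (both are assumptions).
print join(',', 'id', map { "var$_" } 1 .. @words), "\n";
my @paths = glob('filings/*.txt');
for my $path (sort @paths) {
    my ($id) = $path =~ m{([^/\\]+)\.txt\z} or next;
    open my $fh, '<', $path or do { warn "skip $path: $!\n"; next };
    local $/;                      # slurp the whole file at once
    my $text = <$fh>;
    close $fh;
    print join(',', $id, flags_for_text($text, @words)), "\n";
}
```

Run it as `perl yourfile.pl > output.csv`; the resulting file has a header row and one row per filing, ready to import into SAS and merge on id.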

Hope this helps,

Joost
