lineparse with SAS
Posted: 03 October 2012 08:40 PM
Newbie
Total Posts:  2
Joined  2012-10-03

I’m trying to use the lineparse.sas macro from the WRDS website to extract information from 10-K filings. Can anyone tell me how to specify the number of characters returned before and after a matched phrase? The position of the matched phrase within the extracted text seems random (though I think the code calls for 250 characters), and I would like to control how many characters are extracted both before and after the match.

I’ve also tried the paraparse.sas macro, but it doesn’t seem to work correctly (i.e., it returns text that does not include the search term).

Many thanks for your help!

Posted: 03 October 2012 11:47 PM   [ # 1 ]
Administrator
Total Posts:  901
Joined  2011-09-19

Hi,

I have not used these macros before. Have you tried the ‘index’ function? It returns the position of one string within another. See, for example, p. 9 here: http://www2.sas.com/proceedings/forum2007/217-2007.pdf

Would some Perl code be helpful?

best regards,

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

Posted: 04 October 2012 06:20 AM   [ # 2 ]
Newbie
Total Posts:  2
Joined  2012-10-03

Hi, Joost - Thanks for your response. I will try the index function you referred to. I am not yet very proficient with Perl, but if you have some code on hand, I’d love to see it and figure it out.
Thanks again for your help!

Posted: 04 October 2012 10:27 AM   [ # 3 ]
Administrator
Total Posts:  901
Joined  2011-09-19

Hi,

You’re welcome :)

You can install ActiveState Perl on Windows.

The following code scans all the .txt files in a directory. It currently fetches the end-of-year date (the CONFORMED PERIOD OF REPORT field), but it should be easy to change it to capture whatever you need. I am not sure which of the ‘use’ modules (LWP, HTTP, HTML) it actually needs. Installing the packages with PPM is straightforward though.

When you run the script from the command line, add “> output.csv” after the script name (without the quotes). This will create a comma-separated file with the file names and the extracted values.

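#!/usr/bin/perl
# None of these three modules is called directly in the code below.
use LWP;
use HTTP::Request;
use HTML::StripScripts;

# Directory holding the downloaded filings (trailing slash required)
my $dir = "E:/edgar/10K_filings/";

# Return the period-end date from the filing header, for example:
# CONFORMED PERIOD OF REPORT:    20000531
sub fetchEOY {
    my $filing = shift;

    if ($filing =~ m/CONFORMED PERIOD OF REPORT:\s*(.*)/) {
        return $1;
    }

    return;
}

my $i = 0;    # counts directory entries seen
opendir(DIR, $dir) or die "fatal error opening $dir\n";

foreach my $file (readdir(DIR)) {
    $i++;

    # only process files ending in .txt
    if ($file =~ m/\.txt$/) {
        my $filing;
        {
            # undefine the record separator so <FH> reads the whole file at once
            local ( $/, *FH );
            open( FH, '<', $dir . $file ) or die "fatal error reading $file\n";
            $filing = <FH>;
        }

        # file name without the ".txt" extension
        my $fileShort = substr $file, 0, length($file) - 4;
        my $eoy = fetchEOY($filing);

        print $fileShort . "," . $eoy . "\n";
    }
}

closedir(DIR);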
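
To get a fixed number of characters before and after a matched phrase (the original question), the fetchEOY routine could be swapped for something along these lines. This is only a rough sketch: fetchContext, its parameters, and the sample sentence are made up for illustration and have nothing to do with the WRDS macros. It uses Perl's built-in index and substr, which behave much like the SAS function mentioned earlier in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the WRDS macros): return the text around
# the first occurrence of $phrase in $text, with at most $before characters
# ahead of the phrase and at most $after characters following it.
sub fetchContext {
    my ($text, $phrase, $before, $after) = @_;

    my $pos = index($text, $phrase);    # -1 when the phrase is absent
    return if $pos < 0;

    # Clamp the window start so it never falls before the start of the text.
    my $start = $pos - $before;
    $start = 0 if $start < 0;

    # substr simply stops at the end of the string if the window runs past it.
    my $len = ($pos - $start) + length($phrase) + $after;
    return substr($text, $start, $len);
}

# Made-up sample sentence, just to show the call.
my $sample = "The company recorded a goodwill impairment charge in fiscal 2011.";
print fetchContext($sample, "goodwill impairment", 11, 7), "\n";
# prints: recorded a goodwill impairment charge
```

In the loop above, the call to fetchEOY($filing) would become something like fetchContext($filing, "your phrase", 250, 250). Note that index is case-sensitive, and the extracted text may itself contain commas or line breaks, so it would probably need cleaning before it goes into a CSV file.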

best regards, Joost

