How to count sentences with Perl code?
Posted: 07 July 2014 07:12 AM
Newbie
Total Posts:  16
Joined  2014-04-04

Dear All,

Suppose I have the following data:

$data = "Congressman Mike Simpson has been on a crusade to allow more arsenic in drinking water. For more than a decade, the eight-term Idaho Republican has fought battle after battle to permit higher levels of the toxic chemical in small-town water supplies. ";

My first question is: how do I split the data into sentences?

Have a great day,
Clark

My ultimate goal is to measure readability. I guess the first step is to split the data into sentences. Thanks for the generous help, Joost and everyone here.
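As a rough illustration of the splitting step asked about above, here is a minimal sketch in Python. It assumes a sentence ends at '.', '!' or '?' followed by whitespace, which is a simplification and will mis-split abbreviations such as "Mr." or "U.S."; the function name `split_sentences` is made up for this example. The same regular expression should also be usable with Perl's `split`.

```python
import re

def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    # The lookbehind keeps the punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings (e.g. when the input itself is empty).
    return [p for p in parts if p]

data = ("Congressman Mike Simpson has been on a crusade to allow more arsenic "
        "in drinking water. For more than a decade, the eight-term Idaho "
        "Republican has fought battle after battle to permit higher levels of "
        "the toxic chemical in small-town water supplies. ")

sentences = split_sentences(data)
print(len(sentences))  # 2
for s in sentences:
    print(s)
```

For real-world text, a dedicated sentence splitter or a cleaning pass before splitting is likely to give more reliable counts than this pattern alone.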

 
 
Posted: 07 July 2014 08:05 AM   [ # 1 ]
Sr. Member
Total Posts:  169
Joined  2011-09-20

It seems this topic is covered here.

http://www.addedbytes.com/blog/code/gunning-fog-function/

 Signature 

Zenghui
A humble student of business

 
 
Posted: 07 July 2014 08:27 AM   [ # 2 ]
Newbie
Total Posts:  16
Joined  2014-04-04

I also found this website:

https://readability-score.com/

Let me try to understand them now. Thanks!

 
 
Posted: 13 July 2014 12:09 PM   [ # 3 ]
Administrator
Total Posts:  901
Joined  2011-09-19

Hi Clark,

This Perl package (Lingua::EN::Fathom) may be helpful: http://search.cpan.org/dist/Lingua-EN-Fathom/lib/Lingua/EN/Fathom.pm

best regards,

Joost

 Signature 

To reply/post new questions: Please use the group WRDS/SAS on Google Groups! http://groups.google.com/d/forum/wrdssas

 
 
Posted: 14 July 2014 05:31 PM   [ # 4 ]
Newbie
Total Posts:  16
Joined  2014-04-04

Thanks a lot, Joost!

You are the best!!!

:)

 
 
Posted: 14 July 2014 06:06 PM   [ # 5 ]
Administrator
Total Posts:  901
Joined  2011-09-19

You’re welcome! :)

By the way, the Fog index is very sensitive to how periods are cleaned. Its formula uses the word count divided by the number of sentences, which simple implementations take to be the number of periods in the text. That count then includes the periods in ‘Mr.’, ‘U.S.’, ‘5.25%’, ‘$1.20’, etc. You probably want to clean up your text before feeding it to any algorithm.
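To make the effect concrete, here is a minimal sketch in Python of such a cleaning pass. The abbreviation list, the sample text, and the function names `clean_periods` and `words_per_sentence` are all made up for this illustration; a real corpus would need a much longer list. The sketch only computes the words-per-period ratio that enters the Fog index, not the full index, which also needs a count of complex words.

```python
import re

# Hypothetical, deliberately short abbreviation list; extend it for real data.
ABBREVIATIONS = ["Mr.", "Mrs.", "Dr.", "U.S.", "Inc."]

def clean_periods(text):
    """Remove periods that do not mark the end of a sentence."""
    for abbr in ABBREVIATIONS:
        text = text.replace(abbr, abbr.replace(".", ""))
    # Drop decimal points inside numbers such as 5.25 or 1.20.
    # This alters the numbers, which is fine when only counting.
    text = re.sub(r"(?<=\d)\.(?=\d)", "", text)
    return text

def words_per_sentence(text):
    """Word count divided by the number of periods (at least 1)."""
    words = len(text.split())
    periods = text.count(".")
    return words / max(periods, 1)

sample = ("Mr. Smith paid $1.20 in the U.S. last year. "
          "Rates rose 5.25% after that.")

print(words_per_sentence(sample))                 # 2.0 (7 periods counted)
print(words_per_sentence(clean_periods(sample)))  # 7.0 (2 periods counted)
```

One limitation of this approach: if an abbreviation happens to end a sentence, its period is removed as well and that sentence boundary is lost, so the result is still an approximation.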

best regards,

Joost

