Purpose

One SEO question people often have when writing posts is how long a post should be. This post shows how to calculate the length of your blog's posts (in number of words). You will need a GNU/Linux or Mac OS X machine with access to the posts folder to follow this tutorial.

Post word count script

To list all your blog posts sorted by word count in descending order, run the following bash script from the posts folder:

#!/usr/bin/env bash

# get posts (regular files only, no directories) in an array called posts
# note: word splitting means file names must not contain spaces
posts=( $(ls -p | grep -v /) )

# print length of posts array
echo -e "\nListing posts sorted by word count (${#posts[@]} posts): \n"

# loop posts array
for post in "${posts[@]}"; do
  # word counting content of each file
  n=$(wc -w < "$post")
  # $n is left unquoted so any padding printed by wc is dropped
  printf '%4s words => %s\n' $n "$post"
done |
# numeric sort (-n numeric) descending (-r reverse)
sort -n -r;

One-liner version:

$ posts=( $(ls -p | grep -v /) ); echo -e "\nListing posts sorted by word count (${#posts[@]} posts): \n"; for post in "${posts[@]}"; do printf '%4s words => %s\n' $(wc -w < "$post") "$post"; done | sort -n -r
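
The scripts above rely on word splitting, so a post file name containing spaces would break them. A minimal sketch of the same listing that avoids parsing `ls` is shown below; the function name `list_posts_by_words` is hypothetical and not part of the original script.

```shell
#!/usr/bin/env bash

# list_posts_by_words DIR (hypothetical helper, DIR defaults to ".")
# Prints "<count> words => <file>" for every regular file in DIR,
# sorted by word count, highest first.
list_posts_by_words() {
  local dir="${1:-.}" post n
  for post in "$dir"/*; do
    [ -f "$post" ] || continue            # skip sub-directories and empty globs
    n=$(( $(wc -w < "$post") ))           # arithmetic expansion drops wc padding
    printf '%4d words => %s\n' "$n" "${post##*/}"
  done | sort -n -r                       # numeric sort, descending
}

# Usage (from the posts folder):
#   list_posts_by_words .
```

Looping over a glob keeps each file name intact, at the cost of no longer having an array whose length can be printed up front.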

Bonus track: Keyword density checker script

Another thing to bear in mind in SEO is keyword density. You can use the following script to count how many times a specific keyword appears in a post and to calculate its density as a percentage of the total word count.

1) With local access to the post file:

$ KEYWORD='MY_KEYWORD'
$ TOTAL=`cat MY_POST_FILE | wc -w`
$ FOUND=`cat MY_POST_FILE | grep -io '\<'$KEYWORD'\>' | wc -w`
$ printf "Total word count: %s \nKeyword ($KEYWORD) appears %s time/s.\nDensity: %s%%\n" $TOTAL $FOUND $(( 100 * $FOUND / $TOTAL ))

# Example:
KEYWORD='post'
TOTAL=`cat 2018-05-07-how-long-are-my-posts.markdown | wc -w`
FOUND=`cat 2018-05-07-how-long-are-my-posts.markdown | grep -io '\<'$KEYWORD'\>' | wc -w`
printf "Total word count: %s \nKeyword ($KEYWORD) appears %s time/s\nDensity: %s%%" $TOTAL $FOUND $(( 100 * $FOUND / $TOTAL ))
>
Total word count: 489
Keyword (post) appears 16 time/s
Density: 3%
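
Because `$(( ... ))` does integer arithmetic, the density above is truncated: 16 matches in 489 words is about 3.3%, but it prints as 3%. The following is a minimal sketch that wraps the same steps in a function and delegates the division to awk to keep one decimal place. The function name `keyword_density` is hypothetical, and `grep -F -w` is used here in place of the `\<...\>` anchors so the keyword is treated as a literal whole word.

```shell
#!/usr/bin/env bash

# keyword_density KEYWORD FILE (hypothetical helper)
# Prints total word count, whole-word matches of KEYWORD (case-insensitive)
# and the density as a percentage with one decimal place.
keyword_density() {
  local keyword="$1" file="$2" total found density
  total=$(( $(wc -w < "$file") ))
  # -F literal string, -i ignore case, -o one match per line, -w whole words
  found=$(( $(grep -F -i -o -w -- "$keyword" "$file" | wc -l) ))
  # awk does floating-point division; guard against an empty file
  density=$(awk -v t="$total" -v f="$found" \
    'BEGIN { if (t > 0) printf "%.1f", 100 * f / t; else printf "0.0" }')
  printf 'Total word count: %s\nKeyword (%s) appears %s time/s\nDensity: %s%%\n' \
    "$total" "$keyword" "$found" "$density"
}

# Usage:
#   keyword_density post 2018-05-07-how-long-are-my-posts.markdown
```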

2) Accessing an external post URL (requires lynx, installed here with apt-get on Debian/Ubuntu):

$ apt-get install -y lynx
$ TOTAL=`curl MY_POST_URL | lynx -stdin -dump | wc -w`
$ FOUND=`curl MY_POST_URL | lynx -stdin -dump | grep -io '\<'$KEYWORD'\>' | wc -w`
$ printf "Total word count: %s \nKeyword ($KEYWORD) appears %s time/s.\nDensity: %s%%\n" $TOTAL $FOUND $(( 100 * $FOUND / $TOTAL ))

# Example:
apt-get install -y lynx
KEYWORD='docker'
TOTAL=`curl -s "https://devopsheaven.com/docker/dockerignore/2018/04/25/using-dockerignore.html" | lynx -stdin -dump | wc -w`
FOUND=`curl -s "https://devopsheaven.com/docker/dockerignore/2018/04/25/using-dockerignore.html" | lynx -stdin -dump | grep -io '\<'$KEYWORD'\>' | wc -w`
printf "Total word count: %s \nKeyword ($KEYWORD) appears %s time/s\nDensity: %s%%" $TOTAL $FOUND $(( 100 * $FOUND / $TOTAL ))
>
Total word count: 499
Keyword (docker) appears 14 time/s
Density: 2%

We use lynx so that words are counted after the HTML content is rendered, which avoids counting HTML markup such as meta tags.
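
To see why this matters, the sketch below compares the word count of a small raw HTML string with the count after its tags are removed. It uses a sed substitution as a crude stand-in for lynx, only so the example runs without network access or extra packages; it is not equivalent to real rendering. The function names and the sample HTML are made up for illustration.

```shell
#!/usr/bin/env bash

# count_raw_words: word count of stdin exactly as received
count_raw_words() {
  echo $(( $(wc -w) ))
}

# count_text_words: word count of stdin after replacing every <...> tag
# with a space (a rough approximation of what a text browser would keep)
count_text_words() {
  sed -e 's/<[^>]*>/ /g' | count_raw_words
}

# Hypothetical sample page: the meta tag carries words that are not post content
sample='<html><head><meta name="description" content="a b c"></head><body><p>Hello blog world</p></body></html>'

printf 'raw:  %s words\n' "$(printf '%s\n' "$sample" | count_raw_words)"
printf 'text: %s words\n' "$(printf '%s\n' "$sample" | count_text_words)"
```

The raw count is inflated by attribute values and by tags glued to neighbouring words, which is the effect the lynx step is meant to avoid.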