Crawl a sitemap.xml file with curl

Icon by Freepik from Flaticon. Licensed under CC 3.0 BY.

Having dead URLs in your sitemap.xml file is a surefire way to tank your website’s search rankings. The curl command can be used to check every <loc> element defined in the file to find any broken links.

This command was adapted from Analyzing XML Sitemap Files with Bash, which goes into much greater detail.
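As a rough illustration of the same idea outside of curl, here is a minimal Python sketch that pulls every <loc> value out of a sitemap and reports the URLs that do not answer with a 200. It is not the command from the article. The function names (extract_locs, check_url, find_broken) are made up for this example, and returning 0 for a connection failure is an arbitrary convention.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(sitemap_xml):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]


def check_url(url, timeout=10.0):
    """Send a HEAD request and return the HTTP status (0 if unreachable)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions carrying the status code.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, and so on (assumed convention: 0).
        return 0


def find_broken(sitemap_xml, checker=check_url):
    """Return (url, status) pairs for every URL that does not answer 200."""
    results = [(url, checker(url)) for url in extract_locs(sitemap_xml)]
    return [(url, status) for url, status in results if status != 200]
```

The checker argument exists so the parsing logic can be exercised without touching the network. Note that urlopen follows redirects, so a URL that redirects to a working page counts as healthy here.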


Display Latest Tweets on AMP-compatible Static Web Sites

The AMP Project provides tools and guidelines for webmasters interested in making sites that load quickly and are highly optimized for mobile devices. AMP-compatible sites are not allowed to employ custom JavaScript, but predefined AMP components are provided by Google.

The <amp-twitter> component may be used to embed a specific tweet, but there is currently no built-in way to embed the “latest” tweet (or series of tweets) from a specific user.

The pain is amplified for static web sites that cannot rely on a server to dynamically generate markup on the fly. Nevertheless, it is possible for static web sites to dynamically display the latest content from Twitter. Here are three ways to accomplish it.


Beware AMP Templates Containing Links

I recently received a somewhat dramatically worded e-mail from Google:

Googlebot identified a significant increase in the number of URLs on https://blog.atj.me/ that return a 404 (not found) error. This can be a sign of an outage or misconfiguration, which would be a bad user experience. This will result in Google dropping those URLs from the search results. If these URLs don’t exist at all, no action is necessary.

The search for the culprit led to a surprising realization. Googlebot apparently does not ignore markup in AMP <template> elements.


Generate Let's Encrypt SSL Certificates in Cloud9

Let’s Encrypt is a service offering free SSL certificates that can be generated automatically with the certbot utility. These certs are perfect for developing with HTTPS.

By default, AWS Cloud9 uses the Amazon Linux AMI for the backing EC2 instance, which is not supported by the certbot utility. Fortunately, the Cloud9 environment comes pre-loaded with Docker, and Let’s Encrypt provides official Docker images for certbot.

Because Cloud9 does not expose port 80, a DNS challenge must be used to verify ownership of the domain. Generated certificates can be used by a helper process (like http-server) running on the backing EC2 instance and/or they may be copied onto another machine.


Securely Backup Text Files with LastPass CLI

The easiest and most convenient way to securely back up files using the command line is with the AWS CLI (see aws s3 docs). S3 buckets support versioning and default encryption (which causes uploaded files to be encrypted at rest). If you’re already managing your passwords with LastPass, you have another option for securely backing up misfit text files.

LastPass users who typically interact with their vault via a browser plugin may be surprised to learn of an official command line utility called lpass that provides access to secure items. See the man page for subcommands and details.

Using lpass, a text file (or any content piped from stdin) can be stored as a LastPass secure note. As with S3, LastPass secure notes are versioned and can be selectively shared.


Cloud9 with a Chance of Hugo

Hugo beautifully facilitates the creation of static web sites that are cheap to host and trivial to scale. Historically, the trade-off with static blogs has been a clunkier process for publishing new articles. The days of simply hitting “save” in a web browser to update a blog post from anywhere went away for a little while, but AWS Cloud9 is bringing them back.

Cloud9 is a web-based IDE that is perfectly suited to managing a Hugo blog in the cloud. The static site source files can reside on any server accessible via SSH, but Cloud9 really shines when backed by an EC2 instance that it manages on your behalf. Persistent files are stored on an EBS volume, and the EC2 instance can be automatically stopped after a configurable period of inactivity.

The IDE provided by Cloud9 is surprisingly powerful, but I’m not ready to make it my daily driver for general programming tasks. That said, it makes an excellent interface for managing specific projects in the cloud (for convenience, collaboration, etc.).


Redirect Custom Short URLs with Lambda

An AWS API Gateway proxy resource can be configured with a proxy integration so that incoming requests invoke a Lambda function.

From Set up a Proxy Integration with a Proxy Resource:

With this integration type, API Gateway applies a default mapping template to send the entire request to the Lambda function and transforms the output from the Lambda function to HTTP responses.

This makes it trivial to implement a redirection function for custom short URLs.
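To give a sense of how little code is involved, here is a minimal sketch of such a handler in Python. It assumes the proxy integration event carries the request path in its "path" field and that the function answers with the statusCode/headers/body shape API Gateway expects. The REDIRECTS table, the FALLBACK_URL and the short codes themselves are hypothetical, and this is not necessarily how the function in the article is written.

```python
# Hypothetical short-code table; a real deployment might load this from
# a data store or a configuration file instead.
REDIRECTS = {
    "/gh": "https://github.com/",
    "/docs": "https://docs.aws.amazon.com/",
}

# Hypothetical destination for unknown short codes.
FALLBACK_URL = "https://example.com/"


def handler(event, context):
    """Answer a Lambda proxy integration request with an HTTP redirect."""
    path = event.get("path") or "/"
    # Treat "/gh" and "/gh/" as the same short code.
    key = path.rstrip("/") or "/"
    target = REDIRECTS.get(key, FALLBACK_URL)
    return {
        "statusCode": 301,
        "headers": {"Location": target},
        "body": "",
    }
```

A 301 tells clients the move is permanent and may be cached; a 302 is the safer choice if the targets are expected to change.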


Bundle Lambda Functions Using Webpack

Creating a Deployment Package (Node.js) describes how to create a compressed payload that includes the node_modules folder for Lambda functions that require npm modules.

An alternative to including the entire node_modules folder in the payload is to use webpack to bundle the Lambda function code and its dependencies into a single file.


Ship Access Logs to CloudWatch

If an S3 bucket or CloudFront instance has access logging enabled, the logs will be delivered as plain text files to the designated target bucket.

Having access logs split into many files in an S3 bucket is not inherently useful, but presumably Amazon takes this approach for the flexibility of processing options it affords. Downloading the logs to a local machine for analysis is a bit inelegant; since the logs are already “in the cloud”, it is preferable to access them in a searchable manner via a web browser. CloudWatch is a product seemingly tailor-made to solve this problem, but unfortunately there is no turnkey solution to import access logs from S3. Luckily, it’s not too difficult to roll your own.
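The core of a home-grown importer is turning log lines into timestamped events. The sketch below is one possible take, not the article's implementation. It assumes CloudFront-style log text: tab-separated fields, header lines starting with "#", and the UTC date and time in the first two columns. S3 server access logs use a different layout and would need their own parser. The function name to_log_events is invented for this example.

```python
from datetime import datetime, timezone


def to_log_events(log_text):
    """Convert CloudFront-style log text into CloudWatch-style log events.

    Each event is a dict with a millisecond epoch "timestamp" and the raw
    line as "message", sorted chronologically.
    """
    events = []
    for line in log_text.splitlines():
        # Skip blank lines and the "#Version" / "#Fields" header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        # Assumed layout: date in column 1, time in column 2, both UTC.
        stamp = datetime.strptime(f"{fields[0]} {fields[1]}", "%Y-%m-%d %H:%M:%S")
        millis = int(stamp.replace(tzinfo=timezone.utc).timestamp() * 1000)
        events.append({"timestamp": millis, "message": line})
    return sorted(events, key=lambda event: event["timestamp"])
```

The resulting list has the shape that the CloudWatch Logs PutLogEvents operation accepts for its log events. Fetching the files from S3 and making the API call are left out of this sketch.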


Generate Yearly and Monthly Archive Pages with Hugo Sections

This blog, which is now generated with Hugo, uses the permalink structure /:year/:month/:slug (the slug is a URL-safe version of the title). There are also “archive” pages, which list the articles posted in a given month or year. These pages have a permalink structure that complements the post permalinks: /:year/:month and /:year respectively.

Back when this blog was generated with Jekyll, I used a plugin to generate the archive pages, but there is no such mechanism in Hugo. It is possible to use taxonomies to accomplish this, but it involves defining a taxonomy for each year in the Hugo config, creating a corresponding layout and also adding redundant front matter to each post.

As an alternative, I created two new “sections” (top-level subfolders in the content folder): archy and archm. archy contains one file per year and archm contains one file per month. These files can be generated automatically (see below).
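For a sense of what that generation step could look like, here is a minimal Python sketch that derives one archy file per year and one archm file per month from a list of post dates. The file names, the front matter fields and the function name archive_files are assumptions made for illustration and may differ from what this blog actually uses.

```python
from datetime import date


def front_matter(title, url):
    """Build a small YAML front matter block (fields are assumed)."""
    return f'---\ntitle: "{title}"\nurl: "{url}"\n---\n'


def archive_files(post_dates):
    """Map archive file paths to their contents for the given post dates."""
    files = {}
    for posted in sorted(set(post_dates)):
        year = f"{posted.year:04d}"
        month = f"{posted.month:02d}"
        # One file per year in the archy section.
        files[f"content/archy/{year}.md"] = front_matter(year, f"/{year}/")
        # One file per month in the archm section.
        files[f"content/archm/{year}-{month}.md"] = front_matter(
            f"{year}-{month}", f"/{year}/{month}/"
        )
    return files


if __name__ == "__main__":
    # Example dates; a real script would read them from the posts.
    for path in sorted(archive_files([date(2018, 3, 5), date(2017, 12, 1)])):
        print(path)
```

The function only returns the paths and contents, so writing them to disk (and deciding whether to overwrite existing files) stays a separate step.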
