südrocket.de

January 3, 2022

Using awk for project hygiene

In a fairly recent tweet, Adam Gordon Bell predicted that awk might still be around in the year 3000, despite having already been in use for 44 years:

Some software that was written in 2021 will still be in use in the year 3000.

What will it be?

Considering that I’m using awk today, and it was written 44 years ago, I’m guessing awk will still be in use then. But what else will make the cut? Anything made this year?

Here’s an example where I recently used awk for iOS development: I was working on a codebase that had already gone through at least one major change in its lifetime. Over the years, the codebase accumulated more and more translations. Some are still in use, others are not. The naming convention for the translation keys was also revised along the way, which contributed a lot to translation corpses being left behind. That’s why I wanted to answer the question: “Which keys are still in use now and which keys may be safely deleted from the project?” Let’s tackle the question one step at a time:

Searching for a single key

We can start by checking whether one specific key is used in our codebase and then scale that method up to the general case. How do we define “is used”? Well, we’ll just say that whenever the key comes up in any of the source files (*.swift), the key is used. If the project uses Interface Builder, we would also have to consider storyboard files (and potentially XIB files). In my case, considering *.swift files was enough, so that’s what I’m going with here.

There are several command line tools available for you to find all occurrences of a given string within the project. My go-to tool in this case is always ripgrep but similar tools like git-grep or the silver searcher are also a great fit. If we want to know if the key ERROR_OFFLINE is used we’ll just execute the following command from our project root:

rg -t swift --case-sensitive -F "ERROR_OFFLINE" .

Let’s break it down:

-t swift tells ripgrep that we’re only interested in .swift files

--case-sensitive speaks for itself

-F tells ripgrep that our search term is a fixed string, meaning that it should not be treated as a regular expression

This yields the following output which tells us if and where our key is currently in use:

$ rg -t swift --case-sensitive -F "ERROR_OFFLINE"
iOSApplication/Sources/Model/Errors+Localization.swift
25:            return LocalizedString("ERROR_OFFLINE")
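If ripgrep is not installed, roughly the same search can be sketched with find and grep. This is only a stand-in, not the command used in this post: the helper name find_key and the throwaway demo project are made up for illustration.

```shell
#!/bin/bash
# find_key KEY [DIR]: list every occurrence of KEY in *.swift files below DIR.
# Hypothetical helper name; DIR defaults to the current directory.
find_key() {
	# -H print the file name, -n print the line number, -F treat KEY as a fixed string
	find "${2:-.}" -name '*.swift' -exec grep -HnF -- "$1" {} +
}

# Throwaway demo project so the sketch runs on its own
demo=$(mktemp -d)
printf 'return LocalizedString("ERROR_OFFLINE")\n' > "$demo/Errors.swift"
printf 'ERROR_OFFLINE is mentioned here as well\n' > "$demo/notes.txt"

find_key "ERROR_OFFLINE" "$demo"   # only the .swift file is reported
```

grep is case-sensitive by default, so no counterpart to --case-sensitive is needed here.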

Getting a list of keys

Now that we know how to answer if a specific key is currently in use, we need to ask this question for all the keys. How do we get this list? This is where awk comes in. All the keys we might want to check are present in one of our Localizable.strings files. Assuming our reference is en.lproj/Localizable.strings we need a way to extract just the keys from the file without the translation. Each Localizable.strings file contains key-value pairs which map a key to a corresponding translation:

"ERROR_OFFLINE" = "Why are you offline?!";

Extracting the key in the first column is a prime use case for awk. I’m by no means an expert in using awk. I basically have to look everything up again every time I use it. Here’s a great primer that I have bookmarked: Awk in 20 Minutes. Anyway, let’s get the keys out of our file:

$ awk '/^\"/ { print $1 }' < Assets/en.lproj/Localizable.strings

This is a bit less readable but it basically just says: Feed our input file to awk, only look at lines that begin with a double quote " and print the first field. By default, awk splits a line into fields on whitespace. print $0 would’ve given us the full line, print $2 the equal sign and print $3 only the first word of the translation, because the translation contains spaces itself. So print $1 it is. Note that this relies on the keys themselves containing no spaces, and that each key is printed together with its surrounding double quotes. And that’s already our list of keys that we need to process further.
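If some keys do contain spaces, one possible variation is to split on the double quote instead of on whitespace. The following is a minimal sketch under the assumption that no key contains an escaped quote (\"); the helper name extract_keys and the sample file are hypothetical.

```shell
#!/bin/bash
# extract_keys FILE: print every key of a .strings file, without the quotes.
# Hypothetical helper; assumes keys contain no escaped quotes (\").
extract_keys() {
	# -F'"' splits each line on double quotes: $1 is empty, $2 is the key
	awk -F'"' '/^"/ { print $2 }' "$1"
}

# Small sample file so the sketch runs on its own
sample=$(mktemp)
cat > "$sample" <<'EOF'
/* Errors */
"ERROR_OFFLINE" = "Why are you offline?!";
"Sign in" = "Sign in";
EOF

extract_keys "$sample"
```

Unlike the command above, this prints bare keys. Searching for a bare key also matches longer identifiers that merely contain it, so it may make sense to put the quotes back before searching.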

Detect all unused keys

To answer the original question “Which keys are safe to delete?” we now combine the two previously mentioned methods. First, we’ll use our reference file to get a list of keys and then we’ll search the project using ripgrep for each of those keys. Combining the two methods could look like the following:

#!/bin/bash

awk '/^\"/ { print $1 }' < Assets/en.lproj/Localizable.strings |    # get a list of keys
while read -r KEY; do                                               # loop over each key
	rg -q -t swift --case-sensitive -F "$KEY" .                     #   search the project for this key
	if [ $? -eq 1 ] ; then                                          #   if there's no match...
		printf '%s\n' "$KEY"                                        #     ... print out the key on its own line
	fi
done

Compared to our previous invocation above, we added the -q (quiet) option to our ripgrep command to suppress any output from ripgrep. In the following if statement we inspect the exit status of the previous command. rg exits with code 0 if it found at least one occurrence and with code 1 if it found none (code 2 signals an error). Every key that did not come up in our search will be printed out. And that’s basically how we identify unused keys within our project.
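Since ripgrep uses a separate exit code for errors, the loop can also be written so that a failed search is not silently treated as "key is used". Here is a hedged sketch of that idea: the function names, their arguments and the demo files are made up for illustration, and the find/grep fallback is only there so the sketch also runs where ripgrep is missing.

```shell
#!/bin/bash
# Sketch only: function names and arguments are made up for illustration.

# search KEY DIR: quietly look for KEY in the *.swift files below DIR.
search() {
	if command -v rg >/dev/null 2>&1; then
		rg -q -t swift --case-sensitive -F -- "$1" "$2"
	else
		# Fallback without ripgrep: succeeds if grep lists at least one file.
		# It only tells match (0) from no match (1).
		find "$2" -name '*.swift' -exec grep -lF -- "$1" {} + | grep -q .
	fi
}

# unused_keys STRINGS_FILE SOURCE_DIR: print every key that is never found.
unused_keys() {
	awk '/^"/ { print $1 }' "$1" |
	while read -r key; do
		status=0
		search "$key" "$2" || status=$?
		case $status in
			0) ;;                                            # key is in use
			1) printf '%s\n' "$key" ;;                       # no match: deletion candidate
			*) printf 'search failed for %s\n' "$key" >&2 ;; # anything else: the search itself failed
		esac
	done
}

# Throwaway demo project so the sketch runs on its own
src=$(mktemp -d)
strings=$(mktemp)
printf 'return LocalizedString("ERROR_OFFLINE")\n' > "$src/Errors.swift"
printf '%s\n' '"ERROR_OFFLINE" = "Why are you offline?!";' \
              '"ERROR_LEGACY" = "No longer shown";' > "$strings"

unused_keys "$strings" "$src"
```

In the demo, only "ERROR_LEGACY" never shows up in a Swift file, so that is the one key reported.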

Caveats & Outlook

Of course the method as written above is not bulletproof for every project. For example, what if we fetch some keys dynamically from our server? The keys don’t necessarily need to appear in our source files to be important to our project. However, in my opinion a script like the above does not need to be 100% bulletproof to be useful. Sure, if you pipe the output of the above script to another command which just purges all those keys from all Localizable.strings files, it had better be bulletproof. But what we wanted for now is just to get a sense of which of the existing translations are probably obsolete. And it might serve as a starting point for a more sophisticated script that accommodates the various intricacies of the project.

The main takeaway is this: embrace the shell. Transforming output with the plethora of commands available to us is such a powerful tool. And awk is the Swiss Army knife among those tools.
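As one example of such a refinement, keys that are known to be requested dynamically could be kept in a hand-maintained allowlist and filtered out of the report. This is a sketch under assumptions: the allowlist file, its one-key-per-line format, the key names and the helper name filter_dynamic are all hypothetical.

```shell
#!/bin/bash
# filter_dynamic ALLOWLIST: read candidate keys on stdin and drop every key
# listed in ALLOWLIST (one key per line, quoted the same way as the candidates).
filter_dynamic() {
	# -v invert the match, -x match whole lines only,
	# -F fixed strings, -f read the patterns from a file
	grep -vxFf "$1"
}

# Hypothetical allowlist with a single server-driven key
allow=$(mktemp)
printf '%s\n' '"PROMO_BANNER_TITLE"' > "$allow"

# Two candidates as the detection script would print them
printf '%s\n' '"ERROR_LEGACY"' '"PROMO_BANNER_TITLE"' | filter_dynamic "$allow"
```

Only keys that are not on the allowlist remain in the output, which keeps the report focused on translations that are probably obsolete.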

P.S.: Drew DeVault has a great blog post about shell literacy.