Using awk for project hygiene
In a somewhat recent tweet, Adam Gordon Bell predicted that awk might still be around in the year 3000, despite already having been in use for 44 years:
Some software that was written in 2021 will still be in use in the year 3000.
What will it be?
Considering that I’m using awk today, and it was written 44 years ago, I guessing awk will still be in use then. But what else will make the cut? Anything made this year?
Here’s an example where I recently used awk for iOS development: I was working on a codebase that had already gone through at least one major change in its lifetime. Over the years, the codebase accumulated more and more translations. Some are still in use, others are not. The naming convention for the translation keys was also revised along the way, which contributed a lot to translation corpses being left behind. That’s why I wanted to answer the question: “Which keys are still in use now and which keys may be safely deleted from the project?” Let’s tackle the question one step at a time:
Searching for a single key
We can start by checking if one specific key is used in our codebase and then
scale the method we developed for the general case. How do we define “is used”?
Well, we’ll just say that whenever the key comes up in any of the source files
(*.swift) the key is used. If the project uses Interface Builder we would also
consider storyboard files (and potentially XIB files?). In my case considering
*.swift files was enough so that’s what I’m going with here.
There are several command line tools available for you to find all occurrences
of a given string within the project. My go-to tool in this case is always
ripgrep, but similar tools like git-grep or the silver searcher are also a
great fit. If we want to know if the key ERROR_OFFLINE is used we’ll just
execute the following command from our project root:
rg -t swift --case-sensitive -F "ERROR_OFFLINE" .
Let’s break it down:
-t swift tells ripgrep that we’re only interested in .swift files
--case-sensitive speaks for itself
-F tells ripgrep that our search term is a fixed string, meaning that it should not be treated as a regular expression
This yields the following output, which tells us if and where our key is currently in use:
$ rg -t swift --case-sensitive -F "ERROR_OFFLINE"
iOSApplication/Sources/Model/Errors+Localization.swift
25: return LocalizedString("ERROR_OFFLINE")
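For projects that do use Interface Builder, the same search can be widened beyond Swift sources. The following is a minimal sketch, assuming ripgrep's -g glob filter is used in place of -t swift; search_key is a hypothetical helper name, not something ripgrep provides.

```shell
#!/bin/bash
# search_key is a hypothetical helper, not part of ripgrep.
# $1 is the key to look for, $2 the directory to search (defaults to ".").
search_key() {
  # -g adds a glob filter; repeating it widens the set of files searched.
  rg --case-sensitive -F \
     -g '*.swift' -g '*.storyboard' -g '*.xib' \
     -- "$1" "${2:-.}"
}
```

Calling search_key "ERROR_OFFLINE" from the project root would then also report hits inside storyboard and XIB files.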
Getting a list of keys
Now that we know how to answer if a specific key is currently in use, we need to
ask this question for all the keys. How do we get this list? This is where awk
comes in. All the keys we might want to check are present in one of our
Localizable.strings files. Assuming our reference is
en.lproj/Localizable.strings, we need a way to extract just the keys from the
file without the translation. Each Localizable.strings file contains key-value
pairs which map a key to a corresponding translation:
"ERROR_OFFLINE" = "Why are you offline?!";
Extracting the key in the first column is a prime use case for awk. I’m by no means an expert in awk; I basically have to look everything up again each time I use it. Here’s a great primer that I have bookmarked: Awk in 20 Minutes. Anyway, let’s get the keys out of our file:
$ awk '/^\"/ { print $1 }' < Assets/en.lproj/Localizable.strings
This is a bit less readable, but it basically just says: feed our input file to
awk, only look at lines that begin with a double quote (") and print the first
field. By default, awk splits a line into fields on whitespace. print $0
would’ve given us the full line, print $2 the equal sign, and print $3 the
first word of the translation (the whole translation plus the trailing
semicolon only if the translation contains no spaces). So print $1 it is. Note
that the printed key still carries its surrounding double quotes. And that’s
already our list of keys that we need to process further.
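Whitespace splitting works here because the keys contain no spaces. As a hedged sketch of an alternative, the snippet below compares the default splitting with splitting on the double quote via -F. The sample data and the function names keys_by_space and keys_by_quote are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sample data standing in for Localizable.strings.
sample='/* Errors */
"ERROR_OFFLINE" = "Why are you offline?!";
"ERROR_TIMEOUT" = "Timed out";
// a comment line
"KEY WITH SPACE" = "Spaces in the key";'

# Default splitting on whitespace: $1 keeps the surrounding quotes,
# and a key containing a space is cut off after its first word.
keys_by_space() {
  awk '/^"/ { print $1 }'
}

# Splitting on the double quote instead: $1 is the empty text before the
# opening quote, so $2 is the bare key, spaces included.
keys_by_quote() {
  awk -F'"' '/^"/ { print $2 }'
}

printf '%s\n' "$sample" | keys_by_space
printf '%s\n' "$sample" | keys_by_quote
```

Which variant fits better depends on the search that follows. Keeping the quotes means a fixed-string search only matches the key as a complete string literal, while the bare key would also match inside longer keys that merely contain it.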
Detect all unused keys
To answer the original question “Which keys are safe to delete?” we now combine the two previously mentioned methods. First, we’ll use our reference file to get a list of keys and then we’ll search the project using ripgrep for each of those keys. Combining the two methods could look like the following:
#!/bin/bash
# get a list of keys
awk '/^\"/ { print $1 }' < Assets/en.lproj/Localizable.strings |
while IFS= read -r KEY; do                       # loop over each key
    rg -q -t swift --case-sensitive -F "$KEY" .  # search the project for this key
    if [ $? -eq 1 ]; then                        # if there's no match...
        printf '%s\n' "$KEY"                     # ... print out the key on its own line
    fi
done
Compared to our previous invocation above, we added the -q (quiet) option to
our ripgrep command to suppress any output from ripgrep. In the following if
statement we inspect the exit status of the previous command. rg will exit with
code 1 if it could not find any occurrence, 0 if it found one, and 2 if an
error occurred. Every key that did not come up in our search will be printed
out. And that’s basically how we identify unused keys within our project.
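For machines without ripgrep, roughly the same loop can be sketched with grep swapped in for rg. This is an assumption-laden variant, not the script used above: it relies on the -r and --include options offered by GNU and BSD grep, and find_unused_keys is a hypothetical function name.

```shell
#!/bin/bash
# Sketch: list keys from a strings file that appear in no *.swift file.
# find_unused_keys is a hypothetical name; grep stands in for ripgrep here.
# $1 is the reference strings file, $2 the directory holding the sources.
find_unused_keys() {
  local strings_file=$1 source_dir=$2 key
  awk '/^"/ { print $1 }' "$strings_file" |
  while IFS= read -r key; do
    # -r recurses, -q stays quiet, -F treats the key as a fixed string,
    # --include restricts the search to Swift sources.
    grep -rqF --include='*.swift' -e "$key" "$source_dir"
    # Like rg, grep exits with 1 when nothing matched.
    if [ $? -eq 1 ]; then
      printf '%s\n' "$key"
    fi
  done
}
```

Wrapping the loop in a function that takes the strings file and the source directory as arguments makes it easy to try on a small throwaway directory before pointing it at the real project, for example with find_unused_keys Assets/en.lproj/Localizable.strings . from the project root.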
Caveats & Outlook
Of course the method as written above is not bulletproof for every project. For
example, what if we fetch some keys dynamically from our server? The keys don’t
necessarily need to appear in our source files to be important to our project.
However, in my opinion a script like the above does not need to be 100%
bulletproof to be useful. Sure, if you pipe the output of the above script to
another command which just purges all those keys from all Localizable.strings
files, it had better be bulletproof. But what we wanted for now is just to get
a sense of which of the existing translations are probably obsolete. And it
might serve as a starting point for a more sophisticated script that
accommodates the various intricacies of the project. The main takeaway is this:
embrace the shell. Transforming output with the plethora of commands available
to us is such a powerful tool. And awk is the Swiss Army knife among those
tools.
P.S.: Drew DeVault has a great blog post about shell literacy.