
Navigating Legacy Code



Once we can build our code, it’s time to start making changes. Not all code is created equal, and there are better ways to move around it than opening every single file. Let’s see how.

There are different kinds of files in a project tree: JavaScript, Java, or Python source files, XML/HTML/whatever files that define the UI, and properties files that define internationalization. Each of these requires a different approach, especially in cross-language projects.

Regardless, I have found, even after ten years, that grep and find are still probably among the best ways to find "fuzzy" matches.

Why? Because my process of finding things goes something like this:

find . -name \*.java | xargs grep "some pattern"

In case you have no idea what this does: it finds all Java files and tries to match some pattern in them. As the result is potentially filled with false positives, I keep adding excludes to drop the lines that don’t match my query:

find . -name \*.java | xargs grep "some pattern" | grep -v "/target/"

To keep only the names of the matched files, I cut out the first colon-separated field of each line (the file name), then sort the names and remove the duplicates:

find . -name \*.java | xargs grep "some pattern" | grep -v "/target/" | cut -f1 -d: | sort -u
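
To make the last two stages concrete, here is a small sketch that feeds hand-written grep-style output through them. The file names and matched lines are made up for illustration; only cut and sort are actually exercised.

```shell
# Hypothetical grep output: "file:matched line", one entry per match.
grep_output='./src/A.java:  String getName() {
./src/A.java:    return other.getName();
./src/B.java:  a.getName();'

# cut -f1 -d:  keeps everything before the first colon (the file name)
# sort -u      sorts the names and drops the duplicates
files=$(printf '%s\n' "$grep_output" | cut -f1 -d: | sort -u)
echo "$files"
```

Each file shows up once, no matter how many of its lines matched.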

This continues until I have found all my matches. The next step of the analysis is simply to wrap the whole pipeline in a command substitution and pass the result as arguments to the editor, in my case vim; -p opens the files in tabs:

vim -p $(find . -name \*.java | xargs grep "some pattern" | grep -v "/target/" | cut -f1 -d: | sort -u)
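
Once the filters have settled, the pipeline can be folded into a single find call. This is a variation rather than the process described above: it swaps grep -v for find's own path test and cut | sort -u for grep -l, and it assumes a throwaway directory tree whose names are invented for the demo.

```shell
# Build a throwaway tree so the example is self-contained (all names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/target"
printf 'class A { String getName() { return "a"; } }\n' > "$demo/src/A.java"
printf 'class B { int size() { return 0; } }\n' > "$demo/src/B.java"
printf 'class A { String getName() { return "a"; } }\n' > "$demo/target/A.java"

# ! -path '*/target/*'  skips the build output (instead of grep -v "/target/")
# grep -l               prints each matching file name once (instead of cut | sort -u)
# -exec ... {} +        hands the file names to grep directly, so spaces in names are safe
matches=$(cd "$demo" && find . -name '*.java' ! -path '*/target/*' -exec grep -l "getName" {} + | sort)
echo "$matches"

rm -rf "$demo"
```

The incremental version is still the better tool while exploring, since every extra | grep -v is one arrow-up away; the condensed form is mostly useful when the search ends up in a script.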

While this potentially looks like magic, it’s a very simple process. It starts with writing a simple find that finds a lot of things. Then I press the <UP> arrow so my terminal brings back the full command, append a | to continue the filtering, and rerun. At the end the whole command looks like a stroke of genius, but it is actually simple in nature.

This makes it easy to search unstructured code across multiple repositories, possibly even in different languages. For example, if I want to find a function that’s called get_name, getName, or maybe GetName, or in COBOL even something like get-name, I search for something like:

grep -R -i get * | grep -i name

This finds all the lines that contain "get", case insensitive, then keeps only those that also contain "name". This potentially matches a bunch of garbage as well (e.g. getUserName matches too), but as we already discussed, it’s trivial to filter out the garbage by appending | grep -v -i getUserName.
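
Here is a minimal sketch of that filter chain run against a handful of made-up candidate lines instead of a real source tree. The last command is an alternative I am adding for comparison, not part of the process above: a single extended regular expression that allows an optional _ or - between "get" and "name".

```shell
# Hypothetical lines standing in for real source code.
candidates='get_name()
getName()
GetName()
GET-NAME.
getUserName()
setName()'

# Chain of simple filters: keep "get", keep "name", drop the known garbage.
found=$(printf '%s\n' "$candidates" | grep -i get | grep -i name | grep -v -i getUserName)
echo "$found"

# One stricter pattern that covers the same four spellings.
single=$(printf '%s\n' "$candidates" | grep -E -i 'get[_-]?name')
echo "$single"
```

On this sample both approaches keep the same four lines. The chained version needs no regex knowledge and is easier to grow one step at a time, while the single pattern is tighter once you know exactly which spellings you are after.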