DirWalker—a Class to Traverse Directories

An intro to using the proven DirWalker class to select, filter and process files across directories.

By Bob Ray
March 15, 2022

This is the first of several articles on using the DirWalker class to process files.

Sometimes you need to do something in PHP with all the files in a particular directory (and sometimes its descendants). You can roll your own code using a recursive function that calls scandir() and is_dir(), or the RecursiveDirectoryIterator class combined with isDir(), but you often then need to filter the directories and include or exclude files by filename or extension. You also have to handle directories and files differently.
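For comparison, here is a minimal sketch of the roll-your-own approach described above. It is not DirWalker’s code; the function name collectFiles() and its parameters are made up for illustration, and it filters only by a filename suffix.

```php
<?php
/**
 * Hand-rolled recursive walk (illustrative only; not DirWalker's code).
 * Returns an array of full path => filename for every file whose name
 * ends with $extension.
 */
function collectFiles(string $dir, string $extension, bool $recursive = true): array
{
    $found = [];
    $entries = scandir($dir);
    if ($entries === false) {
        return $found; // unreadable directory: skip it
    }
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue; // skip the current and parent directory entries
        }
        $path = rtrim($dir, '/\\') . '/' . $entry;
        if (is_dir($path)) {
            /* directories need their own handling */
            if ($recursive) {
                $found += collectFiles($path, $extension, $recursive);
            }
        } elseif (substr($entry, -strlen($extension)) === $extension) {
            $found[$path] = $entry;
        }
    }
    return $found;
}
```

Even this small version needs separate branches for directories and files, and it has no way yet to exclude directories or match more than one pattern.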

I have to do this kind of thing often enough that I created an Extra called DirWalker. It provides a class that recursively traverses directories, adapted from some code posted by boen dot robot on the php.net scandir() manual page.

DirWalker is an essential part of any code that pulls information out of the MODX codebase. It has many other uses as well. Any time you want to process some or all files in a directory, and optionally, its descendants, DirWalker will make your life much easier.

DirWalker is used extensively in the MyComponent Extra, in the Orphans Extra, and in generating the MODX object Quick Reference page at Bob’s Guides.

Overview

It’s most efficient if you process each file as you find it, but I find that it’s often easier and almost as fast to create an associative array containing the path to each file and its filename, then process them after the fact. You can easily process the files as they are found with DirWalker by extending the class and overriding the processFile() method.
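The override pattern mentioned above might look roughly like the sketch below. To keep it runnable on its own, it uses a tiny stand-in class, TinyWalker, which is not the real DirWalker. The processFile($dir, $fileName) signature is an assumption, so check the actual class source before copying it. LineCounter and $totalLines are also hypothetical names.

```php
<?php
/* Stand-in for illustration only. This is NOT the real DirWalker class,
   and its method signatures are assumptions. */
class TinyWalker
{
    protected $files = [];

    public function dirWalk(string $dir): void
    {
        foreach (scandir($dir) ?: [] as $entry) {
            if ($entry === '.' || $entry === '..') {
                continue;
            }
            if (is_dir($dir . '/' . $entry)) {
                $this->dirWalk($dir . '/' . $entry);
            } else {
                $this->processFile($dir, $entry);
            }
        }
    }

    /* Default behavior: store full path => filename */
    protected function processFile(string $dir, string $fileName): void
    {
        $this->files[$dir . '/' . $fileName] = $fileName;
    }

    public function getFiles(): array
    {
        return $this->files;
    }
}

/* Subclass that handles each file the moment it is found */
class LineCounter extends TinyWalker
{
    public $totalLines = 0;

    protected function processFile(string $dir, string $fileName): void
    {
        $lines = file($dir . '/' . $fileName);
        $this->totalLines += ($lines === false) ? 0 : count($lines);
    }
}
```

With this in place, something like `$w = new LineCounter(); $w->dirWalk($someDir);` leaves the running total in `$w->totalLines` and never builds the file array at all.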

DirWalker walks through all of the files in the specified directory (and its descendants if the recursive argument is true). It creates an associative array containing key/value pairs where the key is the full path to the file (including the filename) and the value is just the filename. Various filters can be used to exclude directories, and to exclude or include files. You can even use a regex search for including or excluding files based on their filenames.
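To make the include/exclude idea concrete, here is a small sketch of regex-based filename filtering. It does not use DirWalker’s own filter methods; keepFile() is a hypothetical helper, and the patterns and filenames are invented for the example.

```php
<?php
/* Hypothetical helper, not part of DirWalker: keep a filename only if it
   matches the include pattern and does not match the exclude pattern. */
function keepFile(string $fileName, string $includeRegex, string $excludeRegex = ''): bool
{
    if (preg_match($includeRegex, $fileName) !== 1) {
        return false; // not on the include list
    }
    if ($excludeRegex !== '' && preg_match($excludeRegex, $fileName) === 1) {
        return false; // explicitly excluded
    }
    return true;
}

/* Made-up filenames for the demonstration */
$names = ['modx.class.php', 'modx-min.js', 'utilities-all.js', 'readme.txt'];

/* Include .php and .js files; exclude minified and aggregated ones */
$kept = array_values(array_filter($names, function ($name) {
    return keepFile($name, '/\.(php|js)$/i', '/-(min|all)\./');
}));

print_r($kept); // only modx.class.php survives both tests
```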

The resetFiles() method empties the file list (in case you want to call dirWalk() more than once), and the getFiles() method returns the array of files to code outside the class.

There is more detailed information about DirWalker at Bob’s Guides. See the note below for information on downloading the code.

The Class Code

The class code got a little too long to post here. You can see it at GitHub.

The DirWalker class is available through Package Manager. You can install it on your site and simply include the file in a Snippet or Plugin with this line (it also works in MODX Revolution 3):

include MODX_CORE_PATH . 'components/dirwalker/model/dirwalker/dirwalker.class.php';

DirWalker will run fine outside of MODX (though you’ll need the full path to it in the 'include' statement). DirWalker is remarkably fast, considering what it’s doing, but if your process takes long enough to bump up against PHP’s default 30-second time limit, you’ll want to run it from the command line, where PHP imposes no time limit by default.

Usage

Here’s a typical example that recursively traverses the MODX2 core directory and all its descendants. It collects all files that have .class in their filenames. It skips the cache and packages directories, excludes minified (-min) and aggregated (-all) files, and skips Git files and directories. The code assumes that you have installed the DirWalker package through Package Manager.

include MODX_CORE_PATH . 'components/dirwalker/model/dirwalker/dirwalker.class.php';
$searchStart = MODX_CORE_PATH;
$output = '';
/* instantiate the class */
$dw = new DirWalker();
/* Set files and directories to include and exclude */
$dw->setIncludes('.class');
$dw->setExcludes('-all,-min,.git');
$dw->setExcludeDirs('cache,.git,packages');
/* Perform the search */
$dw->dirWalk($searchStart, true);
/* Get the Results */
$fileArray = $dw->getFiles();

/* Process the file array */
foreach($fileArray as $fullPath => $fileName) {
    /* process each file here */
    $output .= "\nFILE: ". $fileName . ' -- PATH: ' . $fullPath;
}
/* Echo the output if running from the command line,
   otherwise, return it */

if (php_sapi_name() === 'cli') {
    echo $output;
} else {
    return nl2br($output);
}

After the code above runs, all .class files in or below the MODX core directory will be in the $fileArray associative array, and you can process them however you like. If you are producing a report, you’ll probably want to sort them in some way, and you’ll probably want to remove the first part of the full path for each file if the path is being reported.
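As one possible way to do the sorting and path trimming just mentioned, the sketch below takes an array in the same full path => filename shape and returns it sorted by filename with the base path removed. The helper name relativeSorted() and the sample paths are hypothetical.

```php
<?php
/* Hypothetical helper: make each path relative to $base,
   then sort the result by filename (case-insensitive). */
function relativeSorted(array $fileArray, string $base): array
{
    /* Normalize slashes so the prefix comparison also works on Windows */
    $base = rtrim(str_replace('\\', '/', $base), '/') . '/';
    $result = [];
    foreach ($fileArray as $fullPath => $fileName) {
        $fullPath = str_replace('\\', '/', $fullPath);
        if (strpos($fullPath, $base) === 0) {
            $fullPath = substr($fullPath, strlen($base));
        }
        $result[$fullPath] = $fileName;
    }
    /* asort() sorts by value (the filename) and keeps the keys attached */
    asort($result, SORT_STRING | SORT_FLAG_CASE);
    return $result;
}
```

In the example above you might call it as `relativeSorted($fileArray, MODX_CORE_PATH)` before building the report.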

The example above is not a good one for MODX3, because in that version, the MODX class files no longer have the word “class” in their names. The example will run, but it won’t find any class files.

It’s tempting to swap the key and value in the processFile() method so the filename is the key, which would make searching and sorting easier, but don’t do it. Array keys must be unique, so you’d lose files that share a filename with a file in another directory. Each time a filename that’s already in the array comes up again, the new entry overwrites the existing one.
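A quick demonstration of the problem, using two made-up paths that share a filename (this is plain PHP array behavior, independent of DirWalker):

```php
<?php
/* Two different files that share a filename (made-up paths) */
$found = [
    ['core/model/a/index.php', 'index.php'],
    ['core/model/b/index.php', 'index.php'],
];

$byPath = [];
$byName = [];
foreach ($found as [$fullPath, $fileName]) {
    $byPath[$fullPath] = $fileName; // path as key: both entries survive
    $byName[$fileName] = $fullPath; // filename as key: second overwrites first
}

echo count($byPath), "\n"; // prints 2
echo count($byName), "\n"; // prints 1
```

If you need to look files up by name, it’s safer to build a second array afterward that maps each filename to a list of paths.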

More on Using DirWalker

There is more information on DirWalker at Bob’s Guides. It is also available as a MODX Extra that you can install with Package Manager, or it can be downloaded from the MODX Repository or from GitHub. The class itself does not require MODX.

In the next few articles, we’ll look at some more ways to extend and use the DirWalker class.


Bob Ray is the author of MODX: The Official Guide and dozens of MODX Extras including QuickEmail, NewsPublisher, SiteCheck, GoRevo, Personalize, EZfaq, MyComponent and many more. His website is Bob’s Guides. It includes not only a plethora of MODX tutorials but also some really great bread recipes.