In the last article, we looked at a utility snippet that lets you find the code that's setting a particular placeholder. Our snippet had a drawback, though. It didn't look in any files that the snippet or plugin being examined pulled in with include or require. Usually, these would be class files, but not always. We'll fix that problem in this article by adding a function to search those files as well.
As a reminder, here's our snippet tag:
[[!FindPlaceholder? &placeholder=`SomePlaceholder`]]
Checking Included Files
In addition to checking the code of the snippet or plugin being processed, we need to look for our placeholder in any files that are pulled into the snippet or plugin being examined with include, include_once, require, or require_once. The first step is to find the file paths for any included files. We could search for each of those terms individually and parse the line they appear on, but we can do it faster by using a regular expression to get any and all of them at once and extract the file name in the process (more on this in a bit). We may get a few spurious matches, but when we try them, no file will be found, so no harm will be done.
We'll add another function: checkIncludes(). This function uses a regular expression search (preg_match_all()) to find the target strings and extract the filenames. If we find any, we'll make sure each file exists, pull in its content, and send that content to our existing checkContent() function.
Notice that in our two calls to checkIncludes(), we set the $type argument to 'file' so the report will say 'file' instead of 'snippet' or 'plugin.' We've also separated the snippet and plugin sections to keep the code from being too deeply nested.
Here's the modified code:
<?php
/* FindPlaceholder snippet */

/* Make this work in both MODX 2 and MODX 3 */
$prefix = $modx->getVersionData()['version'] >= 3
    ? 'MODX\Revolution\\'
    : '';

$ph = $modx->getOption('placeholder', $scriptProperties);

/* Add a report line (and return 1) if $content contains the placeholder */
function checkContent($content, $ph, $type, $name, &$output) {
    $found = 0;
    if (strpos($content, $ph) !== false) {
        $found = 1;
        $output .= "\n " . $type . ' ' . $name .
            ' contains placeholder ' . $ph;
    }
    return $found;
}

/* Find files pulled in with include/require and check their content */
function checkIncludes($content, $ph, $type, $name, &$output) {
    $pattern = <<<EOT
/(?:include|require)(?:_once)?\s+['"]([^'"]+)['"]/im
EOT;
    $count = 0;
    $success = preg_match_all($pattern, $content, $matches);
    if (!empty($success)) {
        foreach ($matches[1] as $path) {
            /* Skip includes that have a PHP variable in them */
            if (strpos($path, '$') !== false) {
                continue;
            }
            /* Make sure the file exists */
            if (file_exists($path)) {
                /* Make sure it's not a directory */
                if (is_dir($path)) {
                    continue;
                }
                $c = file_get_contents($path);
                if (!empty($c)) {
                    $count += checkContent($c, $ph, $type, $path, $output);
                }
            }
        }
    }
    return $count;
}

$output = "\n\n<br>### Searching Snippets";
if (!empty($ph)) {
    $count = 0;
    $type = 'snippet';
    $snippets = $modx->getCollection($prefix . 'modSnippet');
    foreach ($snippets as $snippet) {
        $name = $snippet->get('name');
        $content = $snippet->get('snippet');
        $count += checkContent($content, $ph, $type, $name, $output);
        $count += checkIncludes($content, $ph, 'file', $name, $output);
    }
    if ($count == 0) {
        $output .= "\nPlaceholder not found in snippets";
    }
} else {
    return "\nError: Placeholder property is empty";
}

$output .= "\n\n<br>### Searching Plugins";
$count = 0;
$type = 'plugin';
$plugins = $modx->getCollection($prefix . 'modPlugin');
foreach ($plugins as $plugin) {
    $name = $plugin->get('name');
    $content = $plugin->get('plugincode');
    $count += checkContent($content, $ph, $type, $name, $output);
    $count += checkIncludes($content, $ph, 'file', $name, $output);
}
if ($count == 0) {
    $output .= "\nPlaceholder not found in plugins";
}
return $output;
This code is the same as the code in the previous article except that we've separated the snippet and plugin sections, and added the checkIncludes() function and the two lines that call it (one for snippets and one for plugins). The comments in the code explain what we're doing.
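If you'd like to try the two functions without a MODX install, here's a minimal standalone sketch. It copies checkContent() and checkIncludes() (lightly tidied, with the type passed in as 'file'), writes a temporary file that stands in for a class file, and feeds the functions some made-up snippet code. The temp file, its contents, and the snippet name MySnippet are illustrative assumptions, not part of FindPlaceholder. The bogus require is silently skipped because file_exists() returns false for it, which is the "no harm done" case mentioned earlier.

```php
<?php
/* Standalone harness: the temp file and fake snippet code are
   hypothetical stand-ins, not part of the FindPlaceholder snippet. */

function checkContent($content, $ph, $type, $name, &$output) {
    $found = 0;
    if (strpos($content, $ph) !== false) {
        $found = 1;
        $output .= "\n " . $type . ' ' . $name .
            ' contains placeholder ' . $ph;
    }
    return $found;
}

function checkIncludes($content, $ph, $type, $name, &$output) {
    $pattern = <<<EOT
/(?:include|require)(?:_once)?\s+['"]([^'"]+)['"]/im
EOT;
    $count = 0;
    $success = preg_match_all($pattern, $content, $matches);
    if (!empty($success)) {
        foreach ($matches[1] as $path) {
            /* Skip paths built from PHP variables */
            if (strpos($path, '$') !== false) {
                continue;
            }
            /* Missing files and directories are skipped */
            if (file_exists($path) && !is_dir($path)) {
                $c = file_get_contents($path);
                if (!empty($c)) {
                    $count += checkContent($c, $ph, $type, $path, $output);
                }
            }
        }
    }
    return $count;
}

/* A hypothetical "class file" that sets the placeholder */
$file = tempnam(sys_get_temp_dir(), 'fph');
file_put_contents($file, '<?php $modx->setPlaceholder("SomePlaceholder", 1);');

/* Fake snippet code: one real include, one that points nowhere */
$snippetCode = "include_once '" . $file . "';\nrequire 'no/such/file.php';";

$output = '';
$count = checkContent($snippetCode, 'SomePlaceholder', 'snippet', 'MySnippet', $output);
$count += checkIncludes($snippetCode, 'SomePlaceholder', 'file', 'MySnippet', $output);
echo $output, "\n";

unlink($file);
```

The placeholder is found only in the included file, so the report line names the temp file rather than the snippet.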
The Pattern
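We've used the heredoc syntax for the pattern used by preg_match_all(). The heredoc syntax is very handy when you have a string with a lot of single and double quotation marks, because you can write them as-is instead of escaping many of them with a backslash, which makes the string much harder to create and to read. In other respects a heredoc behaves like a double-quoted string: PHP variables are interpolated and escape sequences such as \n are processed. Our pattern has no $ and no backslash sequence that PHP recognizes (\s isn't one), so it comes through unchanged. The nowdoc syntax (<<<'EOT') avoids both issues, since it treats everything literally.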
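To see why the heredoc helps, compare three ways of writing the same string: as a heredoc, as a nowdoc, and as an ordinary single-quoted string. Only the single-quoted version needs backslashes in front of its single quotes, and all three produce identical strings.

```php
<?php
/* Heredoc: quotes need no escaping; \s is not a PHP escape, so it survives */
$heredoc = <<<EOT
/(?:include|require)(?:_once)?\s+['"]([^'"]+)['"]/im
EOT;

/* Nowdoc: nothing inside is interpreted at all */
$nowdoc = <<<'EOT'
/(?:include|require)(?:_once)?\s+['"]([^'"]+)['"]/im
EOT;

/* Single-quoted: every ' inside the pattern must be escaped as \' */
$singleQuoted = '/(?:include|require)(?:_once)?\s+[\'"]([^\'"]+)[\'"]/im';

var_dump($heredoc === $nowdoc, $heredoc === $singleQuoted); // bool(true), bool(true)
```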
Here is the pattern:
/(?:include|require)(?:_once)?\s+['"]([^'"]+)['"]/im
The pattern above is not surrounded by quotation marks, as it would be in most PHP code, because the heredoc syntax automatically makes it a string. The slashes at each end are delimiters. They show the regex engine where the pattern begins and ends. The i after the last slash tells the regex engine that we want a case-insensitive search. The m after it turns on multi-line mode, which makes ^ and $ match at the start and end of each line rather than only at the start and end of the whole string. Our pattern has no ^ or $ anchors, so the m has no practical effect here, but it does no harm.
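This short sketch shows the two modifiers at work on made-up samples: i lets a lowercase pattern match INCLUDE, and m only changes where ^ can match.

```php
<?php
/* i: case-insensitive matching */
$code = "INCLUDE 'Upper.php';";
$withI    = preg_match('/include/i', $code); // 1: matches INCLUDE
$withoutI = preg_match('/include/', $code);  // 0: no lowercase include

/* m: ^ and $ match at every line boundary, not just the string's ends */
$twoLines = "echo 1;\ninclude 'a.php';";
$anchoredWithM    = preg_match('/^include/m', $twoLines); // 1: ^ matches after the newline
$anchoredWithoutM = preg_match('/^include/', $twoLines);  // 0: ^ matches only at the start

var_dump($withI, $withoutI, $anchoredWithM, $anchoredWithoutM);
```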
The pattern looks for either of our key terms, include or require ((?:include|require)), followed by zero or one instance of _once ((?:_once)?), followed by one or more whitespace characters (\s+), followed by a quote character (['"]), followed by one or more non-quote characters ([^'"]+), followed by another quote character (['"]).
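To see which pieces are required, this sketch runs the article's pattern against a few hand-made sample lines and collects the captured path, or null when a line doesn't match.

```php
<?php
$pattern = <<<EOT
/(?:include|require)(?:_once)?\s+['"]([^'"]+)['"]/im
EOT;

$samples = [
    "require_once 'a.php';", // _once is optional, so this matches
    "include\t\"b.php\";",   // \s+ accepts a tab as well as a space
    "include'c.php';",       // no whitespace before the quote: no match
    "include \$path;",       // no quote character at all: no match
    "include '';",           // [^'"]+ needs at least one character: no match
];

$results = [];
foreach ($samples as $line) {
    /* Keep the captured path, or null when the line doesn't match */
    $results[] = preg_match($pattern, $line, $m) ? $m[1] : null;
}
print_r($results);
```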
The three sets of parentheses tell the engine what we want to capture (sort of). Actually, the first two sets are prefixed with ?:, which makes them "non-capturing groups". The engine will look for them when matching the pattern, but won't store them. The third set of parentheses captures the series of characters between the quotes: the actual path to the file being included.
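The difference is easy to see if you run the same line through the pattern and through a version with ordinary (capturing) groups. The second pattern is only for comparison; FindPlaceholder doesn't use it.

```php
<?php
$line = "require_once 'config.php';";

/* Non-capturing groups: the path is the first (and only) capture group */
preg_match('/(?:include|require)(?:_once)?\s+[\'"]([^\'"]+)[\'"]/i', $line, $nc);

/* Plain groups: every set of parentheses captures, so the path moves to [3] */
preg_match('/(include|require)(_once)?\s+[\'"]([^\'"]+)[\'"]/i', $line, $c);

print_r($nc); // [0] => full match, [1] => config.php
print_r($c);  // [1] => require, [2] => _once, [3] => config.php
```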
Using preg_match_all()
The preg_match_all() function finds all the strings in the text that match the pattern and puts them into a PHP array. We need to use preg_match_all() because there might be more than one included file to check. The preg_match() function would stop at the first match.
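Here's the difference on a two-line sample: preg_match() reports one match and captures only the first path, while preg_match_all() finds both.

```php
<?php
$pattern = <<<EOT
/(?:include|require)(?:_once)?\s+['"]([^'"]+)['"]/im
EOT;
$code = "include 'one.php';\nrequire 'two.php';";

$first = preg_match($pattern, $code, $one);      // stops after the first match
$all   = preg_match_all($pattern, $code, $many); // keeps going to the end

echo $first, ' ', $one[1], "\n";               // 1 one.php
echo $all, ' ', implode(', ', $many[1]), "\n"; // 2 one.php, two.php
```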
The first argument to preg_match_all() is the pattern, the second is the text string to be searched, and the third is the variable to put the array of matches into. The function returns the number of full matches it found (0 if there are none), or false if an error occurs. Both 0 and false count as empty, so our !empty($success) test skips the loop in either case.
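This sketch shows both cases. The deliberately broken pattern (an unclosed parenthesis) is only there to trigger the error return; the @ operator hides the warning PHP emits when a pattern fails to compile.

```php
<?php
/* No match: the return value is the integer 0 */
$none = preg_match_all('/include/', 'echo "hi";', $m1);

/* Invalid pattern (unclosed parenthesis): the return value is false */
$error = @preg_match_all('/(include/', 'include "x.php";', $m2);

var_dump($none, $error);               // int(0), then bool(false)
var_dump(empty($none), empty($error)); // bool(true), bool(true)
```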
You can't tell from our code, but the third argument ($matches in our case) is passed by reference. In a pass by reference, PHP passes the actual variable rather than a copy of it. Any changes to the variable's value will persist and will be available to the code that called the function. The preg_match_all() function modifies the $matches variable, and the changes are available in our code.
If you look at the preg_match_all() PHP manual page, you'll see the &$matches argument in the description. The ampersand tells us that that argument is passed by reference.
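If pass by reference is new to you, this sketch contrasts it with pass by value using two hypothetical helper functions. The &$output parameter in our checkContent() and checkIncludes() functions works the same way, which is how each report line ends up in the snippet's $output.

```php
<?php
/* Hypothetical helpers, only to show the difference */
function appendByValue($text) {
    $text .= ' (changed)'; // changes a local copy only
}
function appendByReference(&$text) {
    $text .= ' (changed)'; // changes the caller's variable
}

$a = 'original';
appendByValue($a);
$b = 'original';
appendByReference($b);

echo $a, "\n"; // original
echo $b, "\n"; // original (changed)

/* preg_match_all() fills $matches the same way; it needn't exist beforehand */
preg_match_all('/\d/', 'a1b2', $matches);
echo count($matches[0]), "\n"; // 2
```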
Unlike preg_match(), which fills $matches with a flat array for its single match, preg_match_all() (with its default PREG_PATTERN_ORDER flag) sets the third argument ($matches) to an array of arrays. The first member of the array ($matches[0]) is an array of all the full matches for the pattern. The second one ($matches[1]) is an array containing the matches of the first capture group. Since we only have one capture group (for the file paths), that's all we need. If there were another capture group, we'd have to look at $matches[2] (or 3, etc.) as well.
Here's an example:
<?php
$s = "
include \"hello1.php\";
include_once 'hello2.php';
require 'hello3.php';
require_once 'hello4.php';
/* include hello5 */
";
$pattern = <<<EOT
/(?:include|require)(?:_once)?\s+['"]([^'"]+)['"]/im
EOT;
preg_match_all($pattern, $s, $matches);
echo print_r($matches, true);
exit;
The output of our example looks like this:
Array
(
[0] => Array
(
[0] => include "hello1.php"
[1] => include_once 'hello2.php'
[2] => require 'hello3.php'
[3] => require_once 'hello4.php'
)
[1] => Array
(
[0] => hello1.php
[1] => hello2.php
[2] => hello3.php
[3] => hello4.php
)
)
The first inner array (seldom used, but very handy for debugging the pattern) holds the full pattern match for each find. The second holds the captured group for each match (just the file path). Notice that the fifth string (hello5) is not in the results because it doesn't match our pattern. Even if it weren't in a comment, it still wouldn't match, because it has no quotation marks. Note, though, that our pattern will find things in comments just fine as long as they match the pattern. In our checkIncludes() function, we loop through the $matches[1] array, the list of file paths, and process each one with our checkContent() function.
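Here's a small sketch of that behavior: a commented-out include is still found, the unquoted hello5 line isn't, and the loop walks $matches[1] the same way checkIncludes() does (printing each path instead of checking it). The sample code being scanned is made up for illustration.

```php
<?php
$pattern = <<<EOT
/(?:include|require)(?:_once)?\s+['"]([^'"]+)['"]/im
EOT;

/* Sample code to scan (a nowdoc, so nothing inside is interpreted) */
$code = <<<'CODE'
// include 'old.php';
/* include hello5 */
require_once 'new.php';
CODE;

preg_match_all($pattern, $code, $matches);

/* Same loop shape as checkIncludes(), but printing instead of checking */
foreach ($matches[1] as $path) {
    echo $path, "\n"; // old.php, then new.php
}
```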
You may have noticed that our pattern would not catch things like this:
include('someFile.txt');
It fails to match the pattern because of the parentheses. PHP itself would accept that include, but the form is discouraged and unnecessary: include and require are not functions. They are language constructs, like echo, and they don't need parentheses. After looking at thousands of lines of PHP code, I've never seen the parenthesized form. If you use it, or the person who wrote the code you're looking at did, you'd have to modify the pattern to account for the parentheses, like this:
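/(?:include|require)(?:_once)?\s*\(?\s*['"]([^'"]+)['"]\s*\)?/im
The pattern above changes \s+ to \s* (the parenthesized form often has no space between include and the opening parenthesis), adds \(?\s* ahead of the opening quote, and adds \s*\)? after the closing quote. The backslash escapes each parenthesis so the regex engine sees it as a literal parenthesis rather than the start or end of a group. The question mark tells the engine to match zero or one of the previous character, so the parentheses are optional and includes written without them are still found. Feel free to use this form if you think someone might have used parentheses for includes and requires.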
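Here's a quick check of a parenthesis-tolerant pattern against a parenthesized include, an indented one with extra spaces, and a plain one. It has no ^ anchor, so indented lines are found too. Treat it as a sketch, and test it against your own code before relying on it.

```php
<?php
/* Parenthesis-tolerant variant: \s* instead of \s+,
   plus optional \( and \) around the quoted path */
$pattern = <<<EOT
/(?:include|require)(?:_once)?\s*\(?\s*['"]([^'"]+)['"]\s*\)?/im
EOT;

/* Made-up sample lines (nowdoc, so the quotes and spaces stay as written) */
$code = <<<'CODE'
include('someFile.txt');
    require_once ( 'indented.php' );
include 'plain.php';
CODE;

preg_match_all($pattern, $code, $matches);
print_r($matches[1]); // someFile.txt, indented.php, plain.php
```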
Coming Up
You may have noticed that the code of our snippet is now quite ugly and inelegant. We're passing more arguments to the functions than we should, and the code to display the results is scattered all over the snippet. In other words, the model layer and the presentation layer are hopelessly mixed up. We also have duplicated code. We'll discuss this and see what we can do about it in my next article.
Bob Ray is the author of MODX: The Official Guide and dozens of MODX Extras including QuickEmail, NewsPublisher, SiteCheck, GoRevo, Personalize, EZfaq, MyComponent and many more. His website is Bob’s Guides. It includes not only a plethora of MODX tutorials but also some really great bread recipes.