PHP Regex question
I have a series of URLs in a web document, something like this:
<a href="somepage.php?x=some_document.htm">click here</a>
What I want to do is replace the bold chunk (some_document.htm):
<a href="somepage.php?x=some_document.htm">click here</a>
... with some kind of encoded version (say, base64-encoded), something like this:
for each match, turn it into base64_encode(match)
Notes:
1. The string href="somepage.php?x= will always precede the phrase.
2. A double quote ( " ) will always follow the phrase.
I'm not a regex guru, but I know some of you are. Is there an easy way to do this?
UPDATE:
I solved it using a modified version of what Chris presented, here it is:
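// Callback: re-emit the attribute with the captured value base64-encoded.
function encrypt_param( $in_matches ) {
    return 'href="somepage.php?x=' . base64_encode( $in_matches[1] ) . '"';
}
// The "." and "?" are escaped so they match literally.
$webdoc = preg_replace_callback( '/href="somepage\.php\?x=([^"]+)"/',
                                 'encrypt_param',
                                 $webdoc );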
I think you are looking for something like this:
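function doSomething($matches) {
    // The return value replaces the whole match, so rebuild the attribute around the encoded value.
    return 'href="somepage.php?x=' . base64_encode($matches[1]) . '"';
}
// Escape "." and "?" so they match literally, and keep the returned string.
$webdoc = preg_replace_callback('/href="somepage\.php\?x=([^"]+)"/', 'doSomething', $webdoc);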
The preg_replace answer works in a similar way. If you want to do something more complex, the callback will let you do it.
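To make the callback mechanics concrete, here is a minimal, self-contained sketch. It assumes a made-up two-link sample document (the second link is hypothetical) and uses preg_quote() to escape the fixed prefix instead of escaping it by hand; treat it as one possible way to write this, not the only one.

```php
<?php
// Sample input; the second link is a hypothetical extra to show several matches.
$webdoc = '<a href="somepage.php?x=some_document.htm">click here</a> '
        . '<a href="somepage.php?x=other.htm">or here</a>';

// preg_quote() escapes the "." and "?" so they match literally.
$prefix  = 'href="somepage.php?x=';
$pattern = '/' . preg_quote($prefix, '/') . '([^"]+)"/';

$webdoc = preg_replace_callback(
    $pattern,
    function ($m) use ($prefix) {
        // $m[0] is the whole match, $m[1] is the captured value.
        // Whatever is returned replaces $m[0], so the prefix and the
        // closing quote have to be put back.
        return $prefix . base64_encode($m[1]) . '"';
    },
    $webdoc
);

echo $webdoc, "\n";
```

Base64 output can contain +, / and =, so if the value stays in a query string it may be worth wrapping it in urlencode() as well.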
I would consider using a PHP DOM parser. Anything less is a hack. (Not that hacks are always bad; just know the difference between a simple regex and a DOM parser.) getElementsByTagName() will get your <a> tags, getAttribute() will get your href attributes, and setAttribute() will modify them.
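As a rough sketch of that approach, the following uses PHP's built-in DOMDocument class. The sample markup and the prefix check are assumptions based on the question, and the libxml error handling is just one way to keep warnings about imperfect HTML quiet.

```php
<?php
// Hypothetical input; the markup and page name are taken from the question.
$webdoc = '<p><a href="somepage.php?x=some_document.htm">click here</a></p>';

$dom = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($webdoc);
libxml_clear_errors();

$prefix = 'somepage.php?x=';
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $href = $anchor->getAttribute('href');
    // Only touch links that start with the expected prefix.
    if (strpos($href, $prefix) === 0) {
        $value = substr($href, strlen($prefix));
        $anchor->setAttribute('href', $prefix . base64_encode($value));
    }
}

// saveHTML() serializes the whole document, including the <html> and
// <body> wrappers that loadHTML() adds around a fragment.
$result = $dom->saveHTML();
echo $result;
```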
It seems you are trying to collapse a multi-step task into a single step, which can end up creating additional problems in the long run. You basically want to do three things:
- Find all anchor tags on a page
- Extract the URL in the href attribute from these tags
- Extract a specific variable in the query string from this URL
There are several ways to do this in PHP. Yes, one direct way is using a regex, but it's less transparent. In this particular case, you would be solving one very narrow problem, which makes your code harder to reuse in future applications.
My suggestion is to use a lightweight DOM parser available from SourceForge called SimpleHTMLDom. With this parser, you can write much cleaner code for the task you are performing.
foreach ($dom_object->find('a') as $anchor) {
    $url = $anchor->href;
    $queryArray = array();
    // Split the URL's query string into an associative array.
    parse_str(parse_url($url, PHP_URL_QUERY), $queryArray);
    $myVariable = $queryArray['x'];
}
And then, of course, $myVariable will be the value you were trying to get with that regex.
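The loop above only reads the value. A minimal sketch of the remaining step, encoding it and rebuilding the URL, could look like this. It uses only built-in functions, works on a single hard-coded URL from the question rather than on SimpleHTMLDom objects, and the variable names are illustrative.

```php
<?php
// URL taken from the question; in the loop above it would be $anchor->href.
$url = 'somepage.php?x=some_document.htm';

// parse_url() also handles relative URLs such as this one.
$path  = parse_url($url, PHP_URL_PATH);   // the part before "?"
$query = parse_url($url, PHP_URL_QUERY);  // the part after "?"

$params = array();
parse_str((string) $query, $params);

if (isset($params['x'])) {
    $params['x'] = base64_encode($params['x']);
}

// http_build_query() URL-encodes the values, so characters such as "="
// in the base64 output are percent-encoded in the result.
$newUrl = $path . '?' . http_build_query($params);
echo $newUrl, "\n";
```

Inside the foreach loop, the rebuilt string could then be assigned back to the anchor's href.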
Regexes are fundamentally bad at parsing HTML (see "Can you give some examples of why it is difficult to parse XML and HTML with regex?" for why). You need an HTML parser. See "Can you give an example of parsing HTML with your favorite parser?" for examples using various parsers.