How can I capture the tag name, the attribute names AND the attribute values with ONE regex?

I have the following regex from this post (Regex to extract tag attributes):

(\S+)=["\']?((?:.(?!["\']?\s+(?:\S+)=|[>"\']))+.)["\']?

      

I wrote the following PHP code and it works well. From the preg_match_all() function I get id='gridview1' (the full match), 'id' and 'gridview1'.

$regexp = '/(\S+)=["\']?((?:.(?!["\']?\s+(?:\S+)=|[>"\']))+.)["\']?/';
$text = '<asp:gridview id=\'gridview1\' />';

$matches = null;
preg_match_all($regexp, $text, $matches);

print_r($matches);

      

How do I change the regex so that it also returns 'asp' and 'gridview', or 'Foo' and 'bAR' for input like this:

<Foo: bAR />



2 answers


You shouldn't use regular expressions to parse HTML.
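The answer does not name an alternative; one commonly used in PHP is the built-in DOM extension. The sketch below assumes the markup is well-formed XML and that it is acceptable to wrap it in a root element declaring the "asp" prefix; the namespace URI "urn:example:asp" and the helper name describeTag() are hypothetical. Note that '<Foo: bAR />' is not well-formed (space after the colon, attribute without a value), so a real parser rejects it.

```php
<?php
// Hypothetical helper: read prefix, local name and attributes with DOM, no regex.
// Assumption: the fragment is well-formed XML once wrapped in a root element
// that declares the "asp" prefix ("urn:example:asp" is a placeholder URI).
function describeTag(string $fragment): ?array
{
    $xml = '<root xmlns:asp="urn:example:asp">' . $fragment . '</root>';

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse errors instead of printing warnings
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    if (!$ok) {
        return null;                    // not well-formed, e.g. "<Foo: bAR />"
    }

    // The first element child of <root> is the tag we are interested in.
    foreach ($doc->documentElement->childNodes as $node) {
        if ($node instanceof DOMElement) {
            $attributes = [];
            foreach ($node->attributes as $attr) {
                $attributes[$attr->name] = $attr->value;
            }
            return [
                'prefix'     => $node->prefix,     // e.g. "asp"
                'name'       => $node->localName,  // e.g. "gridview"
                'attributes' => $attributes,       // e.g. ["id" => "gridview1"]
            ];
        }
    }
    return null;
}
```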





([a-zA-Z]+)\:\s*([a-zA-Z]+)

will work for something like Foo: bar (the \s* allows the space after the colon).
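A quick way to check the prefix/name pattern in PHP, as a sketch: the hypothetical helper prefixAndName() runs it through preg_match() with optional whitespace (\s*) on both sides of the colon, which the 'Foo: bar' form needs, since without it the pattern only matches forms like 'asp:gridview'.

```php
<?php
// Hypothetical helper: return [prefix, name] for the first "prefix:name"
// (optionally with whitespace around the colon), or null if there is none.
function prefixAndName(string $text): ?array
{
    if (preg_match('/([a-zA-Z]+)\s*:\s*([a-zA-Z]+)/', $text, $m)) {
        return [$m[1], $m[2]];
    }
    return null;
}
```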

<.*?([a-zA-Z]+).*?\:.*?([a-zA-Z]+).*?\/>

will work for <Foo: BArrr /> (the + goes inside each group so the whole name is captured, not just its last letter).
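A minimal PHP check of the tag-level pattern; the helper name tagParts() is hypothetical. The quantifier is placed inside each group, ([a-zA-Z]+): with ([a-zA-Z])+ PCRE reports only the last repetition of the group, so '<Foo: BArrr />' would yield 'o' and 'r' instead of 'Foo' and 'BArrr'.

```php
<?php
// Hypothetical helper: return [prefix, name] from a self-closing tag such as
// "<Foo: BArrr />" or "<asp:gridview id='gridview1' />", or null on no match.
function tagParts(string $text): ?array
{
    // \: is a literal colon; \/ escapes the slash because "/" is the delimiter.
    if (preg_match('/<.*?([a-zA-Z]+).*?\:.*?([a-zA-Z]+).*?\/>/', $text, $m)) {
        return [$m[1], $m[2]];
    }
    return null;
}
```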



These patterns can be tightened depending on your requirements and on whether you know the input always follows a particular format.
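Since the question asks for ONE regex, here is one possible sketch (not taken from either answer): an alternation in which the first branch captures the prefix and name right after '<' and the second branch is the question's attribute pattern unchanged. With preg_match_all() and PREG_SET_ORDER, each match fills either groups 1-2 or groups 3-4. The helper name parseTag() is hypothetical, and the approach still has the usual limits of parsing markup with regexes.

```php
<?php
// Hypothetical helper: one regex, two alternatives.
//  - Branch 1 captures prefix and name right after "<" (groups 1 and 2).
//  - Branch 2 is the attribute pattern from the question (groups 3 and 4).
function parseTag(string $text): array
{
    $regexp = '/<\s*([a-zA-Z]+)\s*:\s*([a-zA-Z]+)'
            . '|(\S+)=["\']?((?:.(?!["\']?\s+(?:\S+)=|[>"\']))+.)["\']?/';

    preg_match_all($regexp, $text, $matches, PREG_SET_ORDER);

    $result = ['prefix' => null, 'name' => null, 'attributes' => []];
    foreach ($matches as $m) {
        if (isset($m[3]) && $m[3] !== '') {
            // Attribute match: group 3 is the name, group 4 the value.
            $result['attributes'][$m[3]] = $m[4];
        } else {
            // Tag match: group 1 is the prefix, group 2 the name.
            $result['prefix'] = $m[1];
            $result['name'] = $m[2];
        }
    }
    return $result;
}
```

Each regex match covers only one tag or one attribute, so the pieces are reassembled in PHP; a single match cannot return a variable number of attributes as separate groups, because a repeated group keeps only its last repetition.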







