Javascript regex - why doesn't it work in IE?
I've lost a lot of sleep over this and still can't figure it out.
The code below (simplified from a larger script, keeping only the part that shows the problem) finds Item1 and Item2 in Firefox, but not in IE7. I don't know why.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
</head>
<body>
<table><tr>
<td><img src=imgs/site/trash.jpg border=1></td><td><font style="">Item1</font></td>
<td><img src=imgs/site/trash.jpg border=1></td><td><font style="">Item2</font></td>
</tr></table>
<script type="text/javascript">
var _pattern =/trash.*?<font.*?>(.*)<\/font>/gim;
alert (_pattern);
var thtml = document.documentElement.innerHTML;
alert (thtml);
var _match;
while ((_match = _pattern.exec(thtml)) !== null) {
alert (_match[1]);
}
</script>
</body>
</html>
Notes:
1. I know there are better ways to get Item1 and Item2. This example only shows the regex problem I'm facing in the simplest possible way.
2. When I remove the <table> and </table> tags, it works.
Thanks in advance
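The problem is that the . character does not match a newline in JScript, and the m (multiline) flag does not change that; it only makes ^ and $ match at line boundaries. This is standard JavaScript behavior, not something specific to IE.
Use this regex instead: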
var _pattern = /trash[\s\S]*?<font[^>]*>([^<]*)<\/font>/gi;
This removes the . altogether. Note that [\s\S] ("any whitespace or any non-whitespace character") is equivalent to "any character", but unlike . it also matches a newline.
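A minimal sketch of the difference, run against a hand-written string. The string assumes IE-style output with a CRLF between two cells; the exact whitespace IE emits is an assumption, not something measured here.

```javascript
// Assumed IE-style fragment: the two cells are separated by a CRLF.
var html = '<TD>trash</TD>\r\n<TD><FONT style="">Item1</FONT></TD>';

// "." stops at \r and \n even with the m flag; m only changes ^ and $.
var dotPattern = /trash.*?<font.*?>(.*)<\/font>/im;

// [\s\S] matches any character at all, including line breaks.
var anyCharPattern = /trash[\s\S]*?<font[^>]*>([^<]*)<\/font>/i;

var dotMatch = dotPattern.exec(html);         // null: cannot cross the CRLF
var anyCharMatch = anyCharPattern.exec(html); // anyCharMatch[1] is "Item1"
```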
The reason dropping the table makes a difference is that IE's innerHTML implementation does not return the original markup. Instead, it generates the markup on the fly by walking the DOM. When it encounters a table element, it inserts line breaks into the output in different places than when the table is absent.
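To make the table effect concrete, here is a sketch that runs the original and the suggested pattern over two assumed shapes of innerHTML output: one row of cells per line, as in the page source, and a line break between every cell, roughly what IE7 is described as producing. The helper name extractItems and both sample strings are illustrative assumptions.

```javascript
// Hypothetical helper: collects capture group 1 of every match.
function extractItems(pattern, html) {
  var items = [];
  var match;
  pattern.lastIndex = 0; // a /g regex remembers where its last match ended
  while ((match = pattern.exec(html)) !== null) {
    items.push(match[1]);
  }
  return items;
}

// Assumed shape 1: each row's cells on one line, as in the page source.
var oneLinePerRow =
  '<td><img src="imgs/site/trash.jpg"></td><td><font style="">Item1</font></td>\n' +
  '<td><img src="imgs/site/trash.jpg"></td><td><font style="">Item2</font></td>';

// Assumed shape 2: a CRLF between every cell, as IE7 may regenerate the table.
var lineBreakPerCell =
  '<TD><IMG src="imgs/site/trash.jpg"></TD>\r\n<TD><FONT style="">Item1</FONT></TD>\r\n' +
  '<TD><IMG src="imgs/site/trash.jpg"></TD>\r\n<TD><FONT style="">Item2</FONT></TD>';

var original = /trash.*?<font.*?>(.*)<\/font>/gim;
var fixed = /trash[\s\S]*?<font[^>]*>([^<]*)<\/font>/gi;

extractItems(original, oneLinePerRow);    // ["Item1", "Item2"]
extractItems(original, lineBreakPerCell); // []
extractItems(fixed, lineBreakPerCell);    // ["Item1", "Item2"]
```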
As for the real issue: matching a regex across the lines of innerHTML output is not cross-browser safe, and IE in particular has issues. Removing the table probably makes IE serialize the remaining markup on a single line (= success), whereas keeping it makes IE add carriage returns and similar line breaks (= failure).
I know you said you are aware there are better ways, but you haven't explained why you persist with this approach. By relying on a regex and on IE's text serialization of the DOM, you open yourself up to exactly this kind of problem. Don't do it.
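A minimal sketch of the DOM-based alternative, assuming the items are always wrapped in font elements as in the question. The helper name getFontTexts is hypothetical; the textContent/innerText fallback is there because IE7 only supports innerText.

```javascript
// Hypothetical helper: reads the text of every <font> element under root
// through the DOM instead of regex-matching innerHTML.
function getFontTexts(root) {
  var fonts = root.getElementsByTagName("font");
  var texts = [];
  for (var i = 0; i < fonts.length; i++) {
    var el = fonts[i];
    // IE7 has innerText but no textContent; other browsers have textContent.
    texts.push(el.textContent !== undefined ? el.textContent : el.innerText);
  }
  return texts;
}

// Usage in a browser page: var items = getFontTexts(document);
```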
The closing td tags contain a character that needs to be escaped: the / slash. I don't know exactly why IE7 chokes on it. It works in Safari in my testing.
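For reference, a small sketch of where a slash actually needs escaping in JavaScript regexes. It only illustrates the escaping rule; it does not show that escaping is what trips up IE7.

```javascript
// Inside a regex literal, an unescaped / would end the literal,
// so the slash in a closing tag has to be written as \/.
var literal = /<\/td>/i;

// With the RegExp constructor the pattern is an ordinary string,
// so the slash needs no escaping there.
var constructed = new RegExp("</td>", "i");

literal.test("<TD>x</TD>");     // true
constructed.test("<TD>x</TD>"); // true
```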
You might want to give the table an id. Then you only need to walk that table's childNodes. On a large page you would process far less HTML and probably save memory too.
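A sketch of that suggestion, assuming the table is given a hypothetical id of "items" (for example <table id="items">). The recursive walk descends through childNodes, so it also copes with the tbody element browsers insert into tables; collectFontText is an illustrative name, not an existing API.

```javascript
// Hypothetical helper: walks only the given subtree via childNodes
// and collects the text of each <font> element it finds.
function collectFontText(node, out) {
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    if (child.nodeType === 1) { // element node; text nodes are skipped
      if (child.nodeName.toUpperCase() === "FONT") {
        // IE7 lacks textContent, so fall back to innerText there.
        out.push(child.textContent !== undefined ? child.textContent : child.innerText);
      } else {
        collectFontText(child, out); // descend into tbody, tr, td, ...
      }
    }
  }
  return out;
}

// Usage in the page: var items = collectFontText(document.getElementById("items"), []);
```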