RegEx - how to parse an HTML page for a pattern (in JavaScript)

I need to parse an HTML page for a pattern. I am assuming the matches get loaded into an array, and I then need to output the contents of that array.

<script language="JavaScript" type="text/javascript">
var adBookmarkletData=[
'<html><head><title>MYSA Yahoo! APT Debugger</title></head><body><center><div style=\"background:#ccc;color:#000;width:350px;text-align:left;padding:15px;border:2px #000;\">','<b>MYSA Yahoo! APT Debugger:</b><br /><hr />',
'<b>URL:</b> '+document.location.href+'<br />',
'<b>Pub ID:</b> '+window.yld_mgr.pub_id+'<br />',
'<b>Site Name:</b> '+window.yld_mgr.site_name+'<br />',
'<b>Content Topic ID List:</b> '+window.yld_mgr.content_topic_id_list+'<br />',
'<b>Site Section Name List:</b> '+window.yld_mgr.site_section_name_list+'<br />'
];
for(i in window.yld_mgr.slots){
    adBookmarkletData.push('<b>Ad:</b> ('+i+')<b>Category:</b>('+window.yld_mgr.slots[i].cstm_content_cat_list+')<br />');
    };
//Here my problem starts
    var myRegExp = new RegExp("place_ad_here\('(.*?)'\)");
//Here my Problem ends
adBookmarkletData.push(myRegExp.exec(document.innerHTML));

adBookmarkletData.push('</div></center></body></html>');
function createAptDebugger(){
   for (i in adBookmarkletData){
    document.write(adBookmarkletData[i]);
    }
};
void(createAptDebugger());
</script>

      

The RegEx pattern works in an online tester against sample code, but here it returns no matches. I don't understand how to run the RegEx against the HTML page and then output the matches from the array.

For clarity, the html will have tags like this in the body.

<script type="text/javascript">yld_mgr.place_ad_here('A728');</script>
<script type="text/javascript">yld_mgr.place_ad_here('ASPON120');</script>
<script type="text/javascript">yld_mgr.place_ad_here('ROLLOVER');</script>
<script type="text/javascript">yld_mgr.place_ad_here('A300');</script>
<script type="text/javascript">yld_mgr.place_ad_here('Middle1');</script>
<script type="text/javascript">yld_mgr.place_ad_here('B300');</script>

      

The results should look like this:

place_ad_here('A728')
place_ad_here('ASPON120')
place_ad_here('ROLLOVER')
place_ad_here('A300')
place_ad_here('Middle1')
place_ad_here('B300')

      

This is pretty much how I want to display them.

Thanks in advance...



3 answers


You are missing the g flag in your regex. Adding it allows multiple matches.

Is this what you want?



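// document itself has no innerHTML property; read the markup from the root element
var matches = document.documentElement.innerHTML.match( /place_ad_here\('[^']+'\)/g ) ;
// match returns null when nothing matches, so fall back to an empty array
Array.prototype.push.apply( adBookmarkletData, matches || [] ) ;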

      

string.match returns an array of all matches if you use the global flag g. Also, since push only accepts a list of arguments rather than an array, apply is used to pass the array elements as individual arguments.
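To see the effect of the g flag and of apply in isolation, here is a minimal sketch that runs against a hard-coded string instead of a live page. The names sampleHtml, output, firstOnly and allMatches are made up for illustration.

```javascript
// Hypothetical sample standing in for the page markup.
var sampleHtml = [
  "<script>yld_mgr.place_ad_here('A728');</script>",
  "<script>yld_mgr.place_ad_here('ASPON120');</script>",
  "<script>yld_mgr.place_ad_here('ROLLOVER');</script>"
].join("\n");

// Without g, match stops at the first hit.
var firstOnly = sampleHtml.match(/place_ad_here\('[^']+'\)/);

// With g, match returns every hit (or null if there are none).
var allMatches = sampleHtml.match(/place_ad_here\('[^']+'\)/g) || [];

// push takes individual arguments, so apply spreads the array into them.
var output = ["<b>Ads:</b>"];
Array.prototype.push.apply(output, allMatches);

console.log(firstOnly.length, allMatches.length, output);
```

Calling output.push(allMatches) directly would append the whole array as a single nested element, which is why apply is used here.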



Note that both soitgoes and Laurent recommend or use literal regular expressions (//). Your RegExp doesn't work because the backslashes that escape the parentheses are consumed by the string literal passed to the RegExp constructor, so they never reach the regex. You need to double them.

new RegExp("place_ad_here\\('(.*?)'\\)","g")

      

This is why I prefer a literal regex and only use RegExp when I need to build my regex at runtime.
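The difference is easy to check outside the page. The following sketch compares the three spellings against one hypothetical sample call; the variable names are illustrative only.

```javascript
// Hypothetical sample input.
var sample = "yld_mgr.place_ad_here('A728');";

// In a string literal "\(" is just "(", so this pattern contains
// unescaped parentheses, which the regex engine treats as groups.
var broken = new RegExp("place_ad_here\('(.*?)'\)");

// Doubling the backslashes lets one backslash survive into the pattern.
var fixed = new RegExp("place_ad_here\\('(.*?)'\\)", "g");

// The literal form needs no doubling.
var literal = /place_ad_here\('(.*?)'\)/g;

var brokenMatches = broken.test(sample);
var fixedMatch = sample.match(fixed);
var sameSource = fixed.source === literal.source;

console.log(broken.source);
console.log(fixed.source);
console.log(brokenMatches, fixedMatch, sameSource);
```

Printing the source property shows what the regex engine actually received, which is a quick way to spot a lost backslash.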

Also, Laurent's answer should accomplish what you want. It just uses a slightly different regex: [^']+ vs. (.*?). Both should work on the text you describe.



If you want the output one per line, with a line break after each match, you can use replace instead of match and adjust your regex accordingly.
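One possible reading of that suggestion, sketched under the assumption that each match should be followed by a <br /> when written into the page, is to collect the matches through the callback form of replace. The helper name collectAdCalls is hypothetical.

```javascript
// Hypothetical helper: returns each place_ad_here(...) call followed by a line break.
function collectAdCalls(html) {
  var lines = [];
  // With the g flag, replace invokes the callback once per match.
  // Only the side effect is used; the string replace returns is discarded.
  html.replace(/place_ad_here\('[^']+'\)/g, function (whole) {
    lines.push(whole + "<br />");
    return whole;
  });
  return lines;
}

var demoLines = collectAdCalls(
  "yld_mgr.place_ad_here('A300');\nyld_mgr.place_ad_here('Middle1');"
);
console.log(demoLines.join("\n"));
```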

One final note: your matches and/or replacements become more complex if an input like

<script type="text/javascript">yld_mgr.place_ad_here('A728');</script>

spans more than one line, or if place_ad_here ever takes more than one parameter, so make sure you know every possible input. :)
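As a rough illustration of how the pattern might be loosened for those cases, the sketch below allows whitespace (including newlines) around the first argument and skips any further arguments. Both the sample input and the pattern are assumptions, and the pattern would still fail if a later argument itself contained a closing parenthesis.

```javascript
// Assumed pattern: optional whitespace around the first quoted argument,
// then optionally a comma and anything up to the closing parenthesis.
var tolerant = /place_ad_here\(\s*'([^']*)'\s*(?:,[^)]*)?\)/g;

// Hypothetical input: one call split across lines, one with a second parameter.
var trickyInput =
  "yld_mgr.place_ad_here(\n  'A728'\n);\n" +
  "yld_mgr.place_ad_here('A300', 'extra');";

var ids = [];
var m;
// exec with the g flag resumes after the previous match on each call.
while ((m = tolerant.exec(trickyInput)) !== null) {
  ids.push(m[1]); // capture group 1 holds the first argument
}
console.log(ids);
```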



I believe exec will only give you the first match. I think you need to do something like this:

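var match;
var html = document.documentElement.innerHTML; // document has no innerHTML of its own
// myRegExp needs the g flag; without it exec never advances and this loops forever
while ((match = myRegExp.exec(html)) !== null) {
   adBookmarkletData.push(match[0]); // match[0] is the full matched text
}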

      
