How do I extract JavaScript links from an HTML document?

I am writing a small web spider for a site that uses a lot of JavaScript for its links:

<htmlTag onclick="someFunction();">Click here</htmlTag>

where the function looks something like this:

function someFunction() {
  var _url;
  ...
  // _url constructed, maybe with reference to a value in the HTML doc
  // and/or a value passed as argument(s) to this function
  ...
  window.location.href = _url;
}

What is the best way to evaluate this function on the server side so that I can determine the value of _url?


4 answers


You can also use env.js and Rhino to actually evaluate the JavaScript in the HTML page, and detect changes to the location object after manually triggering the click event.
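env.js and Rhino run on the JVM. As a rough illustration of the same idea (run the page's script, fire the handler, then read back the location), here is a minimal sketch that swaps in Node's built-in `vm` module instead of env.js/Rhino. `resolveOnclickUrl` is a hypothetical helper, and the stub exposes only `window.location`, so a function that reads values from the HTML document would need a fuller DOM emulation such as env.js. The `vm` module is not a security sandbox, so this assumes you trust the scripts you run.

```javascript
const vm = require("vm");

// Hypothetical helper: runs the page's script and then the onclick code in a
// fresh vm context whose `window` exposes only a stub `location`, and
// returns the URL the handler tried to navigate to.
// NOTE: Node's vm module is not a security sandbox; only run code you trust.
function resolveOnclickUrl(pageScript, onclickCode) {
  const sandbox = { location: { href: "" } };
  sandbox.window = sandbox; // lets both `window.location` and `location` resolve
  vm.createContext(sandbox);
  vm.runInContext(pageScript, sandbox, { timeout: 1000 });
  vm.runInContext(onclickCode, sandbox, { timeout: 1000 });
  // Handle both `location.href = url` and `location = url`.
  const loc = sandbox.location;
  return typeof loc === "string" ? loc : loc.href;
}

// Example page script modelled on the question's someFunction().
const pageScript = `
  var base = "http://example.com/page?id=";
  function someFunction(id) {
    var _url = base + id;
    window.location.href = _url;
  }
`;

console.log(resolveOnclickUrl(pageScript, "someFunction(42);"));
// prints http://example.com/page?id=42
```

The onclick code is passed in as a string, so in a spider it would come straight from the attribute value of each element you find.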



I'm not really sure what you are trying to achieve.



If you need to send these values to the server for processing, Ajax is your best bet.


It could get messy, and it depends on several factors:

  • Where is the link stored? Inside an element, in a JavaScript variable, etc.?
  • Is the JavaScript function always your own?

One approach that might do the trick is to simply parse the HTML and use a regex to catch the http links on elements where the onclick="someFunction();" attribute is present.
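A minimal sketch of that regex approach, assuming the URLs you want appear literally in the element's opening tag (for example in an `href`, or as a string argument to the handler). `findOnclickLinks` is a hypothetical helper, and regexes are fragile on HTML: this breaks on `>` inside attribute values and ignores comments and `<script>` blocks.

```javascript
// Hypothetical helper: finds opening tags whose onclick attribute calls
// `funcName` and collects any http(s) URLs that appear in those tags.
// `funcName` is assumed to be a plain identifier (no regex metacharacters).
function findOnclickLinks(html, funcName) {
  const tagRe = /<[a-zA-Z][^>]*>/g; // opening tags only, not </closing> tags
  const onclickRe = /\bonclick\s*=\s*(["'])(.*?)\1/i;
  const callRe = new RegExp("\\b" + funcName + "\\s*\\(");
  const urlRe = /https?:\/\/[^\s"'<>]+/g;
  const results = [];
  for (const [tag] of html.matchAll(tagRe)) {
    const onclick = onclickRe.exec(tag);
    if (!onclick || !callRe.test(onclick[2])) continue; // no matching handler
    results.push({ onclick: onclick[2], urls: tag.match(urlRe) || [] });
  }
  return results;
}

const html = `
<a href="http://example.com/a" onclick="someFunction();">A</a>
<span onclick="other();">B</span>
<div onclick="someFunction('http://example.com/b')">C</div>`;

for (const { onclick, urls } of findOnclickLinks(html, "someFunction")) {
  console.log(onclick, urls);
}
```

The `<span>` is skipped because its handler calls a different function. When the URL is only assembled at run time, as in the question's someFunction(), nothing useful appears in the tag, and the script has to be evaluated instead.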


If you need server-side processing, you have to either:

  • Do the processing before the content is delivered to the user and include its output in the response, or
  • Use something like AJAX to send a new request to the server.
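For the AJAX option, a minimal sketch of what the question's someFunction() could do instead of navigating: post the computed URL to the server. This assumes you can edit the page's scripts, and `REPORT_ENDPOINT` and `reportUrl` are hypothetical names; the request uses the standard `fetch` API.

```javascript
// Hypothetical endpoint on your own server that records discovered URLs.
const REPORT_ENDPOINT = "/collect-url";

// Reports a computed URL to the server as JSON instead of navigating to it.
// `send` defaults to the global fetch (browsers and Node 18+) and can be
// replaced with a stub for testing.
function reportUrl(url, send = globalThis.fetch) {
  return send(REPORT_ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url }),
  });
}

// Inside someFunction(), the last line would become:
//   reportUrl(_url);
// Here a stub stands in for fetch so the request can be inspected offline.
const sent = [];
reportUrl("http://example.com/page?id=42", (endpoint, options) => {
  sent.push({ endpoint, body: JSON.parse(options.body) });
  return Promise.resolve();
});
console.log(sent[0]);
```

Injecting `send` keeps the function usable in the page while letting it be exercised without a server.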