An alternative to innerHTML that includes the header?

I am trying to fetch data from the following page:

http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMDataServlet?param1=&param2=&param3=&param4=&param5=2009-04-22&param6=37#

Conveniently, if inefficiently, the page embeds all of the CSV data in its header, assigned to the gs_csv variable.

How can I extract it? document.body.innerHTML skips the header, which is where the data is. What is the alternative that includes the header (or, better yet, returns the value assigned to gs_csv)?

(Sorry, I am new to all of this. I have searched a lot of documentation and tried a lot of things, but nothing has worked so far.)


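Thanks to Sinan (this is basically his solution, transcribed to Python).

import time
from win32com.client import Dispatch

ie = Dispatch("InternetExplorer.Application")
ie.Visible = False
ie.Navigate("http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMDataServlet?param1=&param2=&param3=&param4=&param5=2009-04-22&param6=37#")

# Crude wait for the page to finish loading.
time.sleep(20)

# The script element at index 1 holds the gs_csv assignment.
s1 = ie.document.scripts(1).text
# Skip "gs_csv" plus two more characters (presumably the = and the opening quote)
# and drop the 11 trailing characters that follow the data.
s1 = s1[s1.find("gs_csv") + 8:-11]

# Raw string, so the backslashes in the Windows path are kept literally.
scriptfilepath = r"c:\FO Share\bmreports\script.txt"
with open(scriptfilepath, "w") as scriptfile:
    # Presumably the intent: turn the escaped "\n" sequences into real newlines.
    scriptfile.write(s1.replace("\\n", "\n"))

ie.Quit()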
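The fixed 20-second sleep could probably be replaced by polling the browser's ReadyState, the way the Perl answer below does. A minimal sketch, assuming the COM object exposes ReadyState and that 4 means the page is fully loaded (as the Perl code assumes); wait_until_ready is a hypothetical helper name, not part of win32com:

```python
import time


def wait_until_ready(browser, timeout=60.0, poll_interval=0.5):
    """Poll browser.ReadyState until it equals 4 (page fully loaded).

    `browser` is any object exposing a ReadyState attribute, such as the
    InternetExplorer.Application COM object. Returns True once the page
    is loaded, or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if browser.ReadyState == 4:
            return True
        time.sleep(poll_interval)
    return False
```

With this in place, time.sleep(20) would become wait_until_ready(ie), and the script could bail out when it returns False instead of reading a half-loaded page.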

      



4 answers


Untested: have you looked at what document.scripts contains?

UPDATE:

For some reason I am having enormous difficulty getting this to work with Windows Script Host (but then I don't use it often, apologies). Anyway, here is Perl source that works:



use strict;
use warnings;

use Win32::OLE qw(in);   # in() is needed below to iterate over the COM collection
$Win32::OLE::Warn = 3;

my $ie = get_ie();

$ie->{Visible} = 1;

$ie->Navigate(
    'http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMDataServlet?'
    .'param1=&param2=&param3=&param4=&param5=2009-04-22&param6=37#'
);

sleep 1 until is_ready( $ie );

my $scripts = $ie->Document->{scripts};

for my $script (in $scripts ) {
    print $script->text;
}

sub is_ready { $_[0]->{ReadyState} == 4 }

sub get_ie {
    Win32::OLE->new('InternetExplorer.Application', 
        sub { $_[0] and $_[0]->Quit },
    );
}

__END__

C:\Temp> ie > output

The file output now contains the text of all script tags.



Fetch the source of this page using Ajax and parse the response text like XML using jQuery. It should be simple enough to get the text of the first <script> tag you encounter inside the <head>.

I am not familiar with jQuery, or I would post a code example.

EDIT: I am assuming you are talking about fetching the CSV on the client side.
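For anyone doing this outside the browser, the same idea (parse the page source and read the text of its script tags) can be sketched in Python with the standard library's html.parser. This is only an illustration: ScriptCollector and find_script_with are hypothetical names, and searching for the script that mentions gs_csv is an assumption based on the question, not on the actual page layout.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text of every <script> element, in document order."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <SCRIPT> matches as well.
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks; append them all.
        if self._in_script:
            self.scripts[-1] += data


def find_script_with(html, needle="gs_csv"):
    """Return the text of the first script containing `needle`, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for text in collector.scripts:
        if needle in text:
            return text
    return None
```

Looking for the variable name rather than a fixed position means the code does not depend on which script tag comes first.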



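If the data sits in just one of the scripts, then extracting the CSV data is simple:

from urllib.request import urlopen

response = urlopen('http://www.bmreports.com/foo?bar?')
# The page encoding is not stated here; undecodable bytes are replaced rather than raising.
html = response.read().decode('utf-8', errors='replace')
# Everything between 'gs_csv=' and the closing script tag.
csv_data = html.split('gs_csv=')[1].split('</SCRIPT>')[0]

# process csv_data here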
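The split above stops at the closing SCRIPT tag, so the result still carries the surrounding quotes and whatever follows the string. As a rough sketch of the processing step, the following uses a regular expression instead of split to pull out only the string literal, then parses it into rows. It assumes the page contains an assignment of the form gs_csv="..." whose line breaks are written as escaped \n sequences (which the Python solution in the question suggests); extract_gs_csv and csv_rows are hypothetical helper names.

```python
import csv
import io
import re


def extract_gs_csv(page_source):
    """Return the raw text assigned to gs_csv, or None if it is absent.

    Assumes a double-quoted JavaScript string literal: gs_csv="...".
    """
    pattern = r'gs_csv\s*=\s*"((?:[^"\\]|\\.)*)"'
    match = re.search(pattern, page_source, re.DOTALL)
    return match.group(1) if match else None


def csv_rows(js_literal):
    """Turn the escaped literal into a list of CSV rows.

    Only the \\r and \\n escapes are handled; other JavaScript escapes
    are left untouched in this sketch.
    """
    text = js_literal.replace("\\r", "").replace("\\n", "\n")
    return list(csv.reader(io.StringIO(text)))
```

Usage would be along the lines of rows = csv_rows(extract_gs_csv(html)), after checking that extract_gs_csv did not return None.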

      



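Thanks to Sinan (this is basically his solution, transcribed to Python).

import time
from win32com.client import Dispatch

ie = Dispatch("InternetExplorer.Application")
ie.Visible = False
ie.Navigate("http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMDataServlet?param1=&param2=&param3=&param4=&param5=2009-04-22&param6=37#")

# Crude wait for the page to finish loading.
time.sleep(20)

# The script element at index 1 holds the gs_csv assignment.
s1 = ie.document.scripts(1).text
# Skip "gs_csv" plus two more characters (presumably the = and the opening quote)
# and drop the 11 trailing characters that follow the data.
s1 = s1[s1.find("gs_csv") + 8:-11]

# Raw string, so the backslashes in the Windows path are kept literally.
scriptfilepath = r"c:\FO Share\bmreports\script.txt"
with open(scriptfilepath, "w") as scriptfile:
    # Presumably the intent: turn the escaped "\n" sequences into real newlines.
    scriptfile.write(s1.replace("\\n", "\n"))

ie.Quit()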
