Is there an alternative to innerHTML that includes the page header?
I am trying to fetch data from the following page:
Conveniently, if inefficiently, the page embeds all the data from the CSV file in its header, assigned to the gs_csv variable.
How can I extract it? document.body.innerHTML skips the header, which is where the data is. What is the alternative that includes the header (or, better yet, returns the value assigned to gs_csv)?
(Sorry, I am new to all of this. I have searched a lot of documentation and tried many things, but nothing has worked so far.)
Thanks to Sinan (this is basically his solution, transcribed to Python).
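import time
from win32com.client import Dispatch

ie = Dispatch("InternetExplorer.Application")
ie.Visible = False
ie.Navigate("http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMDataServlet?param1=&param2=&param3=&param4=&param5=2009-04-22&param6=37#")
# Crude fixed wait for the page to load; polling ie.ReadyState until it
# equals 4, as the Perl answer does, is more robust.
time.sleep(20)
# The body alone does not contain the data.
webpage = ie.document.body.innerHTML
# The data sits in the second script block of the document.
s1 = ie.document.scripts(1).text
# Cut out the value assigned to gs_csv; the offsets are specific to this page.
s1 = s1[s1.find("gs_csv") + 8:-11]
# Raw string, so that "\b" in the path is not read as a backspace.
scriptfilepath = r"c:\FO Share\bmreports\script.txt"
scriptfile = open(scriptfilepath, 'w')
# Turn the literal backslash-n sequences in the script text into real newlines.
scriptfile.write(s1.replace('\\n', '\n'))
scriptfile.close()
ie.Quit()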
Untested: have you looked at what document.scripts contains?
UPDATE:
For some reason I am having enormous difficulty getting this to work with Windows Script Host (but then I don't use it often, apologies). Anyway, here is Perl source that works:
use strict;
use warnings;
use Win32::OLE qw(in);  # "in" must be imported to iterate over OLE collections
$Win32::OLE::Warn = 3;
my $ie = get_ie();
$ie->{Visible} = 1;
$ie->Navigate(
    'http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMDataServlet?'
    .'param1=&param2=&param3=&param4=&param5=2009-04-22&param6=37#'
);
sleep 1 until is_ready( $ie );
my $scripts = $ie->Document->{scripts};
for my $script (in $scripts ) {
    print $script->text;
}
sub is_ready { $_[0]->{ReadyState} == 4 }
sub get_ie {
    Win32::OLE->new('InternetExplorer.Application',
        sub { $_[0] and $_[0]->Quit },
    );
}
__END__
C:\Temp> ie > output
The file output now contains the text of all script tags.
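Once the script text is available, the value assigned to gs_csv still has to be cut out of it. Here is a minimal Python sketch, assuming the page declares it as a double-quoted JavaScript string (gs_csv = "...") whose rows are separated by literal \n escapes. The helper name extract_js_string and the sample text are made up for illustration, not taken from the real page:

```python
import re


def extract_js_string(script_text, var_name="gs_csv"):
    """Return the double-quoted string assigned to var_name, or None."""
    # Match: name = "...", allowing backslash-escaped characters inside the quotes.
    pattern = re.escape(var_name) + r'\s*=\s*"((?:[^"\\]|\\.)*)"'
    match = re.search(pattern, script_text, re.DOTALL)
    if match is None:
        return None
    # Convert the two-character sequence backslash-n into real newlines.
    return match.group(1).replace("\\n", "\n")


# Made-up sample standing in for the text of one script block.
sample = 'var other = 1; var gs_csv = "a,b\\n1,2\\n"; var x = 2;'
csv_text = extract_js_string(sample)
print(csv_text)
```

Unlike fixed slice offsets, a pattern like this does not depend on how much text follows the variable, but it only handles the \n escape; other JavaScript escapes would need their own handling.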
Fetch the source of this page using Ajax and parse the response text as XML using jQuery. It should be simple enough to get the text of the first script tag you encounter inside the head.
I am not familiar with jQuery, otherwise I would post code examples.
EDIT: I am assuming you are talking about a client-side CSV fetch.
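For comparison, the same idea (read the raw page source rather than the rendered body) can be sketched in Python without a browser. This assumes the data is already present in the HTML the server sends and is not added later by script. ScriptCollector, collect_scripts and fetch_scripts are hypothetical names, the UTF-8 decoding is a guess, and only the offline sample is actually run here:

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ScriptCollector(HTMLParser):
    """Collect the text content of every script element, head included."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def collect_scripts(html):
    """Return a list with the text of each script element in html."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts


def fetch_scripts(url):
    """Download url and return its script texts (encoding is an assumption)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return collect_scripts(html)


# Made-up sample page; the real page layout may differ.
sample_html = (
    '<html><head><script>var gs_csv="a,b";</script></head>'
    "<body><p>hi</p></body></html>"
)
scripts = collect_scripts(sample_html)
print(scripts)
```

Calling fetch_scripts with the page URL would then give the same kind of list as document.scripts, from which the gs_csv assignment can be picked out.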