Thursday, May 3, 2012

How do I read the contents of a remote web page?

You can include static text and HTML files from remote servers by using a component (such as AspHTTP, ASPTear 1.50, or VB's built-in InetCtrls) to retrieve the remote URL's content.

You can also try the method below, which was tested with the MSXML objects that are installed with Windows 2000. Make sure you have the latest versions of MSXML and XML Core Services (see MSXML Downloads). If you download a newer version, take special note of the new ProgID you should be using: MSXML 4.0 supports side-by-side installation, which means the version-independent ProgID below will still use the older version. To use MSXML 4.0 explicitly, you need its version-specific ProgID, MSXML2.ServerXMLHTTP.4.0.

<%
    url = "http://www.espn.com/main.html"
    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
    xmlhttp.open "GET", url, false
    xmlhttp.send ""
    Response.write xmlhttp.responseText
    set xmlhttp = nothing
%>

And here it is in JavaScript:
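
The sample that belongs here is missing from this copy. As a rough stand-in, assuming server-side JScript, the same GET request might look like the sketch below. The function name fetchRemotePage and the injected createObject parameter are illustrative and not part of the original article.

```javascript
// Sketch only: fetchRemotePage and createObject are hypothetical names.
// createObject is whatever creates COM objects in your host; in classic ASP
// that would be a thin wrapper around Server.CreateObject.
function fetchRemotePage(createObject, url) {
    var xmlhttp = createObject("MSXML2.ServerXMLHTTP");
    xmlhttp.open("GET", url, false); // false = synchronous, as in the VBScript sample
    xmlhttp.send("");
    return xmlhttp.responseText;
}
```

In an ASP page whose language is JScript, the call would presumably be something like Response.Write(fetchRemotePage(function (progId) { return Server.CreateObject(progId); }, "http://www.espn.com/main.html")). Passing the object factory in as a parameter keeps the function free of ASP-specific globals.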


You may receive the error below if the URL doesn't exist, if you are behind a firewall that blocks certain web sites, if the site is behind a firewall that blocks traffic to port 80 / 443, if you are using a proxy server, or if the site requires authentication:

msxml4.dll (0x80072EE7)
Server name or address could not be resolved

To correct this, you will have to figure out which of these issues is standing in your way, and discuss workarounds with your network administrator or theirs.

Don't forget that if the remote page uses relative URLs for images, style sheets, JavaScript files, frames, or links, it won't work perfectly when served from your own server. To overcome this, add a BASE HREF tag so that relative URLs keep resolving against the original location. For example, the code above (which gets all the text from espn.com, but is formatted oddly and doesn't function 100% as intended) needs only a slight modification to work correctly:

<%
    url = "http://www.espn.com/main.html"

    ' add a BASE HREF tag
    Response.write "<base href=""" & url & """>"

    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
    xmlhttp.open "GET", url, false
    xmlhttp.send ""
    Response.write xmlhttp.responseText
    set xmlhttp = nothing
%>
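
Writing the BASE HREF tag before the fetched markup puts it ahead of the remote page's own opening tags. If you would rather place it inside the head section, one possible approach is sketched below in plain JavaScript. The helper name addBaseHref is hypothetical, and the sketch assumes baseUrl contains no double quotes.

```javascript
// Hypothetical helper: inserts a <base> tag right after the opening <head> tag,
// or prepends it to the markup when no <head> tag is found.
// Assumption: baseUrl contains no double quotes (it is not escaped here).
function addBaseHref(html, baseUrl) {
    var tag = '<base href="' + baseUrl + '">';
    // Match <head> or <head ...attributes>, but not <header>.
    var match = /<head(\s[^>]*)?>/i.exec(html);
    if (!match) {
        return tag + html;
    }
    var end = match.index + match[0].length;
    return html.substring(0, end) + tag + html.substring(end);
}
```

The result of addBaseHref(xmlhttp.responseText, url) could then be written to the response in place of the raw responseText.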

For information on increasing or decreasing the time allowed for the XMLHTTP objects to retrieve a response from a remote server, see Article #2407.

If you need to POST data, you can do so by adding a header that tells the receiver you're sending form data:

<%
    url = "http://www.espn.com/main.html"
    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
    xmlhttp.open "POST", url, false
    xmlhttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    xmlhttp.send "x=1&y=2"
    Response.write xmlhttp.responseText
    set xmlhttp = nothing
%>
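
The body "x=1&y=2" above is already safe to send as-is, but real field values usually need to be URL-encoded first. The sketch below shows one way to build such a body in plain JavaScript. The function name encodeFormData is an assumption for illustration, not something from the original article.

```javascript
// Hypothetical helper: builds an application/x-www-form-urlencoded body
// from an object of field names and values.
function encodeFormData(fields) {
    var pairs = [];
    for (var name in fields) {
        if (Object.prototype.hasOwnProperty.call(fields, name)) {
            pairs.push(encodeURIComponent(name) + "=" +
                encodeURIComponent(String(fields[name])));
        }
    }
    // Form encoding conventionally writes spaces as "+" rather than "%20".
    return pairs.join("&").replace(/%20/g, "+");
}
```

For example, encodeFormData({ x: 1, y: 2 }) produces the same "x=1&y=2" string used in the sample, while a value such as "a b&c" is escaped so that it cannot be mistaken for a field separator.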

Another thing you may want to do, going back to the original script, is make sure the server is there. If it is not, you can display a message, and you can customize that message to say whether the server was not found at all, or was found but returned a bad response (e.g. a 404 Page Not Found). Note that if you do not need to parse the content of the remote web page, using the HEAD method here is far more efficient than using GET or POST, since only the headers are retrieved from the remote server, not the content.

<%
    ' deliberate typo:
    url = "http://www.espn.co/main.html"

    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
    on error resume next
    xmlhttp.open "HEAD", url, false
    xmlhttp.send ""
    status = xmlhttp.status
    if err.number <> 0 or status <> 200 then
        if status = 404 then
            Response.Write "Page does not exist (404)."
        elseif status = 401 then
            Response.Write "Access denied (401)."
        elseif status >= 500 and status < 600 then
            Response.Write "Server error (" & status & ") on remote site."
        else
            Response.write "Server is down or does not exist."
        end if
    else
        Response.Write "Server is up and URL is available."
    end if
    set xmlhttp = nothing
%>
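
The branching above can also be kept in a small function that is easy to check on its own. The following is a minimal sketch in plain JavaScript. The name describeResult and the requestFailed flag (standing in for err.number <> 0) are assumptions. Unlike the sample, it reports unrecognized status codes separately instead of treating them as a missing server.

```javascript
// Hypothetical helper: maps the outcome of a HEAD request to a message.
// requestFailed stands in for "err.number <> 0" in the VBScript sample.
function describeResult(requestFailed, status) {
    if (requestFailed) {
        return "Server is down or does not exist.";
    }
    if (status === 200) {
        return "Server is up and URL is available.";
    }
    if (status === 404) {
        return "Page does not exist (404).";
    }
    if (status === 401) {
        return "Access denied (401).";
    }
    if (status >= 500 && status < 600) {
        return "Server error (" + status + ") on remote site.";
    }
    // Anything else (redirects, other 4xx codes) is reported as-is.
    return "Unexpected status: " + status + ".";
}
```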

You might want to parse the results, instead of sending them straight to the client:

<%
    url = "http://www.espn.com/main.html"
    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
    on error resume next
    xmlhttp.open "GET", url, false
    xmlhttp.send ""
    if err.number <> 0 then
        response.write "Url not found"
    else
        if instr(xmlhttp.responseText,"Stanley Cup")>0 then
            response.write "There's a story about the playoffs. "
            response.write "<a href=""" & url & """>Go there?</a>"
        else
            response.write "There is no story about the playoffs."
        end if
    end if
    set xmlhttp = nothing
%>

You may be interested in performing an asynchronous request, e.g. hitting an ASP page that acts like a batch file, which gets fired but does not need to return any results. Simply change the third parameter of the open call to true (and leave out the reference to the responseText value):

<%
    url = "http://www.espn.com/main.html"
    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
    xmlhttp.open "GET", url, true
    xmlhttp.send ""
    set xmlhttp = nothing
%>

Finally, you may want to spoof your user agent, since the MSXML object sends something like "Mozilla/4.0 (compatible; Win32; WinHttp.WinHttpRequest.5)". Many sites will view this as a spider or 'screen scraper' and, for various reasons, might present alternate content. Here are two samples:

<%
    url = "http://www.espn.com/main.html"

    ' this sample posts as the actual browser being used:
    br = Request.ServerVariables("HTTP_USER_AGENT")
    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
    on error resume next
    xmlhttp.open "GET", url, false
    xmlhttp.setRequestHeader "User-Agent",br
    xmlhttp.send ""
    if err.number <> 0 then
        response.write "Url not found"
    else
        response.write xmlhttp.responseText
    end if
    set xmlhttp = nothing

    ' this sample posts as "My funky browser."
    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
    on error resume next
    xmlhttp.open "GET", url, false
    xmlhttp.setRequestHeader "User-Agent","My funky browser."
    xmlhttp.send ""
    if err.number <> 0 then
        response.write "Url not found"
    else
        response.write xmlhttp.responseText
    end if
    set xmlhttp = nothing
%>



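If you encounter errors, you can inspect the Err object to determine the problem. (ParseError belongs to the XML document exposed through responseXML, not to the ServerXMLHTTP object itself, so it cannot describe a failed request.)

<%
    set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
    ' ... stuff ...
    on error resume next
    xmlhttp.send ""
    if err.number <> 0 then
        ' show the error code in hex (as in the messages on this page) and its description
        response.write "Error: " & hex(err.number) & _
            "<br>" & err.description
        response.end
    end if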
    ' ... stuff ...
%>

A common error you might receive:

msxml3.dll error '80072efd'
A connection with the server could not be established

Make sure that the URL is actually reachable. You may have spelled the domain name wrong, or the site may actually be down.

Test by using a browser on that machine, or by running tracert or ping. Note that ping won't always return results, because many sites block all such traffic (mainly to help prevent DoS attacks). However, ping should at least show you the IP address, which means that the domain name was resolved correctly through DNS. If no IP address is shown, the problem may be that your DNS server cannot resolve the name.
(http://classicasp.aspfaq.com/general/how-do-i-read-the-contents-of-a-remote-web-page.html)
