How to use Wget commands and PHP cURL options for URL retrieval
Please offer advice on the following scenario: I have observed TCP connections to an IP address on our network. The HTTP connections return a file named a.txt, but when I try to retrieve the file via browser, I receive a 404 error code. I don't know the DNS name associated with the IP address (there is no reverse map). What could be the reason that another machine on the network would be able to retrieve a.txt, but I can't do the same? How can I retrieve a.txt?
A bit more information would be needed before I can make an informed suggestion, but I'll assume the server is a legitimate server hosted on the internal network. I can think of a couple of potential scenarios causing this: Web server-based access control lists (ACLs) or browser-aware content. Web server-based ACLs restrict access to website content based on the IP address of the client initiating the connection. Browser-aware content, on the other hand, may refuse to serve a page to a browser the site does not support; for example, an Internet Explorer-specific site may refuse requests carrying the following User-Agent header: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.55 Safari/534.3 (this is Google Chrome's User-Agent string).
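To test the browser-aware-content theory, you can replay the request while presenting a real browser's User-Agent and inspect the raw response. A minimal sketch with curl is below; 192.0.2.10 is a placeholder address (the real IP isn't given in the question), and the User-Agent string is the Chrome one quoted above.

```shell
#!/bin/sh
# Placeholder target IP (TEST-NET-1) -- substitute the real address.
# The User-Agent is the Chrome string quoted in the text above.
UA='Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.55 Safari/534.3'

# -A sets the User-Agent; -i prints the response headers along with the
# body, so you can see exactly which status code and headers come back.
# --connect-timeout keeps the command from hanging on a dead host.
curl --connect-timeout 2 -A "$UA" -i "http://192.0.2.10/a.txt" || true
```

If the same request succeeds with one User-Agent and returns 404 with another, you have found browser-aware content; if it fails regardless of User-Agent, an IP-based ACL is the more likely culprit.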
One thing to keep in mind is that the 404 could be a generic message. In other words, the server may not want to reveal the exact reason the request was refused. A good way to test this is to use command-line URL retrieval tools such as Wget or cURL. The advantage of this approach is that these tools are fully configurable, allowing you to mimic multiple browsers. Both Wget and cURL can be run on any *nix-based system. A typical Wget command line to retrieve a page would be:
wget -U Mozilla http://<server-ip>/a.txt
This sends "Mozilla" as the User-Agent and tries to retrieve a.txt from the server in question.
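A bare "Mozilla" User-Agent may not be enough to satisfy browser-aware content, so it is worth repeating the request with a full browser string and asking Wget to print the server's responses. The sketch below uses the placeholder address 192.0.2.10 (an assumption; substitute the real IP):

```shell
#!/bin/sh
# Placeholder host (TEST-NET-1) standing in for the real IP address.
# Full Chrome User-Agent string, as quoted earlier in the article.
UA='Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.55 Safari/534.3'

# -U sets the User-Agent; -S prints the server's response headers, which
# helps distinguish a genuine 404 from a disguised refusal. The timeout
# and single retry keep the command from stalling on an unresponsive host.
wget --timeout=2 --tries=1 -S -U "$UA" "http://192.0.2.10/a.txt" || true
```

Comparing the headers returned for different User-Agent values (and, if possible, from different source IP addresses) should narrow the cause down to an ACL, browser detection, or a genuinely missing file.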
This was first published in July 2010