Curl redirect: why doesn't FOLLOWLOCATION follow the right domain?

asked Sep 5, 2010 by user3042509

I'm trying to scrape a website, but the page I'm scraping redirects to another page. I set the FOLLOWLOCATION option on curl, but I end up at a URL like http://localhost/....pageredirected.php and so on.

The redirect itself works, but the domain is wrong: it's my own domain (localhost), not the scraped site's. Here is the code:

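// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "");
curl_setopt($ch, CURLOPT_HEADER, true);          // include the response headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP "Location:" redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // make curl_exec() return the page instead of printing it

// fetch the URL and pass it on to the browser
$esito = curl_exec($ch);
echo $esito;

// close cURL resource, and free up system resources
curl_close($ch);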

The page redirects from etape1.cfm to etape2.cfm, but I get a 404 error because I end up at http://localhost/scraping/etape2.cfm?... instead of etape2.cfm on the scraped site.

Why doesn't FOLLOWLOCATION follow the right domain?

1 Answer

answered Sep 5, 2010 by marc-b

The problem isn't curl. Part of what that first URL sends back is this:

<script language="JavaScript" type="text/javascript">
    function historyDeleteAndRedirect()
    ...
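Since you're not accessing the site the normal way, this JavaScript breaks: your browser is really talking to "localhost" rather than "". Remember, curl runs on the server. Your PHP script fetches the page and echoes the HTML back to your browser, so as far as the browser is concerned it loaded "http://localhost/etape1.cfm?...". Since the URL passed to .replace() isn't absolute, your browser does the correct thing and resolves it against localhost. Note that CURLOPT_FOLLOWLOCATION only follows HTTP redirects (a 3xx response with a Location: header); curl never executes JavaScript, so it cannot see a redirect that is done from a script.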
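If you need to follow such a redirect from PHP, one workaround is to do what the browser would have done, but against the real site: find the script's redirect target in the HTML curl returned and resolve it against the URL curl actually fetched (CURLINFO_EFFECTIVE_URL), not against localhost. This is a minimal sketch under assumptions: it assumes the script redirects with a quoted string literal, e.g. location.replace("etape2.cfm?...") or location.href = '...'; the helpers findJsRedirect, resolveUrl and fetchPage are hypothetical; and the URL resolver is deliberately simplified (no "../" handling, no query-only URLs). It needs PHP 8+ for str_starts_with.

```php
<?php
// Hypothetical helpers: a sketch, assuming the page redirects via a quoted
// string literal in location.replace(...) or location.href = ... .

// Return the first redirect target found in the HTML, or null if none.
function findJsRedirect(string $html): ?string
{
    // Matches location.replace("x"), location.href = 'x' and location = "x".
    $pattern = '/location(?:\.href)?\s*(?:=\s*|\.replace\(\s*)([\'"])(.*?)\1/i';
    return preg_match($pattern, $html, $m) === 1 ? $m[2] : null;
}

// Resolve $relative against $base, roughly as a browser would.
// Simplified: no "../" normalisation and no query-only ("?x=1") URLs.
function resolveUrl(string $base, string $relative): string
{
    if (parse_url($relative, PHP_URL_SCHEME) !== null) {
        return $relative; // already absolute
    }
    $b = parse_url($base);
    $scheme = $b['scheme'];
    $host = $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    if (str_starts_with($relative, '//')) {
        return $scheme . ':' . $relative; // protocol-relative
    }
    if (str_starts_with($relative, '/')) {
        return $scheme . '://' . $host . $relative; // root-relative
    }
    // Document-relative: keep the base path up to and including its last "/".
    $path = $b['path'] ?? '/';
    return $scheme . '://' . $host . substr($path, 0, strrpos($path, '/') + 1) . $relative;
}

// Fetch $url, following HTTP (Location:) redirects only.
// Returns [body, URL curl finally ended up on].
function fetchPage(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl error: ' . curl_error($ch));
    }
    // The handle is freed automatically when $ch goes out of scope (PHP 8+).
    return [$body, curl_getinfo($ch, CURLINFO_EFFECTIVE_URL)];
}

// Usage (the real URL is elided in the question):
// [$html, $finalUrl] = fetchPage('http://.../etape1.cfm?...');
// if (($target = findJsRedirect($html)) !== null) {
//     [$html, $finalUrl] = fetchPage(resolveUrl($finalUrl, $target));
// }
```

A regex is a blunt tool for HTML and JavaScript; it is only reasonable here because the target is a plain string literal. If etape2.cfm depends on a session cookie set by etape1.cfm, you may also need to carry cookies between the two requests (CURLOPT_COOKIEJAR / CURLOPT_COOKIEFILE).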

