HTTP Headers and the PHP header() Function

Introduction

Many beginning and intermediate PHP programmers seem to think the header() function
is some kind of strange voodoo. They work from examples and code snippets and are able to get things done with it,
but they don’t know quite how it works. That was certainly the
way I regarded header() the first time I saw it.

In reality, it’s quite simple. In this tutorial, I’ll explain a little about how HTTP headers work, how they relate to PHP,
and a little about their meta tag equivalents.

Hopefully by the time you’re done reading this, you’ll feel more confident about how to
use the header() function, and even have some new ideas about how it can help you. We’ll also cover some other important
topics related to HTTP headers and PHP. Before we talk about any programming
at all, though, we need to quickly (and incompletely) go over how HTTP (HyperText Transfer Protocol) works in general.

HTTP Overview

Headers: words in a conversation

HTTP is the protocol (the set of ‘rules’) for transferring
data (e.g. HTML in web pages, pictures, files) between web servers and
client browsers, and usually takes place on port 80.
This is where the ‘http://’ in website URLs comes from.

The first time most people make a web page, they write the HTML on their computer, view it locally in a browser,
upload it to their server, and view it on the web. It might seem like viewing a page locally and viewing it on the server
is exactly the same, and that the only data going back and forth between the server and the
browser is the HTML and any images included in the page. But there is actually a lot of other information that you do not
see when you view a file on the web — the headers.

Headers can be separated into two broad types: Request headers that your browser sends to the server when you request a file, and
Response headers that the server sends to the browser when it serves the file. Think of these headers as the words in a
conversation taking place between the browser and the server. I like to imagine the server as a librarian, and the browser as a
researcher asking for a library resource. The browser walks up to the server at the main desk (port 80) and says
something like, “Hi, my name is Mozilla, and I’m looking for the resource with the call number ‘www.expertsrt.com’.
Can you get it for me?” The server listens, and responds, “Yes, I found it, let me send it to you. The data in the item
is HTML text, and it says ‘<html>…’” The browser reads through, comes to an image tag, and asks the server for the
item at the location in the src attribute. The server looks, finds the file, and says, “This file is a PNG image, and
the data is….” You get the idea.

Another conversation might go like this:

Browser: Hi, I’m Mozilla, can I have the file at ‘www.expertsrt.com/moved.html’?
Server: That file is no longer there, it is at ‘www.expertsrt.com/newloc.html’.
Browser: Hi, I’m Mozilla, can I have the file at ‘www.expertsrt.com/newloc.html’?
Server: I found the file. Look at it for 10 seconds and then ask me again. It’s HTML text and it reads….
…10 seconds…
Browser: Hi, I’m Mozilla, can I have the file at ‘www.expertsrt.com/newloc.html’?
Server: I found the file. Look at it for 10 seconds and then ask me again. It’s HTML text and it reads….
…10 seconds…
Browser: Hi, I’m Mozilla, can I have the file at ‘www.expertsrt.com/newloc.html’?
Server: I found the file. Look at it for 10 seconds and then ask me again. It’s HTML text and it reads….
….and so on, until the browser is redirected by the user….

As you can see, there is a lot going on that headers control. Using the header() function, you can make the
server send any headers that you need, which allows you to do some really cool things beyond just sending plain old HTML.

Seeing the whole conversation

Before moving ahead, let’s get a better idea of how HTTP headers work by viewing a webpage without a browser, so we can
see the conversation in its entirety. Start by opening a command prompt (in Windows, go to Start->Run, type cmd, and click “OK”…if you’re using Linux you probably already know). At the prompt, type:

telnet expertsrt.com 80

and press Enter. This will connect you to expertsrt.com on port 80. Next, copy and paste just the text below:

GET / HTTP/1.1
Host: expertsrt.com

Don’t worry if when
you type or paste the text, it does not show up in your command window and all you see is the cursor — it is indeed being sent to the server. The first line says you are using the GET request method to get the resource /
(i.e. the file in the base directory of the host), and that you are using HTTP version 1.1. The second tells the server which host
you want to connect to. When you finish typing ‘expertsrt.com’, hit Enter twice (and twice only). You should almost immediately get a response that looks like:

HTTP/1.1 301 Moved Permanently
Date: Wed, 08 Feb 2006 07:44:07 GMT
Server: Apache/2.0.54 (Debian GNU/Linux) mod_auth_pgsql/2.0.2b1 mod_ssl/2.0.54 OpenSSL/0.9.7e
Location: http://www.expertsrt.com/
Content-Length: 233
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.expertsrt.com/">here</a>.</p>
</body></html>

Whoops! Looks like we requested a resource that wasn’t there; it’s been permanently moved to the new Location
http://www.expertsrt.com/. If you were using a browser, you’d only see the HTML: everything before the first blank
line is the headers. In fact, modern browsers are even smarter than that. When they see the Location header (the
fourth line of the response), they automatically go there so you don’t have to type in a new URL. Let’s go to the new URL. By this point, you
probably got disconnected while you were reading this. If so, just press the up arrow on your keyboard to get your telnet command back, and press Enter to reconnect. If you’re still connected, you can just go ahead and type the following:

GET / HTTP/1.1
Host: www.expertsrt.com

and press Enter twice after the second line. You’ll get another similar response telling you that the page is actually at
http://www.expertsrt.com/index.php. The server is particular, isn’t it? 😉 Repeat the above, but this time type

GET /index.php HTTP/1.1
Host: www.expertsrt.com

Notice that the name of the file we want is in the first line. This time we get flooded with text: the HTML from ERT’s homepage.
The headers look like

HTTP/1.1 200 OK
Date: Wed, 08 Feb 2006 08:20:07 GMT
Server: Apache/2.0.54 (Debian GNU/Linux) mod_auth_pgsql/2.0.2b1 mod_ssl/2.0.54 OpenSSL/0.9.7e
X-Powered-By: PHP/4.4.0
Transfer-Encoding: chunked
Content-Type: text/html

Simple, no? Let’s move forward and see how this relates to your programming.
Don’t worry if you didn’t understand every single thing
that we just did. The important thing is to have a general feel for how the browser and server talk to each other,
and to realize that there is nothing magic about it. The take home points are:

  • The browser and the server talk to each other using headers
  • Headers are sent before the main content, and are separated from the main content by a
    double CRLF (a blank line).
  • In the header section, there is one header per line. The name of the header comes first, followed by a colon and a space, followed by the content/value of the header:
    Header-Name: header-value
  • Headers can contain many types of information and instructions that the server and browser use to help each other know
    what to do next
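
To make these points concrete, here is a minimal sketch that splits a raw response into its status line, headers, and body. The function name parse_http_response() is my own invention for illustration, and the sketch ignores details such as repeated header names, so treat it as a teaching aid rather than a full parser.

```php
<?php
// Hypothetical helper: split a raw HTTP response into status line, headers, and body.
function parse_http_response(string $raw): array
{
    // The headers end at the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $head = $parts[0];
    $body = $parts[1] ?? '';

    $lines = explode("\r\n", $head);
    $status = array_shift($lines); // e.g. "HTTP/1.1 200 OK"

    $headers = [];
    foreach ($lines as $line) {
        // Each header is "Name: value"; split on the first colon only.
        $pair = explode(':', $line, 2);
        if (count($pair) === 2) {
            $headers[trim($pair[0])] = trim($pair[1]);
        }
    }

    return ['status' => $status, 'headers' => $headers, 'body' => $body];
}

$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nX-Powered-By: PHP/4.4.0\r\n\r\n<html>...</html>";
$response = parse_http_response($raw);
echo $response['status'], "\n";
echo $response['headers']['Content-Type'], "\n";
```

Splitting on the first colon only matters because header values (dates, URLs) often contain colons themselves.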

Note: If you’re the type who likes to really dig into the details, you can look at
RFC 2616 for the complete HTTP/1.1 specification in all its glory.
In particular, Section 14 offers a complete
definition for each header field.

PHP header(): The Basics

Notice the response headers X-Powered-By: PHP/4.4.0 and Content-Type: text/html that were
returned when we
finally got to the homepage. PHP was designed from the beginning to output HTML (the ‘H’ in PHP stands for ‘Hypertext’), and
the first time a script generates output (e.g. by using echo), PHP automatically includes those headers for you. This is
very convenient, but also contributes to the confusion many PHP beginners have regarding headers — in more ‘bare bones’
languages like Perl that were not originally designed for the web, sending output without including your own headers produces
the dreaded ‘500 Internal Server Error’, so Perl web
programmers have no choice but to learn about headers immediately.

The header() function sends HTTP response headers; nothing
more, nothing less.


Using this function, you can make your scripts send
headers of your choosing to the browser, and create some very useful and dynamic results. However, the first thing you need to know about the
header() function is that you have to use it before PHP has sent any output (and therefore its default headers).

I doubt there is a PHP programmer in the world who has never seen an error that looks like

Warning: Cannot modify header information – headers already sent by…..

As we said above, the response headers are separated from the content by a blank line. This means you can only send them once, and if
your script has any output (even a blank line or space before your opening <?php tag), PHP sends the headers at that point without asking
you. For example, consider the script below, which seems logical enough:


Welcome to my website!<br />
<?php
if ($test) {
    echo "You're in!";
} else {
    header('Location: http://www.mysite.com/someotherpage.php');
}
?>

What this script is trying to do is redirect the visitor using the Location header if
$test is not true. Do you see the problem? The ‘Welcome…’ text gets sent no matter what, so the headers are
automatically sent. By the time header() is called, it’s already too late: instead of getting redirected,
the user will just see an error message (or if you have error reporting off, nothing but the ‘Welcome…’ text).
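
When you are hunting for the stray output that triggered this warning, PHP’s built-in headers_sent() function can tell you where the output began. The sketch below wraps it in a small helper; the name header_status() is just something I made up for illustration.

```php
<?php
// Hypothetical helper: report whether header() can still be called,
// and if not, where the output that sent the headers began.
function header_status(): string
{
    // headers_sent() fills $file and $line with the place output started.
    if (headers_sent($file, $line)) {
        return "Too late: output started in $file on line $line";
    }
    return 'Headers not sent yet: header() is still allowed';
}

$status = header_status();
error_log($status); // written to the error log, so the check itself prints nothing
```

Writing the result to the error log rather than echoing it keeps the diagnostic from becoming the very output that sends the headers.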

There are basically two solutions to this. The first is to rewrite the code:

<?php
if ($test) {
    // Double quotes, so the apostrophe in "You're" does not end the string
    echo "Welcome to my website<br />You're in!";
} else {
    header('Location: http://www.mysite.com/someotherpage.php');
}
?>

The second is output buffering, which can be somewhat more elegant and easy to use.
In our example above, rewriting the code wasn’t much trouble, but imagine if there had been quite a bit of
HTML to move around — it could be pretty cumbersome, and it might make our code harder to follow. While our first example caused an error, the logic of the program was fine. Output buffering allows you
to hold on to (‘buffer’) output (even HTML outside of PHP code tags) and send it to the browser only when you explicitly say to do
so. This way you can program however you would like to, and explicitly send the output after you’ve specified any headers you need to. The two relevant functions are
ob_start(), which turns output buffering on, and
ob_flush(), which sends the content that has accumulated
in the buffer:


<?php
ob_start(); // begin buffering the output
?>

Welcome to my website!

<?php
if ($test) {
    echo "You're in!";
} else {
    header('Location: http://www.mysite.com/someotherpage.php');
}

ob_flush(); // output the data in the buffer
?>

I encourage you to read more about all of the output buffering functions, which can be quite useful. You should flush the output
buffer as soon as possible, especially if you have quite a bit of content to send. Otherwise, your page will appear to load
slower, because the content will be sent only after it has been entirely assembled, rather than as it is available.
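
One of those other functions is ob_get_clean(), which hands you the buffer as a string instead of sending it. Here is a small sketch of how that might be used; capture_output() is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: run a callback, capture whatever it prints, and return it as a string.
function capture_output(callable $render): string
{
    ob_start();            // start buffering; nothing reaches the browser yet
    $render();             // any echo/print output lands in the buffer
    return ob_get_clean(); // hand back the buffer contents and switch this buffer off
}

$page = capture_output(function () {
    echo 'Welcome to my website!';
    echo " You're in!";
});

// Nothing has been sent by this script so far, so headers can still be set here.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo $page;
```

Holding the page in a variable like this lets you decide on headers (or a redirect) after the content has been generated.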

Note: The second argument. If you call header() more than once for the same header field, the value for that header will
be the one included in the last call you made. For example,


<?php
header('Some-Header: Value-1');
header('Some-Header: Value-2');
?>

would produce the header Some-Header: Value-2. You can cause both headers to be sent by using the second replace argument
for header, which is true by default. If you set this to false, the second header value will not replace the first,
and both will be sent. So the code


<?php
header('Some-Header: Value-1');
header('Some-Header: Value-2', false); // don't replace the first value
?>

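will send both values. PHP emits them as two separate header lines, Some-Header: Value-1 and Some-Header: Value-2, which for list-valued headers means the same as Some-Header: Value-1, Value-2. You will rarely need this, but it is good to know.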

Armed with a good understanding of how HTTP headers and PHP work together, let’s look at some specific examples of using this
functionality.

PHP header(): Some Examples

Note: The code snippets appearing below are just that: snippets from
complete working code. When you include them in your own programs, remember to define all your variables,
assign default values, and adhere to other good programming practices.

Redirecting with the Location header

We’ve seen this one a couple times above: it redirects the browser.


<?php
header('Location: http://www.mysite.com/new_location.html');
?>

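While you can sometimes get away with supplying a relative URL for the value, the HTTP/1.1 specification referred to above (RFC 2616)
asks for an absolute URL. The later RFC 7231 permits relative references, but an absolute URL remains the safest choice.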

One mistake that is easy to make with the Location header is not calling
exit directly afterwards (you may not always want to do
this, but usually you do). The reason this is a mistake is that the PHP code of the page continues to execute even though the user
has gone to a new location. In the best case, this uses system resources unnecessarily. In the worst case, you may perform tasks that
you never meant to. Consider the code below:


<?php
// Redirect users with access level below 4
if (check_access_level($username) < 4) {
    header('Location: http://www.mysite.com/someotherpage.php');
}

// Mail users with higher access level the secret code
mail_secret_code($username);
echo 'The secret email is on its way!';
?>

Unauthorized users are indeed redirected, but in fact, they too will receive the email, because the script continues to run.
To avoid this, the part for authorized users could be wrapped in an else{} statement, but it is cleaner and easier
to call exit immediately after the header command to end the execution of the script:


<?php
// Redirect users with access level below 4
if (check_access_level($username) < 4) {
    header('Location: http://www.mysite.com/someotherpage.php');
    exit; // stop script execution
}

// Mail users with higher access level the secret code
mail_secret_code($username);
echo 'The secret email is on its way!';
?>
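
If you redirect in many places, it can help to fold both habits (absolute URL, then exit) into small helpers. The sketch below is one way to do that; absolute_url() and redirect() are hypothetical names of my own, and the 302 default is an assumption you may want to change.

```php
<?php
// Hypothetical helper: turn a site-relative path into an absolute URL.
function absolute_url(string $host, string $path, bool $https = false): string
{
    $scheme = $https ? 'https' : 'http';
    return $scheme . '://' . $host . '/' . ltrim($path, '/');
}

// Hypothetical helper: send the Location header and stop the script.
function redirect(string $url, int $status = 302): void
{
    // The third argument of header() sets the HTTP response code.
    header('Location: ' . $url, true, $status);
    exit;
}

$target = absolute_url('www.mysite.com', 'someotherpage.php');
echo $target, "\n";
// redirect($target); // call this inside a real page, before any output
```

Because redirect() always calls exit, code after the call can never run by accident.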

Redirecting with the Refresh header

The Refresh header redirects users like the Location header does, but you can add a delay before the user
is redirected. For example, the following code would redirect the user to a new page after displaying the current one for 10
seconds:


<?php
header('Refresh: 10; url=http://www.mysite.com/otherpage.php');
echo 'You will be redirected in 10 seconds';
?>

Another common application is to force a page to update repeatedly by ‘redirecting’ to the current page (see the second
‘conversation’ above). For example, here is a simple page that will ‘count’ down from 10, with a 3 second
pause between numbers:


<?php
// Read the counter from the query string, starting at 10
$n = isset($_GET['n']) ? (int) $_GET['n'] : 10;

if ($n > 0) {
    header('Refresh: 3; url=' . $_SERVER['PHP_SELF'] . '?n=' . ($n - 1));
    echo $n;
} else {
    echo 'BLAST OFF!';
}
?>

Note: If the refresh time is set to 0, then the Refresh header is
effectively the same as the Location header.

Serving different types of files and generating dynamic content using the Content-Type header

The Content-Type header tells the browser what type of data the server is about to send. Using this header, you can
have your PHP scripts output anything from plain text files to images or zip files. The table below lists frequently-used
MIME types:

You can do several interesting things with this. For example, perhaps you want to send the user a pre-formatted text file
rather than HTML:


<?php
header('Content-Type: text/plain');
echo $plain_text_content;
?>

Or perhaps you’d like to prompt the user to download the file, rather than viewing it in the browser. With the help of the
Content-Disposition header, it’s easy to do, and you can even suggest a file name for the user to use:


<?php
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; '
       . 'filename="plain_text_file.txt"');
echo $plain_text_content;
?>

Maybe you need to serve a file for download, but you’d like to obscure its true location and name, and only serve it to users
who are logged in:


<?php
if ($b_is_logged_in) {
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; '
           . 'filename="' . $different_filename . '"');
    readfile('/path/to/files/' . $filename);
} else {
    echo 'You are not authorized to view this file';
}
?>
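
If the file name comes from the visitor (say, from the query string), it is worth making sure it cannot point outside your download directory. The following is a rough sketch under that assumption; resolve_download() and send_download() are hypothetical helpers, and the temporary directory only exists so the example can run on its own.

```php
<?php
// Hypothetical helper: map a requested name to a file inside $base_dir, or null if it is not there.
function resolve_download(string $base_dir, string $requested): ?string
{
    // basename() drops any directory part, so "../../etc/passwd" becomes "passwd".
    $path = rtrim($base_dir, '/') . '/' . basename($requested);
    return is_file($path) ? $path : null;
}

// Hypothetical helper: send the download headers followed by the file contents.
function send_download(string $path, string $public_name): void
{
    $safe_name = str_replace('"', '', $public_name); // keep the quoted filename intact
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="' . $safe_name . '"');
    header('Content-Length: ' . filesize($path));
    readfile($path);
}

// Demo setup: a throwaway directory holding one file.
$dir = sys_get_temp_dir() . '/downloads_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/report.txt', 'hello');

$found   = resolve_download($dir, 'report.txt');       // path inside $dir
$outside = resolve_download($dir, '../../etc/passwd'); // null: no "passwd" in $dir
// if ($found !== null) { send_download($found, 'your_report.txt'); }
```

The name the visitor sees (the second argument of send_download()) is independent of the real file name, which is what keeps the true location hidden.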

Common MIME types

Type                        Description
text/html                   HTML (PHP default)
text/plain                  Plain Text
image/gif                   GIF Image
image/jpeg                  JPEG Image
image/png                   PNG Image
video/mpeg                  MPEG Video
audio/wav                   WAV Audio
audio/mpeg                  MP3 Audio
video/mov, video/quicktime  Quicktime Video
video/x-ms-wmv              Windows WMV video
audio/x-ms-wma              Windows WMA audio
audio/x-realaudio           RealPlayer Audio/Video (.rm)
audio/x-pn-realaudio        RealPlayer Audio/Video (.ram)
video/x-msvideo, video/avi  AVI Video
application/pdf             PDF Document
application/msword          MS Word .doc file
application/zip             Zip File
application/octet-stream    Misc. data. Use to force download or open with application.*
x-foo/x-bar                 Misc. data. Use to force download or open with application.*

Perhaps you’ve dynamically generated an image using PHP’s image functions and you want to display it to the user. You could create a file build_image.php like this:

<?php
// build the image above
header('Content-Type: image/jpeg');
imagejpeg($image_resource);
?>


Note: Beware of magic_quotes!
PHP’s automatic escaping of special characters with a backslash may seem like a good idea at first, but most good programmers
generally agree that it (a) encourages sloppy programming that does not validate input and (b) causes
annoyances in well-written code that would not occur if “magic quoting” were turned off. One such annoyance is
the corruption of binary data. In the readfile() example above, if magic_quotes_runtime
is on, the data that readfile() outputs may have backslashes added to it, thus
corrupting the file that is sent to the user. Ideally, you should turn magic_quotes_runtime off in your
php.ini file to avoid this, but if you do not have access to the configuration file, you can also use the
set_magic_quotes_runtime() function (pass it the integer 0) to turn the setting off.

Happily, the minutes of a recent
PHP Developer meeting show that they have decided to abandon magic quotes in future versions (6+) of PHP. (As it turned out,
magic quotes were deprecated in PHP 5.3 and removed in PHP 5.4.) Until
everyone upgrades, however, keeping the problems this feature can cause in mind can save you quite a bit of
trouble and frustration.

You might pass the parameters necessary to generate the image via the URL so you can access them in the $_GET array.
Then in another page, you might include this image using an img tag:


<img src="build_image.php?user_id=<?php echo urlencode($user_id); ?>&amp;caption=<?php echo urlencode($caption); ?>">

The possibilities are more or less endless. The more PHP programming you do, the more you will find that the Content-Type
header truly is your friend.

Note: The way that browsers are supposed to handle content of various MIME types, and the way they actually do,
may not always be consistent (especially with Internet Explorer), so you’re well-advised to test your pages in the browsers
you need to support to make sure they behave as expected. The PHP Manual has many helpful tips in the
user-contributed comments on the header() page.

Preventing Page Caching

PHP pages often generate very dynamic content, and to prevent users from missing updates by viewing cached pages, it is
often helpful to be able to tell browsers not to cache certain pages. The following snippet works quite well on the
browsers that are likely to visit your site:


<?php
header('Cache-Control: no-cache, no-store, must-revalidate'); // HTTP/1.1
header('Expires: Fri, 01 Jul 2005 00:00:00 GMT'); // 1 July 2005 was a Friday
header('Pragma: no-cache'); // HTTP/1.0
?>

The Expires header can be any date in the past. As with MIME types, browsers (especially older ones) may not
always listen properly to your caching instructions (although most modern ones will).
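
Hand-written dates are easy to get slightly wrong, so one option is to let PHP format the Expires value for you. This is only a sketch: no_cache_headers() is a hypothetical helper, and using timestamp 0 (the start of 1970) is simply a convenient date in the past.

```php
<?php
// Hypothetical helper: build the cache-busting headers as plain strings.
function no_cache_headers(): array
{
    return [
        'Cache-Control: no-cache, no-store, must-revalidate', // HTTP/1.1
        // gmdate() formats a timestamp in GMT; timestamp 0 is 1 January 1970.
        'Expires: ' . gmdate('D, d M Y H:i:s', 0) . ' GMT',
        'Pragma: no-cache',                                    // HTTP/1.0
    ];
}

foreach (no_cache_headers() as $line) {
    if (!headers_sent()) {
        header($line);
    }
}
```

Building the strings first and sending them in a loop also makes the list easy to inspect while debugging.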

Other Applications

There are other ways you can use headers as well, such as setting the
HTTP Response Code, or in performing
HTTP Authentication (if you are running PHP as an Apache module).
Now that you understand how header() works and how to use it, you’ll be able to do all sorts of things you
might not have thought of before.

Request Headers in PHP

We’ve covered some of the things you can do with response headers above. We can also get a great deal of information
from the request headers received by the server from the browser. There are two ways to access these. First, many of the
values in the $_SERVER array are determined from the
request headers. Second, if PHP is installed as an Apache module, then
apache_request_headers() will return an
array of all request headers (even those not in $_SERVER).
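
If apache_request_headers() is not available, many request headers can still be recovered from $_SERVER, because PHP stores a header such as Accept-Language under the key HTTP_ACCEPT_LANGUAGE. Here is a minimal sketch of that mapping; request_headers_from_server() is a name I made up, and the sample array stands in for the real $_SERVER.

```php
<?php
// Hypothetical helper: rebuild request header names from a $_SERVER-style array.
function request_headers_from_server(array $server): array
{
    $headers = [];
    foreach ($server as $key => $value) {
        // Only keys that start with HTTP_ come from request headers.
        if (strpos($key, 'HTTP_') === 0) {
            $name = str_replace('_', ' ', strtolower(substr($key, 5)));
            $name = str_replace(' ', '-', ucwords($name)); // "user agent" -> "User-Agent"
            $headers[$name] = $value;
        }
    }
    return $headers;
}

$sample = [
    'HTTP_HOST'       => 'www.mysite.com',
    'HTTP_USER_AGENT' => 'Mozilla/5.0',
    'REQUEST_METHOD'  => 'GET', // not a header, so it is skipped
];
print_r(request_headers_from_server($sample));
```

In a real page you would pass $_SERVER itself instead of $sample.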

Security first: don’t trust request headers

Since request headers are set by the browser, which is controlled by the client, you must never trust request
headers for information that is important to the security of your site. A good example is the
$_SERVER['HTTP_REFERER'] variable, which should hold the URL of the page that referred the
user to the current one. A common mistake among beginners is to think that they can use this to make sure
that users only access pages through a certain path, and that they therefore do not need to
worry about server side data validation. For example,
consider this code, which attempts to make sure that data has been submitted from a specific page, rather
than a custom form on another website:


<?php
if ($_SERVER['HTTP_REFERER'] != 'http://www.mysite.com/myform.html') {
    header('Refresh: 5; url=http://www.mysite.com/myform.html');
    echo 'You must use the form on my site...redirecting now.';
} else {
    insert_data($_POST['var1'], $_POST['var2']);
}
?>

This might work to deter an unsophisticated hacker who is using his web browser to submit data through a custom form, but someone
who is a little more savvy could easily submit data via a telnet session like we did above, including the request header

Referer: http://www.mysite.com/myform.html

and easily defeat this ‘protection’. The moral of the story is: use HTTP request headers to gather statistics and to help make
the user experience more pleasant. Most request headers you receive will be supplied by standard browsers and will be
entirely truthful…But do not rely on request headers for any issues pertaining to security.
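
What does server-side validation look like instead? Here is a rough sketch that checks the submitted values themselves, no matter which page they came from. The rules (var1 is a whole number from 1 to 120, var2 is a short non-empty string) and the function name validate_submission() are assumptions of mine, purely for illustration.

```php
<?php
// Hypothetical rules: var1 must be a whole number from 1 to 120,
// var2 a non-empty string of at most 50 characters.
function validate_submission(array $post): array
{
    $errors = [];

    $var1 = filter_var(
        $post['var1'] ?? null,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1, 'max_range' => 120]]
    );
    if ($var1 === false) {
        $errors[] = 'var1 must be a whole number between 1 and 120';
    }

    $var2 = trim((string) ($post['var2'] ?? ''));
    if ($var2 === '' || strlen($var2) > 50) {
        $errors[] = 'var2 must be between 1 and 50 characters';
    }

    return $errors; // an empty array means the data passed every check
}

$good = validate_submission(['var1' => '25', 'var2' => 'Harry']);
$bad  = validate_submission(['var1' => 'abc']);
```

In a real script you would pass $_POST to the function and only call something like insert_data() when the returned array is empty.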

Using HTTP request headers

There are several things you can do with these. Using $_SERVER['HTTP_USER_AGENT'] you can detect the type of browser
the user says it has. You might check the $_SERVER['HTTP_ACCEPT_LANGUAGE'] (perhaps along with $_SERVER['HTTP_ACCEPT_CHARSET'] and some
IP address geolocation) to help determine the
best language in which to serve your pages to a given user.
Although $_SERVER['HTTP_REFERER'] is not reliable for security
purposes, it could be useful as an aid for building statistics about your website traffic or customizing content to
match the path the user took to reach a given page. If for some reason you want to manipulate the raw query string used when
the page was accessed, you can look in $_SERVER['QUERY_STRING']. Looking in $_SERVER['REQUEST_METHOD'] will
tell you whether your page was accessed via GET or POST. There’s quite a bit of information there for
you to find creative uses for.
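
As an example of putting one of these to work, here is a simplified sketch that picks a language from an Accept-Language value. pick_language() is a hypothetical helper; it only looks at the first two letters of each language tag and ignores the finer points of the specification, so consider it a starting point.

```php
<?php
// Hypothetical helper: choose the best supported language from an Accept-Language value.
function pick_language(string $accept, array $supported, string $default): string
{
    $best = $default;
    $best_q = 0.0;

    foreach (explode(',', $accept) as $item) {
        // Each item looks like "fr-CH" or "en;q=0.8"; a missing q means 1.0.
        $parts = explode(';', trim($item));
        $lang = strtolower(substr(trim($parts[0]), 0, 2)); // keep the primary tag only
        $q = 1.0;
        foreach (array_slice($parts, 1) as $param) {
            $param = trim($param);
            if (strpos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        // Remember the supported language with the highest preference so far.
        if (in_array($lang, $supported, true) && $q > $best_q) {
            $best = $lang;
            $best_q = $q;
        }
    }
    return $best;
}

echo pick_language('fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7', ['en', 'de'], 'en'), "\n";
```

On a live page the first argument would be $_SERVER['HTTP_ACCEPT_LANGUAGE'] (with a fallback to an empty string when the browser does not send it).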

HTML Meta Tag HTTP Header Equivalents

Chances are, before reading this article, you have seen or used the HTML meta tag below to redirect a user:

<meta http-equiv="refresh" content="0;http://www.mysite.com/somepage.html" />

Look familiar? The ‘http-equiv’ meta tags are ‘equivalent’ to HTTP response headers, and were introduced so that people
writing HTML pages without server side programming would have access to the powerful functionality described above. Using these
meta tags is simple: they can be placed anywhere in the <head> of the document, and their http-equiv
attribute contains the header name, while the content attribute contains the value for the header.

I’ve found that these, like the HTTP headers in general, often produce confusion, but now they should seem quite simple to you.
Although I usually prefer to use the PHP header() function, these meta tag HTTP header equivalents are often very handy
for things like specifying the character set. For example, I often use this in my HTML pages (and sometimes my PHP ones):

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Note: Support for meta tag equivalents of HTTP headers is not uniform, so it is usually safer and
faster to use the headers themselves if you can. Also, it should be obvious that some headers and values will not work as meta
equivalents: you cannot set the Content-Type to image/png when the real headers have been sent and the
browser is already reading the HTML 😉

Conclusion

Now that you are done with this article, you should have a pretty firm grasp of how HTTP works, how request and response headers
are used, and how you can employ this functionality in your programming. This reasonably detailed knowledge should also enable
you to start thinking more critically about your web application efficiency and security. I hope that as you move forward with your
programming, you will find that you’ve become quite comfortable working with HTTP headers, and that you are able to exploit them to
make your job easier and your pages better.

As a parting thought, remember that headers are like words: they convey information and ask for certain actions to be performed,
but by themselves they don’t force anything to happen. 99.9% of the time, cooperative browsers are talking to cooperative servers,
and everything happens smoothly. But you have to remember that, as in life, every once in a while you’ll run across a jerk
(a hacker), or someone who’s got his own way of doing things (Internet Explorer). Web development is very much a job of customer
service, so you’ve got to do your best to keep the crooks out, and accommodate the customers with ‘special needs.’ 😉

Php curl tutorial – making http requests in php

Curl

According to the official website:

curl is a command line tool for transferring data with URL syntax, supporting DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, Telnet and TFTP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, kerberos…), file transfer resume, proxy tunneling and a busload of other useful tricks.

Curl is not only a command line program; it is also available to other languages as a library. Php, for example, has a curl extension that offers the features of the curl program as a programmable api. There are a few functions to learn and many options to know about, and then any php program can use curl to do many wonderful things. This is precisely what we shall be doing in this article: learning how to use curl.

To give a brief description of what it can do, curl can be used to download the contents of remote urls, download remote files, submit forms automatically from scripts etc. Although these are the most common uses of the curl library in php, curl is not limited to these things and can do a lot more, as specified in the definition above.

That is enough introduction; let’s get into it without any more delay.

Make GET requests – fetch a url

Fetching a remote url is the same as performing a GET request on a url. This action gets the html contents of the url. Let’s have a look at the program that does that and understand how it works.

//gets the data from a URL
function get_url($url)
{
    $ch = curl_init();
     
    if($ch === false)
    {
        die('Failed to create curl object');
    }
     
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
 
echo get_url('http://www.apple.com/');

Open that script in your browser and it should show the contents of apple.com as expected. The get_url function takes a url as parameter and fetches its content using curl functions. The curl calls follow this sequence:

1. Create a curl object using the curl_init() function.
If the curl extension is not installed, the call never runs: php stops with the fatal error "Call to undefined function curl_init()". When the extension is installed, curl_init() returns false if it fails to create the object.

2. Set the necessary curl options/parameters using the curl_setopt function.
3. Execute the curl request by calling curl_exec.
4. Close the curl object by calling curl_close.

Step 2 is the most important, because the correct options need to be provided to curl so that it can perform the request properly and fetch the results. Let’s take a look at some of the basic options used in most requests.

CURLOPT_URL – This is the url to which the request is being sent. In case of post requests it is the url to which the data is being submitted or posted.

CURLOPT_RETURNTRANSFER – This will return the result output as a string. Without this the output would be directly echoed to the screen or STDOUT.

So in case of a GET request the URL is the most important option to set and the RETURNTRANSFER option returns the output in a proper variable.

For a list of all options that php-curl supports check the following page
http://php.net/manual/en/function.curl-setopt.php
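
The get_url function above returns false on any failure without saying why. As a hedged sketch of how that could be improved, the variant below also reports the HTTP status code and the curl error message, using the standard curl_getinfo() and curl_error() functions. The name fetch_url() and the shape of the returned array are my own choices, not part of the curl extension.

```php
<?php
// Hypothetical helper: fetch a URL and report the status code or the curl error.
function fetch_url(string $url, int $timeout = 5): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout * 2); // cap on the whole transfer

    $body = curl_exec($ch); // false on failure, the response body otherwise
    $result = [
        'ok'     => $body !== false,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 if no response arrived
        'body'   => $body === false ? null : $body,
        'error'  => curl_error($ch),                       // empty string on success
    ];
    curl_close($ch);
    return $result;
}

$result = fetch_url('http://www.apple.com/');
if ($result['ok']) {
    echo 'HTTP ', $result['status'], ', ', strlen($result['body']), " bytes\n";
} else {
    echo 'Request failed: ', $result['error'], "\n";
}
```

Note that 'ok' only means the transfer itself worked; a 404 page still counts as a successful transfer, which is why the status code is returned separately.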

Note

The above GET request to a url can be done in a much simpler way like this

// Make a HTTP GET request and print it (requires allow_url_fopen to be enabled)
echo file_get_contents('http://www.apple.com/');

The file_get_contents function can be used to fetch the contents of a url very much like it does for local files on the storage. So in most cases you may want to use this shorter method for fetching urls, instead of the lengthy curl call.

Setting all curl options at once

Calling the curl_setopt function again and again to set the options is a bit tedious. There is a useful function called curl_setopt_array that takes an array of options and sets them all at once. Here is a quick example

curl_setopt_array($ch, array(
    CURLOPT_URL => $url ,
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_CONNECTTIMEOUT => $timeout ,
));

Make POST requests – submitting forms

Now that we have learned how to make basic GET requests using CURL, it’s time to make some POST requests. POST requests are mostly used to submit data to a url, like forms. For example, the login and signup forms that you see on websites do a POST request to submit the data.

Let’s take a simple program as an example

/**
    POST request in PHP using Curl
*/
 
$ch = curl_init();
     
if($ch === false)
{
    die('Failed to create curl object');
}
 
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_POST, true);
 
// The submitted form data, encoded as query-string-style name-value pairs
$post_data = 'name=Harry&age=25';
curl_setopt ($ch, CURLOPT_POSTFIELDS, $post_data);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
 
$output = curl_exec($ch);
curl_close ($ch);
 
echo $output;

The above program submits the data ‘name=Harry&age=25’ to $url. To test the submission, create a curl_submit.php script in your localhost document root that calls print_r($_POST), and point $url at it. The output will show what data was submitted.

The important options to set in a POST request are CURLOPT_POST (set to true) and CURLOPT_POSTFIELDS (set to the post data). The first tells curl to make an HTTP POST request rather than a GET request. The second provides the data for the POST request.
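
To get a feel for what the receiving script would decode from such a body without setting up an endpoint, the string can be decoded locally with parse_str, which parses a string as if it were a query string. This is only an illustrative sketch, and the variable names and the 'Tom & Jerry' value are made up for the example:

```php
<?php
// The body that CURLOPT_POSTFIELDS would send, as in the program above
$post_data = 'name=Harry&age=25';

// parse_str decodes query-string-style data into an array
parse_str($post_data, $fields);

print_r($fields);
// Array
// (
//     [name] => Harry
//     [age] => 25
// )

// Values containing reserved characters must be encoded before they are
// placed in the string, otherwise they split into extra fields
$unsafe = 'name=Tom & Jerry';
$safe   = 'name=' . urlencode('Tom & Jerry');

parse_str($unsafe, $broken);
parse_str($safe, $intact);

var_dump($broken['name']);  // string(4) "Tom "
var_dump($intact['name']);  // string(11) "Tom & Jerry"
```

This is why a hand-built string is only safe when every value has been passed through urlencode first.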

Array syntax for post data

The variable $post_data can also be an array, which makes it easier and safer to construct. Here is a quick example:

$post_data = array('name' => 'Harry', 'age' => '25');
curl_setopt ($ch, CURLOPT_POSTFIELDS, $post_data);

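Now curl will take care of encoding the parameters itself. Note that when CURLOPT_POSTFIELDS is given an array, curl sends the data as multipart/form-data rather than application/x-www-form-urlencoded. This approach also has a limitation: it fails for multi-dimensional arrays.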
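
One way around the multi-dimensional limitation is to flatten the array into a URL-encoded string with http_build_query before handing it to curl. The sketch below only builds and prints the string; the field names are made up for illustration:

```php
<?php
// Hypothetical nested form data
$post_data = array(
    'name'    => 'Harry',
    'age'     => 25,
    'hobbies' => array('chess', 'rock & roll'),
);

// http_build_query encodes nested arrays using bracket notation and
// escapes reserved characters such as & and spaces
$body = http_build_query($post_data);

echo $body, "\n";
// name=Harry&age=25&hobbies%5B0%5D=chess&hobbies%5B1%5D=rock+%26+roll

// The resulting string can then be passed as the POST body:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
```

Because the result is a string, curl sends it as an ordinary URL-encoded form body, and PHP on the receiving side rebuilds hobbies as an array.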

Using cookies with curl – automated logins to remote website

Cookies allow a server to store data on the client (the curl program in this case) so that the client sends the data back with later requests. This is useful for things like authenticating the client or storing session data. Cookies simply contain data as name => value pairs, much like a PHP array.

Being able to store and send back cookies means that a curl script can actually log in to a remote site as well. This is not difficult at all and requires just 3-4 extra lines of code. Of course, the username and password have to be present in the script too. Let's take an example:

//gets the data from a URL
function get_url($url)
{
    $ch = curl_init();
      
    if($ch === false)
    {
        die('Failed to create curl object');
    }
      
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
     
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
     
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
  
echo get_url('https://www.google.co.in/');

The above script fetches the URL “https://www.google.co.in”, which sends some cookies to save. The cookie-specific options are CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR. CURLOPT_COOKIEJAR specifies the file where the cookie data should be saved. CURLOPT_COOKIEFILE is the file from which cookie data is read to send with the next request. In this case, both are the same file.

The cookie data is saved in cookie.txt which is located in the current working directory while running the script.

# Netscape HTTP Cookie File
# http://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.

.google.com	TRUE	/	FALSE	1427285380	PREF	ID=9096c8e7cd9f72f9:FF=0:TM=1364213380:LM=1364213380:S=js4r67txFLwOo3xg
#HttpOnly_.google.com	TRUE	/	FALSE	1380024580	NID	67=bJYY_PSeVt7MO2FR3AVEhFAKb-M75LpQg0yjjLmF_liHpl2LlelgazjET0hfQ7966hNIB_utS8Ve0NmQPcaENhGlhgO9ByQdEuOfBI8oGgLnQZLUSjOesDqoKI4Ywqj7
.google.co.in	TRUE	/	FALSE	1427285396	PREF	ID=8440fee987c20fa2:FF=0:TM=1364213396:LM=1364213396:S=yvcIPIyHKxKRFOYp
#HttpOnly_.google.co.in	TRUE	/	FALSE	1380024596	NID	67=UKFY159Qyt12345Xfitpo1j-GirhZ-UFyFNxyTAEEUGnNYfFNBkjjAEgBsNvHJeICUE_oLlTe9cd09O0EwdpngmZyxGhllXzZArJnQ2yB1ly1SoDe5S0gWVRt6V34MyD
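
Each data line in that file holds seven tab-separated fields: domain, a flag for whether subdomains are included, path, a secure-only flag, expiry as a Unix timestamp, cookie name, and value. Lines starting with #HttpOnly_ are real cookies marked HttpOnly, while other # lines are comments. Here is a small sketch of a reader for this format, assuming a hypothetical function name parse_cookie_file and shortened, made-up sample values:

```php
<?php
// Hypothetical helper: parse Netscape-format cookie file text into arrays
function parse_cookie_file($text)
{
    $cookies = array();

    foreach (preg_split('/\r\n|\n|\r/', $text) as $line) {
        $http_only = false;

        // "#HttpOnly_" marks a cookie line, not a comment
        if (strpos($line, '#HttpOnly_') === 0) {
            $http_only = true;
            $line = substr($line, strlen('#HttpOnly_'));
        }

        // Skip blank lines and ordinary comments
        if (trim($line) === '' || $line[0] === '#') {
            continue;
        }

        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a well-formed cookie line
        }

        $cookies[] = array(
            'domain'     => $fields[0],
            'subdomains' => $fields[1] === 'TRUE',
            'path'       => $fields[2],
            'secure'     => $fields[3] === 'TRUE',
            'expires'    => (int) $fields[4],
            'name'       => $fields[5],
            'value'      => $fields[6],
            'http_only'  => $http_only,
        );
    }

    return $cookies;
}

// Sample input with shortened, made-up values
$sample = "# Netscape HTTP Cookie File\n"
        . ".google.com\tTRUE\t/\tFALSE\t1427285380\tPREF\tID=abc\n"
        . "#HttpOnly_.google.com\tTRUE\t/\tFALSE\t1380024580\tNID\t67=xyz\n";

foreach (parse_cookie_file($sample) as $cookie) {
    echo $cookie['name'], ' for ', $cookie['domain'],
         $cookie['http_only'] ? ' (HttpOnly)' : '', "\n";
}
// PREF for .google.com
// NID for .google.com (HttpOnly)
```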

If you need to save the cookie file in the same directory as the script, use the __FILE__ magic constant, which holds the full path of the current file.

define('HOME', dirname(__FILE__));
// ...
 
curl_setopt($ch, CURLOPT_COOKIEFILE, HOME . '/cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, HOME . '/cookie.txt');

Doing this pins down the location of the cookie file, so you don't have to hunt for it.

Check out an earlier post which explains how to do remote login with curl in php.

Downloading a remote file using curl

Just as the contents of a remote URL can be fetched, a remote file at a given URL can be downloaded and saved to local storage too.

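/**
    Download remote file in php using curl
    Files larger than php's available memory will fail or come out corrupted
*/
$path = '/var/www/lemon.zip';
 
// $url is assumed to hold the address of the remote file
$ch = curl_init($url);
if($ch === false)
{
    die('Failed to create curl handle');
}
 
// The url was already set by curl_init($url)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
 
// The whole response is held in memory before it is written out
$data = curl_exec($ch);
if($data === false)
{
    die('Download failed: ' . curl_error($ch));
}
 
file_put_contents($path, $data);
echo 'File download complete';
 
curl_close($ch);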

The above program can download remote files, but it has a limitation. If the file is larger than the memory available to PHP, then either a memory exhausted error is thrown or the downloaded file ends up corrupt.

PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 60527991 bytes) in /var/www/curl.php on line 19

The problem can be fixed as shown in the next code example:

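/**
    Download remote file in php using curl
    chunking with fopen
*/
$path = '/var/www/lemon.zip';
 
// $url is assumed to hold the address of the remote file
$ch = curl_init($url);
if($ch === false)
{
    die('Failed to create curl handle');
}
 
$fp = fopen($path, 'w');
if($fp === false)
{
    die('Failed to open file for writing');
}
  
// Write the response body straight to the file instead of returning it
curl_setopt($ch, CURLOPT_FILE, $fp);
  
// With CURLOPT_FILE set, curl_exec returns true or false, not the data
$success = curl_exec($ch);
  
curl_close($ch);
fclose($fp);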

The above code downloads large files and saves them without any problem. The CURLOPT_FILE option tells curl to write the response to the file handle as it arrives, rather than holding it all in memory.

Using proxy with curl – anonymous browsing

Curl also supports using a proxy server to perform HTTP requests. In this example we are going to use the Tor proxy to do anonymous browsing with curl.

Socks proxy

The following piece of code demonstrates how curl can be configured to use a SOCKS5 proxy (Tor in this case).

//gets the data from a URL
function get_url($url , $proxy = false)
{
    $ch = curl_init();
     
    if($ch === false)
    {
        die('Failed to create curl object');
    }
     
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
     
    //Set the tor proxy - the tor proxy is running at localhost port 9050
    if($proxy === true)
    {
        curl_setopt($ch, CURLOPT_PROXY, 'localhost:9050');
        curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
    }
     
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
 
echo get_url('http://www.ipmango.com/' , true);

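Opening the above PHP script in a browser should load ipmango.com via the Tor proxy, and the IP address it shows should be different from your real public IP address.

The important curl options to set are CURLOPT_PROXY and CURLOPT_PROXYTYPE. The first is the address of the proxy server and the second specifies the type of proxy. By default the proxy type is HTTP, so it has to be changed to SOCKS5 here. Note that with CURLPROXY_SOCKS5 hostnames are resolved locally rather than through the proxy; if your PHP and libcurl versions provide CURLPROXY_SOCKS5_HOSTNAME, it asks the proxy to resolve them instead.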

HTTP proxy

Just as we used a SOCKS5 proxy, we can use an HTTP proxy as well. Here we are going to use privoxy with Tor.

//gets the data from a URL
function get_url($url , $proxy = false)
{
    $ch = curl_init();
     
    if($ch === false)
    {
        die('Failed to create curl object');
    }
     
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
     
    //Set the tor proxy - privoxy is running at localhost port 8118
    if($proxy === true)
    {
        curl_setopt($ch, CURLOPT_PROXY, 'localhost:8118');
    }
     
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
 
echo get_url('http://www.ipmango.com/' , true);

As before, the program fetches ipmango.com via the proxy. This time only the CURLOPT_PROXY option is set, since the proxy type defaults to HTTP.
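
Since the two versions of get_url differ only in their proxy settings, the choice can be expressed as data. The sketch below uses a hypothetical helper, proxy_options, that returns the options for either proxy type so they can be applied with curl_setopt_array. The addresses are the local Tor and privoxy ports used above, and the PHP curl extension is assumed to be loaded:

```php
<?php
// Hypothetical helper: build the curl proxy options for a proxy type
function proxy_options($address, $type = 'http')
{
    $options = array(CURLOPT_PROXY => $address);

    if ($type === 'socks5') {
        // The proxy type defaults to HTTP, so only SOCKS needs to be set
        $options[CURLOPT_PROXYTYPE] = CURLPROXY_SOCKS5;
    }

    return $options;
}

$ch = curl_init('http://www.ipmango.com/');
if ($ch === false) {
    die('Failed to create curl object');
}

curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_CONNECTTIMEOUT => 5,
));

// Tor's SOCKS5 port; for privoxy use proxy_options('localhost:8118')
curl_setopt_array($ch, proxy_options('localhost:9050', 'socks5'));

// curl_exec($ch) would now send the request through the proxy
curl_close($ch);
```

No request is sent here; the sketch only shows how the options are assembled before curl_exec is called.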