How To Make A URL Shortener.


During the recent refresh of this site, I thought it would be cool to have a custom URL shortener. My motivation was simply to reduce links from the atrocity that was http://jeremygibbs.com/yyyy/mm/dd/clever-story-title to something more elegant. I decided to use gibbz, which is both my company’s name and my personal moniker. So with that, I purchased the http://gib.bz domain.

Of course, more was required than simply owning a short domain name. There were three things I needed to solve: shortening a long URL, storing information about that long URL, and handling redirection to that long URL when the short URL is visited. There are two main ways to accomplish this. The first is to point your domain name to a service like bit.ly and use their backend.[1] The second, and the one I chose, is to code your own solution. Here, I will describe my technique.[2]

Store

Before getting into the code, I needed to set up a database to store information about the long URLs being shortened. The idea is to associate an auto-incremented ID with each URL and then map that ID to a short URL. In principle the table only needs two fields, id and long_url, and the ID is found by searching on long_url. As far as my limited knowledge of MySQL goes, though, a full-table search is slow, and a column index works best on short fields. In our case, long_url is set to 140 characters, which is certainly not short. To speed up the search, I reduced the searchable field to 32 characters by making an MD5 hash of long_url and storing it in a new indexed field, hash.

This is the code I used to create the MySQL table:

CREATE TABLE urls 
(
	id        INT           NOT NULL  AUTO_INCREMENT,
	long_url  VARCHAR(140)  NOT NULL,
	hash      CHAR(32)      NOT NULL,
	
	PRIMARY KEY (id),
	INDEX hash (hash)
) ENGINE=InnoDB;
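
To illustrate how the hash column is meant to be used, here is a minimal sketch. The helper name urlHash is hypothetical and not part of the code in this post; the query string is only shown, not executed.

```php
<?php
// Hypothetical helper: produces the fixed-width key stored in the indexed hash column.
function urlHash(string $longUrl): string
{
    // md5() returns 32 hexadecimal characters regardless of input length,
    // which is why the column is declared CHAR(32).
    return md5($longUrl);
}

$longUrl = 'http://jeremygibbs.com/yyyy/mm/dd/clever-story-title';
$hash    = urlHash($longUrl);

// The lookup compares against the short indexed column instead of long_url.
// The ? is a placeholder to be bound in a prepared statement.
$lookupSql = 'SELECT id FROM urls WHERE hash = ?';

echo strlen($hash), "\n"; // 32
```

Because the same URL always produces the same hash, shortening a URL twice finds the existing row rather than creating a second one.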

Shorten

I needed a way to map a short alphanumeric string to the numeric ID in the MySQL table, but I wasn’t quite sure where to start. I found a great post on Stack Overflow that described the idea. Basically, I needed a bijective function between IDs and short strings: each ID maps to exactly one short string, and each short string maps back to exactly one ID. Since the database already ties each ID to a single long URL, the short string then identifies that URL unambiguously.

First, I started by making an alphabet to map the ID against. I chose a standard base 62 character set, [a-zA-Z0-9]. In my case, I randomized this list one time so that I had a unique dictionary that made guessing the next short URL hard.[3]
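
The one-time randomization could be done with a throwaway script along these lines. This is only a sketch of one way to do it; the variable names are assumptions, and str_shuffle is not cryptographically secure, so treat the result as obscurity rather than security.

```php
<?php
// Standard base 62 character set, [a-zA-Z0-9].
$alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// Shuffle once, then paste the printed string into the class as its dictionary.
// Shuffling again later would change the short code that every existing ID maps to.
$dict = str_shuffle($alphabet);

echo $dict, "\n";
```

The shuffled string still contains each of the 62 characters exactly once, so the mapping between IDs and short codes stays one-to-one.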

Next, a long URL is entered. Its hash is looked up to see whether the URL already exists in the database. If it doesn’t, the URL is inserted and a new ID is created. The ID is then converted from base 10 to base 62, each base 62 digit is mapped to a character in the alphabet, and the result is appended to your custom domain name. The code lives in the url class in urlFuncs.php, listed in full at the end of the next section.
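
As a standalone illustration of the conversion step, here is a small sketch separated from the database code. The function name encodeId is hypothetical, and the unshuffled alphabet is used so the output is easy to check by hand.

```php
<?php
// Hypothetical standalone version of the base 10 -> base 62 step.
function encodeId(int $id, string $dict): string
{
    $base  = strlen($dict);
    $short = '';
    do {
        // The remainder selects the next character, least significant digit first,
        // so each new character is prepended to the result.
        $short = $dict[$id % $base] . $short;
        $id    = intdiv($id, $base);
    } while ($id > 0);

    return $short;
}

$dict = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// 125 = 2 * 62 + 1, so the digits are 2 and 1, which map to 'c' and 'b'.
echo encodeId(125, $dict), "\n"; // cb
```

With this alphabet, IDs 1 through 61 produce a single character, and ID 62 is the first to need two.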

Expand

The final step is to direct browsers to the long URL when the short URL is clicked. First, we need to grab the short code from the requested URL so that it can be converted back to a base 10 integer. This is accomplished using Apache’s mod_rewrite:

DirectoryIndex index.php

# remove the next 3 lines if you see a 500 server error
php_flag register_globals off
php_flag magic_quotes_gpc off
php_value display_errors 1

FileETag none
ServerSignature Off

Options All -Indexes

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^shorten/(.*)$ shorten.php?long=$1 [L]
RewriteRule ^([0-9a-zA-Z]{1,6})$ shorten.php?short=$1 [L]
</IfModule>

In essence, we tell the server to intercept page requests that match a certain pattern (in our case, it’s the short domain plus an alphanumeric code of up to six characters). The alphanumeric string is captured by the server and passed to our URL script. That script maps each character of the string back to its base 62 digit and converts the result to a base 10 integer. The MySQL table is searched for the row with an ID equal to that integer, and the accompanying long URL is returned. The script then tells the browser to redirect the user to that long URL. The decoding code is also contained in urlFuncs.php; the full listing closes this section.
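
To get a feel for which request paths the second rewrite rule captures, the same pattern can be tried in PHP. This is just a sketch for experimenting; the function name shortCodeFromPath is hypothetical, and Apache itself does the matching on the real site.

```php
<?php
// Mirrors the pattern of the second RewriteRule: one to six alphanumeric characters.
// Returns the captured short code, or null when the path would not be intercepted.
function shortCodeFromPath(string $path): ?string
{
    if (preg_match('/^([0-9a-zA-Z]{1,6})$/', $path, $matches)) {
        return $matches[1];
    }
    return null;
}

foreach (['cb', 'aB3x9Z', 'abcdefg', 'shorten.php'] as $path) {
    $code = shortCodeFromPath($path);
    echo $path, ' => ', $code === null ? 'not a short code' : $code, "\n";
}
```

Paths longer than six characters or containing anything other than letters and digits, such as the dot in shorten.php, fall through to normal handling.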

<?php
//-- urlFuncs.php --//

class url
{
	//-- base 62 set --//
	private $dict  = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
	private $base  = 62;
	private $site  = 'your short domain name';
		
	//-- database info --//
	private $dbName  = 'your database name';
	private $dbUser  = 'your user name';
	private $dbPass  = 'your password';
	private $dbHost  = 'localhost';
	private $dbTable = 'urls';
	
	//-- mysqli connection handle --//
	private $db;
	
	//-- initialize --//
	function __construct()
	{
		//-- connect to database (the old mysql_* functions were removed in PHP 7) --//
		$this->db = new mysqli($this->dbHost, $this->dbUser, $this->dbPass, $this->dbName);
	}
	//-- function to shorten the long url --//
	function shorten($url)
	{			
		//-- make a hash of the long url to improve mysql searching --//
		$hash  = md5($url);
		
		//-- shortened alphanumeric variable --//
		$short = '';
		
		//-- does the long url already exist in the database? --//
		$stmt = $this->db->prepare("SELECT id FROM {$this->dbTable} WHERE hash = ?");
		$stmt->bind_param('s', $hash);
		$stmt->execute();
		$stmt->bind_result($id);
		$found = $stmt->fetch();
		$stmt->close();
		
		//-- if not, insert the new long url into the database --// 
		if (!$found)
		{
			$stmt = $this->db->prepare("INSERT INTO {$this->dbTable} (long_url, hash) VALUES (?, ?)");
			$stmt->bind_param('ss', $url, $hash);
			$stmt->execute();
			$id = $this->db->insert_id;
			$stmt->close();
		}
	
		//-- convert the mysql id of the long url from base 10 to base 62 and make alphanumeric --//
		do 
		{
			$short = $this->dict[$id % $this->base].$short;
		} 
		while ($id = intdiv($id, $this->base));
		
		//-- return the shortened url --//
		$site = $this->site;
		return "$site/$short";
	}
	//-- function to expand the short url --//	
	function expand($url)
	{	
		//-- decode the short string from base 62 to the base 10 mysql id --//
		$id = 0;
		foreach (str_split($url) as $char)
		{
			$pos = strpos($this->dict, $char);
			
			//-- a character outside the dictionary cannot be a valid short url --//
			if ($pos === FALSE)
			{
				return FALSE;
			}
			$id = $id * $this->base + $pos;
		}
		
		//-- make sure long url pertaining to the decoded mysql id exists --//
		$stmt = $this->db->prepare("SELECT long_url FROM {$this->dbTable} WHERE id = ?");
		$stmt->bind_param('i', $id);
		$stmt->execute();
		$stmt->bind_result($long_url);
		$found = $stmt->fetch();
		$stmt->close();
			
		//-- if it does, return the long url; otherwise return FALSE --//
		return $found ? $long_url : FALSE;
	}
}	

?>
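
To check the decoding step in isolation from the database, a sketch like the following can help. The function name decodeCode is hypothetical, and the unshuffled alphabet is used so the results are easy to verify by hand.

```php
<?php
// Hypothetical standalone version of the base 62 -> base 10 step.
// Returns null for an empty code or one containing characters outside the dictionary.
function decodeCode(string $code, string $dict): ?int
{
    if ($code === '') {
        return null;
    }

    $base = strlen($dict);
    $id   = 0;
    foreach (str_split($code) as $char) {
        $pos = strpos($dict, $char);
        if ($pos === false) {
            return null;
        }
        // Shift the running total one base 62 place and add the new digit.
        $id = $id * $base + $pos;
    }

    return $id;
}

$dict = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

echo decodeCode('cb', $dict), "\n"; // 125
```

One detail worth knowing: the first dictionary character plays the role of zero, so a code with that character in front, such as ab, decodes to the same ID as b. The encoder never produces such codes, but the decoder accepts them, much like leading zeros in a decimal number.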

Accessing URLs

To access either the shortened or expanded URLs, we need to create a script that calls our PHP class functions contained in urlFuncs.php.

<?php
//-- shorten.php --//

//-- url class --//
require_once('urlFuncs.php');
$url = new url;

//-- for shortening of long urls --//
//-- (get_magic_quotes_gpc() no longer exists as of PHP 8, so no stripslashes step) --//
$long = isset($_REQUEST['long']) ? trim($_REQUEST['long']) : '';

if($long !== '' && preg_match('|^https?://|', $long))
{
	echo $url->shorten($long);
}

//-- for redirection of short urls --//
//-- (compare against '' because the valid short code "0" is falsy in PHP) --//
$short = isset($_GET['short']) ? $_GET['short'] : '';
if($short !== '')
{
	$long = $url->expand($short);
	header('HTTP/1.1 301 Moved Permanently');
	if ($long)
	{
		header('Location: ' .  $long);
	}
	else 
	{
		header('Location: http://yourblogname.com');
	}
	exit;
}

?>
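
One thing the script does not check is length: long_url is a VARCHAR(140) column, so depending on the server’s SQL mode a longer URL is either rejected or silently truncated. A pre-check along these lines could guard against that. This is a sketch only; the function name isShortenable is hypothetical and not part of shorten.php.

```php
<?php
// Hypothetical pre-check; the default of 140 matches the VARCHAR(140) long_url column.
function isShortenable(string $long, int $maxLength = 140): bool
{
    // Same scheme test that shorten.php applies.
    if (!preg_match('|^https?://|', $long)) {
        return false;
    }

    // strlen() counts bytes, which is never less than the character count,
    // so this check errs on the side of rejecting.
    return strlen($long) <= $maxLength;
}

foreach (['http://thelongdomain.com/blah/blah/blah', 'ftp://thelongdomain.com/file'] as $candidate) {
    echo $candidate, ' => ', isShortenable($candidate) ? 'ok' : 'rejected', "\n";
}
```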

To retrieve a short URL via web browser, you would simply enter:

http://yourdoma.in/shorten.php?long=http://thelongdomain.com/blah/blah/blah

To retrieve it programmatically, use:

<?php
$url = urlencode('http://thelongdomain.com/blah/blah/blah');
$short = file_get_contents('http://yourdoma.in/shorten.php?long=' . $url);
echo $short;
?>
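
The urlencode call matters whenever the long URL carries its own query string. The following sketch shows why, without making any network request; the function name shortenRequestUrl and the example long URL are assumptions for illustration.

```php
<?php
// Hypothetical helper that builds the request URL for shorten.php.
function shortenRequestUrl(string $endpoint, string $longUrl): string
{
    // Encoding keeps the ?, & and = inside the long URL from being read
    // as part of the outer query string.
    return $endpoint . '?long=' . urlencode($longUrl);
}

$longUrl = 'http://thelongdomain.com/search?q=php&page=2';
$request = shortenRequestUrl('http://yourdoma.in/shorten.php', $longUrl);

// Parse the request the way the receiving server would.
parse_str(parse_url($request, PHP_URL_QUERY), $params);

// Without encoding, the same URL splits into two parameters.
parse_str('long=' . $longUrl, $rawParams);

echo $params['long'], "\n";
echo count($rawParams), " parameters when sent unencoded\n";
```

In the unencoded case, the script would receive only the part of the long URL before the first ampersand.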

You’ll notice that the first rule in the mod_rewrite block above routes requests under shorten/ to the same script.

If you enter the short URL into a browser, you should quickly be redirected to the long URL you entered.

As a final note, you generally won’t want people using your short domain for anything other than redirection. That is to say, you don’t want people going to yourdoma.in. To prevent that, simply enter the following into the root index.php:

<?php
header('HTTP/1.1 301 Moved Permanently');
header('Location: http://yourblogname.com');
?>  

Summary

I have shown you a way to create a custom URL shortener in PHP. To shorten a lengthy URL, that address is hashed and inserted into a database. The resulting ID is converted to a base 62 integer and mapped to a custom dictionary that creates a short alphanumeric code. That short code is appended to your custom short domain. You can request the short URL via web browser or programmatically.

When the short URL is visited, an Apache rule is used to intercept the request. The short code is sent to a script for expansion, decoded to base 62, and then converted to a base 10 integer. The MySQL row with an ID matching that integer is found and the long URL is returned, to which the browser is redirected.

There may be better ways to accomplish the task, but in practice this works great for me. More importantly, it is fast. Further additions are possible, including the ability to track hits and referrers for each short URL. If you have any questions or enjoyed this tutorial, hit me up on Twitter.


  1. This is probably the easiest method for most people, but I wasn’t interested in easy. I also wasn’t interested in another company holding data relating to my site’s traffic.

  2. Note: I am not a PHP/MySQL expert, so please don’t waste time lamenting the inefficiency of my code.

  3. In practice, it probably doesn’t matter and you can stick with the default list.