How To Make A URL Shortener.

·

During the recent refresh of this site, I thought it would be cool to have a custom URL shortener. My motivation was simply to reduce links from the atrocity that was http://jeremygibbs.com/yyyy/mm/dd/clever-story-title to something more elegant. I decided on making use of my company’s name and personal moniker, gibbz. So with that, I purchased the http://gib.bz domain.

Of course, more was required than simply owning a short domain name. There were three things I needed to solve: shortening a long URL, storing information about that long URL, and handling redirection to that long URL when the short URL is visited. There are two main methods to accomplish the task. The first is to point your domain name to a service like bit.ly and use their backend.1 The second method, and the one I chose, is to code your own solution. Here, I will describe my technique.2

Store

Before getting into the code, I needed to set up a database to store information about the long URLs that are shortened. The idea is to associate an auto-incremented ID with each URL and then map that to a short URL. The table only needs two fields, id and long_url. To find the ID, a search is made on long_url. As far as my limited knowledge of MySQL goes, a full-table search is slow, and a column index works best on short fields. In our case, long_url is set to 140 characters - certainly not short. To speed up the search, I reduced the searchable field to 32 characters by making an MD5 hash of long_url and storing it in a new indexed field, hash.

This is the code I used to create the MySQL table:

CREATE TABLE urls 
(
	id        INT           NOT NULL  AUTO_INCREMENT,
	long_url  VARCHAR(140)  NOT NULL,
	hash      CHAR(32)      NOT NULL,
	
	PRIMARY KEY (id),
	INDEX hash (hash)
) ENGINE=InnoDB;
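As a rough illustration of why the hash column is a fixed CHAR(32), here is a minimal sketch. The helper name url_hash and the example URLs are placeholders of mine, not part of the original code.

```php
<?php
// Hypothetical helper: returns the value that would be stored in the hash column.
function url_hash($long_url)
{
	// md5() always returns 32 hexadecimal characters, whatever the input length
	return md5($long_url);
}

// Placeholder URLs, for illustration only
$examples = array(
	'http://example.com/a',
	'http://example.com/2012/01/16/a-much-longer-clever-story-title',
);

foreach ($examples as $long_url)
{
	echo strlen($long_url), ' characters in, ', strlen(url_hash($long_url)), " characters out\n";
}
```

Whatever the length of the long URL, the indexed value that MySQL has to compare stays at 32 characters.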

Shorten

I needed a way to map a short alphanumeric string to the numeric ID in the MySQL table, but I wasn’t quite sure where to start. I found a great post on Stack Overflow that described the idea. Basically, I needed to create a bijective function such that each ID is mapped to exactly one short string and each short string is mapped back to only one ID.

First, I started by making an alphabet to map the ID against. I chose a standard base 62 character set, [a-zA-Z0-9]. In my case, I randomized this list one time so that I had a unique dictionary that made guessing the next short URL hard.3
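One way to do that one-time randomization is sketched below. It assumes PHP's built-in str_shuffle() is good enough for the purpose (it is not cryptographically secure), and the helper name make_dict is my own.

```php
<?php
// The unshuffled base 62 set, [a-zA-Z0-9]
$default = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// Hypothetical helper: returns the same characters in a random order
function make_dict($chars)
{
	return str_shuffle($chars);
}

// Run this once and paste the printed string into your code as the dictionary
echo make_dict($default), "\n";
```

The shuffled string has to stay fixed once short URLs have been handed out. Changing it later would make every existing short code decode to a different ID.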

Next, a long URL is entered. The URL’s hash is looked up to see whether the URL already exists in the database. If it doesn’t, the URL is entered into the database and an ID is created. The ID is then converted from base 10 to base 62. Each base 62 digit is then mapped to a character in the alphabet and the result is appended to your custom domain name. The code is contained in a PHP class called url in the file urlFuncs.php, given below.
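The conversion step on its own can be sketched as a standalone function. This is a minimal version using the unshuffled alphabet; the function name encode_id is an assumption made for illustration.

```php
<?php
// Hypothetical standalone version of the base 10 to base 62 step
function encode_id($id, $dict)
{
	$base  = strlen($dict);
	$short = '';
	do
	{
		// the remainder picks the next character, working from right to left
		$short = $dict[$id % $base] . $short;
		$id    = (int) floor($id / $base);
	}
	while ($id > 0);

	return $short;
}

$dict = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// 125 = 2 * 62 + 1, so the digits are 2 ("c") and 1 ("b")
echo encode_id(125, $dict), "\n"; // prints cb
```

With this alphabet, IDs 0 through 61 produce a single character, and ID 62 is the first to need two ("ba").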

Expand

The final step is to direct browsers to the long URL when the short URL is clicked. First, we need to grab the short code from the requested URL in order to convert it back to a base 10 integer. This is accomplished using mod_rewrite.

DirectoryIndex index.php

# remove the next 3 lines if you see a 500 server error
php_flag register_globals off
php_flag magic_quotes_gpc off
php_value display_errors 1

FileETag none
ServerSignature Off

Options All -Indexes

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^shorten/(.*)$ shorten.php?long=$1 [L]
RewriteRule ^([0-9a-zA-Z]{1,6})$ shorten.php?short=$1 [L]
</IfModule>
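To check which request paths the second RewriteRule would capture, the same pattern can be tried in PHP with preg_match(). This is only a sketch: the helper name is_short_code and the sample paths are mine, and the paths are written without a leading slash, which is how a rule in an .htaccess file sees them.

```php
<?php
// Same pattern as the second RewriteRule, wrapped in delimiters for preg_match()
$pattern = '/^([0-9a-zA-Z]{1,6})$/';

// Hypothetical helper: true when the path would be treated as a short code
function is_short_code($path, $pattern)
{
	return preg_match($pattern, $path) === 1;
}

// Sample paths, for illustration only
$paths = array('cb', 'aB3x9Z', 'shorten.php', 'abcdefg', '');

foreach ($paths as $path)
{
	echo var_export($path, true), ' => ', is_short_code($path, $pattern) ? 'short code' : 'not a short code', "\n";
}
```

Only the first two sample paths match. The six-character limit still leaves room for 62^6 = 56,800,235,584 codes.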

In essence, we tell the server to intercept page requests if they match a certain pattern (in our case, it’s the short domain plus an alphanumeric code of up to six characters). The alphanumeric string is grabbed by the server and sent to our url script. That script decodes the alphanumeric string into base 62 digits and then converts those to a base 10 integer. The MySQL table is searched for the row with an ID equal to that integer, and the accompanying long URL is returned. The server then tells the browser to redirect the user to that long URL. The code is also contained in urlFuncs.php.

<?php
//-- urlFuncs.php --//

class url
{
	//-- base 62 set --//
	private $dict  = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
	private $base  = 62;
	private $site  = 'your short domain name';

	//-- database info --//
	private $dbName  = 'your database name';
	private $dbUser  = 'your user name';
	private $dbPass  = 'your password';
	private $dbHost  = 'localhost';
	private $dbTable = 'urls';

	//-- mysqli connection handle --//
	private $db;

	//-- initialize --//
	function __construct()
	{
		//-- connect to database (mysqli is used because the mysql_* functions were removed in PHP 7) --//
		$this->db = new mysqli($this->dbHost, $this->dbUser, $this->dbPass, $this->dbName);
	}

	//-- function to shorten the long url --//
	function shorten($url)
	{
		//-- make a hash of the long url to improve mysql searching --//
		$hash = md5($url);

		//-- does the long url already exist in the database? --//
		$stmt = $this->db->prepare("SELECT id FROM {$this->dbTable} WHERE hash = ?");
		$stmt->bind_param('s', $hash);
		$stmt->execute();
		$stmt->bind_result($id);
		$found = $stmt->fetch();
		$stmt->close();

		//-- if not, insert the new long url into the database --//
		if (!$found)
		{
			$stmt = $this->db->prepare("INSERT INTO {$this->dbTable} (long_url, hash) VALUES (?, ?)");
			$stmt->bind_param('ss', $url, $hash);
			$stmt->execute();
			$id = $this->db->insert_id;
			$stmt->close();
		}

		//-- convert the mysql id of the long url from base 10 to base 62 and make alphanumeric --//
		$id    = (int) $id;
		$short = '';
		do
		{
			$short = $this->dict[$id % $this->base] . $short;
			$id    = (int) floor($id / $this->base);
		}
		while ($id > 0);

		//-- return the shortened url --//
		return $this->site . '/' . $short;
	}

	//-- function to expand the short url --//
	function expand($url)
	{
		//-- decode the short string from base 62 to the base 10 mysql id --//
		$id  = 0;
		$len = strlen($url);
		for ($i = 0; $i < $len; $i++)
		{
			$pos = strpos($this->dict, $url[$i]);

			//-- reject characters that are not in the dictionary --//
			if ($pos === FALSE)
			{
				return FALSE;
			}
			$id = $id * $this->base + $pos;
		}

		//-- make sure long url pertaining to the decoded mysql id exists --//
		$stmt = $this->db->prepare("SELECT long_url FROM {$this->dbTable} WHERE id = ?");
		$stmt->bind_param('i', $id);
		$stmt->execute();
		$stmt->bind_result($long_url);
		$found = $stmt->fetch();
		$stmt->close();

		//-- if it does, return the long url; otherwise return FALSE --//
		return $found ? $long_url : FALSE;
	}
}

?>
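The decoding arithmetic can also be tried without a database. The following is a minimal sketch using the unshuffled alphabet; the function name decode_short is an assumption made for illustration.

```php
<?php
// Hypothetical standalone version of the base 62 to base 10 step
function decode_short($short, $dict)
{
	$base = strlen($dict);
	$id   = 0;
	$len  = strlen($short);

	for ($i = 0; $i < $len; $i++)
	{
		// position of the character in the dictionary is its base 62 digit
		$pos = strpos($dict, $short[$i]);
		if ($pos === false)
		{
			return false; // character is not in the dictionary
		}
		// shift what we have so far one place to the left, then add the new digit
		$id = $id * $base + $pos;
	}

	return $id;
}

$dict = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// "c" is digit 2 and "b" is digit 1, so the ID is 2 * 62 + 1
echo decode_short('cb', $dict), "\n"; // prints 125
```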

Accessing URLs

To access either the shortened or expanded URLs, we need to create a script that calls our PHP class functions contained in urlFuncs.php.

<?php
//-- shorten.php --//

//-- url class --//
require_once('urlFuncs.php');
$url = new url;

//-- for shortening of long urls (magic quotes no longer exist in current PHP, so no stripslashes is needed) --//
$long = isset($_REQUEST['long']) ? trim($_REQUEST['long']) : '';

if($long !== '' && preg_match('|^https?://|', $long))
{
	echo $url->shorten($long);
}

//-- for redirection of short urls --//
//-- compare against '' rather than testing truthiness, since the valid short code "0" is falsy in PHP --//
$short = isset($_GET['short']) ? $_GET['short'] : '';
if($short !== '')
{
	$long = $url->expand($short);
	header('HTTP/1.1 301 Moved Permanently');
	if ($long)
	{
		header('Location: ' .  $long);
	}
	else 
	{
		header('Location: http://yourblogname.com');
	}
	exit;
}

?>

To retrieve a short URL via web browser, you would simply enter:

http://yourdoma.in/shorten.php?long=http://thelongdomain.com/blah/blah/blah

To retrieve it programmatically, type:

<?php
$url = urlencode('http://thelongdomain.com/blah/blah/blah');
$short = file_get_contents('http://yourdoma.in/shorten.php?long=' . $url);
echo $short;
?>
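The urlencode() call matters most when the long URL carries its own query string. Here is a small sketch of what it produces; the helper name shorten_request and the example query string are placeholders of mine.

```php
<?php
// Hypothetical helper: builds the request URL for shorten.php
function shorten_request($short_domain, $long_url)
{
	// urlencode() escapes characters such as : / ? = & so the long URL travels as one parameter
	return 'http://' . $short_domain . '/shorten.php?long=' . urlencode($long_url);
}

echo shorten_request('yourdoma.in', 'http://thelongdomain.com/blah?a=1&b=2'), "\n";
// prints http://yourdoma.in/shorten.php?long=http%3A%2F%2Fthelongdomain.com%2Fblah%3Fa%3D1%26b%3D2
```

Without the encoding, the &b=2 part would be read as a second parameter of shorten.php rather than as part of the long URL.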

You’ll notice the first RewriteRule in the mod_rewrite rules above handles a similar request: it passes anything after yourdoma.in/shorten/ to shorten.php as the long parameter.

If you enter the short URL into a browser, you should quickly be redirected to the long URL you entered.

As a final note, you generally won’t want people using your short domain for anything other than redirection. That is to say, you don’t want people going to yourdoma.in. To prevent that, simply enter the following into the root index.php:

<?php
header('HTTP/1.1 301 Moved Permanently');
header('Location: http://yourblogname.com');
exit;
?>

Summary

I have shown you a way to create a custom URL shortener in PHP. To shorten a lengthy URL, that address is hashed and inserted into a database. The resulting ID is converted to a base 62 integer and mapped to a custom dictionary that creates a short alphanumeric code. That short code is appended to your custom short domain. You can request the short URL via web browser or programmatically.

When the short URL is visited, an Apache rule is used to intercept the request. The short code is sent to a script for expansion, decoded to base 62, and then converted to a base 10 integer. The MySQL row with an ID matching that integer is found and the long URL is returned, to which the browser is redirected.

There may be better ways to accomplish the task, but in practice this works great for me. More importantly, it is fast. Further additions are possible, including the ability to track hits and referrers to each short URL. If you have any questions or enjoyed this tutorial, hit me up on Twitter.


  1. This is probably the easiest method for most people, but I wasn’t interested in easy. I also wasn’t interested in another company holding data relating to my site’s traffic. ↩︎

  2. Note: I am not a PHP/MySQL expert, so please don’t waste time lamenting the inefficiency of my code. ↩︎

  3. In practice, it probably doesn’t matter and you can stick with the default list. ↩︎


Dream Unrealized.

·

Today, we in the United States celebrate the birthday of Rev. Martin Luther King, Jr. Generally this means that Facebook and Twitter are filled with quotes and platitudes espousing Dr. King’s convictions on race and equality. Many take the time to pat the US on its back for overcoming the evils of racism. Before pulling a muscle with such congratulations, I encourage you to consider a few points.

This country has an embarrassing record towards its fellow man: Native Americans, women, Japanese, Africans, Mexicans, Muslims, homosexuals, and so many more. Think about why you quote Dr. King today. Less than 44 years ago, he was murdered for so brazenly suggesting during his lifetime that he and fellow African-Americans be allowed to urinate in the same restroom, drink from the same fountain, attend the same schools, or receive equal pay for equal work as those who were white. Again, less than 44 years ago in this United States, a man was the target of violence for the mere idea that skin pigment should not differentiate the rights of men.

I know what some are thinking, “Jeremy, that is all true but we have overcome such behavior and are leading this country into a future of enlightenment.” Are we really?

In 1998, James Byrd, Jr. was dragged to death in a racially inspired crime. Nooses and race were again married in 2006 and 2007. In 2011, a Cincinnati landlord posted a “whites only” sign at an apartment complex pool. Broader problems still exist. Racial discrepancies in education no doubt lead to discrepancies in income and crime.

“Yes, Jeremy, but these areas are improving. These things take time. We have learned our lesson.” Have we really?

While these areas are certainly improving, consider again our country’s history. Native Americans were marginalized, discriminated against, and killed for being different. We “learned our lesson.” Asian-Americans were marginalized, discriminated against, and killed for being different. We “learned our lesson.” Jews were marginalized, discriminated against, and killed for being different. We “learned our lesson.” European-Americans were marginalized, discriminated against, and killed for being different. We “learned our lesson.” African-Americans were marginalized, discriminated against, and killed for being different. We “learned our lesson.” Latin-Americans were marginalized, discriminated against, and killed for being different. We “learned our lesson.” Now homosexual- and Muslim-Americans are being marginalized, discriminated against, and killed for being different.

See the pattern? The US has routinely used religion and nationalism out of context in order to justify its actions against those who are different. In each case, we proclaim our understanding of why it was wrong and vow to move past such behavior. As recently as this year, such proclamations have proven empty.

I write this not to demean the US or belittle the ideas of people like Dr. King. It is my hope that instead of quote-grabbing one day a year to advocate ideas that are otherwise largely ignored, people will expand the ideas into a plan to truly change our society. In fact, that is exactly why Dr. King should be quoted. He didn’t just have a dream, he had a plan. That is why he died and that is why you should quote him. Until then, his dream will remain as it exists currently - unrealized.


Weather Is A Tough Gig

· ·

Posted without comment.

The next morning, Bolaris woke up alone with no memory of the previous night and the painting nowhere to be found. He returned to Philadelphia, only to discover he had purchased “bottles of champagne every 15 minutes or so,” including a $2,500 bottle of Cristal Vintage and a $3,120 bottle of Dom Perignon. Additionally, he had used his American Express card to buy a $2,000 tin of caviar. The painting? $2,500 plus a $500 tip. The grand total came to $43,712.25, with AmEx refusing to reverse the charges.


Students Fold Toilet Paper 13 Times

· ·

It may look like a prank but these mathematics students from St. Mark’s School in Southborough, Massachusetts aren’t toilet papering the famed infinite corridor at MIT. Using intricate choreography and brute force, they’re breaking a paper-folding record by completing 13 folds, a challenge that students at the school have been tackling for seven years with the help of teacher James Tanton.

[…]

The final result was a 1.5-metre wide and 76-centimetre high wad comprising 8192 layers of paper.

I can’t think of a better way to improve society and expand our knowledge than by spending seven years figuring out how to fold 1.2 kilometers of toilet paper thirteen times.


Believing In Tim Tebow

· ·

Rick Reilly, ESPN.com:

I’ve come to believe in Tim Tebow, but not for what he does on a football field, which is still three parts Dr. Jekyll and two parts Mr. Hyde.

No, I’ve come to believe in Tim Tebow for what he does off a football field, which is represent the best parts of us, the parts I want to be and so rarely am.

Who among us is this selfless?

Every week, Tebow picks out someone who is suffering, or who is dying, or who is injured. He flies these people and their families to the Broncos game, rents them a car, puts them up in a nice hotel, buys them dinner (usually at a Dave & Buster’s), gets them and their families pregame passes, visits with them just before kickoff (!), gets them 30-yard-line tickets down low, visits with them after the game (sometimes for an hour), has them walk him to his car, and sends them off with a basket of gifts.

Home or road, win or lose, hero or goat.

Speaking of Tim Tebow, this is a great story from Rick Reilly. Put aside your pre-conceived ideas or feelings, forget who the story is about, and instead focus on the acts. After doing that, I challenge you to hate Tim Tebow and not admit he’s a pretty good guy.


Tebow 160

· ·

The last time ESPN tried the stunt of dedicating an entire hour of SportsCenter to Tim Tebow, they managed a paltry 88 mentions of his name. The WWL went above and beyond in its 11 a.m. Eastern show, nearly doubling the instances in which they aired the Denver quarterback’s name. For 48 minutes of programming, that works out to one “Tebow” every 18 seconds.

I’m actually becoming a Tebow fan, both on and off the field. However, it isn’t hard to see why a lot of fans don’t share my feelings. Hopefully this story will remind you where to direct your hate.


Geek vs Nerd

· ·

Awesome infographic that examines the qualities of geek versus nerd.

In the ongoing battle between geek and nerd, one must take sides, but how can this be done without a solid argument for both personas? We here at Masters In IT (a mix of nerds and geeks) decided that it’s time to lay all the cards on the table to determine which is better and answer the question some fear to know: Are you a geek, or a nerd?

I’d like to think I am a geek with nerd tendencies.


How To Nap

· ·

FOR YEARS, NAPS have gotten a bad rap, derided as a sign of laziness, weakness, or senility. We are “caught” napping or “found asleep at the switch.”

But lately napping has garnered new respect, thanks to solid scientific evidence that midday dozing benefits both mental acuity and overall health. A slew of new studies have shown that naps boost alertness, creativity, mood, and productivity in the later hours of the day.

That’s all the permission I need.

This is a pretty cool overview of why humans nap and, more importantly, how to do it correctly. It is also clear that I am an owl and not a lark.


Oklahoma City To Launch Spokies

· ·

Bicycle lanes included in downtown Oklahoma City’s Project 180 could get a workout in short order by users of a shared-bike program called Spokies.

The program should launch this spring, Jennifer Gooden, the city’s sustainability director, told the Oklahoma City Council on Tuesday. Federal grant money paid for 95 bicycles to be placed at six stations downtown.

This is great news for downtown. I’m loving the continual positive transformation of Oklahoma City.


Mike Stoops To Rejoin The Sooners

· ·

Joe Schad, ESPN:

Former Arizona coach Mike Stoops will join the Oklahoma staff – and his brother – as co-defensive coordinator, a move that will be announced as soon as Wednesday, sources said.

Willie Martinez, an assistant who coached the defensive backs, is leaving the staff.

Count me as excited. Not convinced? Now? That’s it.