The explanation
So this is probably the most SQL I’ve written in, well, probably ever. It is certainly the first stored procedure I’ve written and stored function as well. It may not be what your DBA would write, but I’m not your DBA.
Anyway, the goal was to be able to create codes for URLs that would be as short as possible, while maintaining sequence and being completely unique. When disseminating the URL in places where there are character limitations (think twitter) every character counts. So we want our codes to be as short as possible for as long as possible.
I usually use a numeric database id column to pass around in URLs in order to pull up a certain record on a given page. So say I was using grails, I would have a URL like http://host.com/activity/1 which would map to the record with an id of 1 in the activity table. The problem with this approach is that by the time you hit 10 records you have added another character. If you hit a very low number of 100,000 records you have added 5 characters. In essence you only have 10 permutations per column. If we want to keep short urls for a long time we need a different counting system. Hex is nice, but still too short so I went with a base-62 system. The sequence is 0-9A-Za-z. This way each column can represent 62 unique records, a two character code can represent 3,844 records, a three character code could represent 238,328 records, etc.
So, that’s the idea. Fairly simple, but pretty powerful in context. Here’s the implementation as a stored procedure and a stored function. BTW, this was written against MySQL. I have no idea whether or not it would be valid in another DB.
The code
-
DELIMITER $$
-
-
DROP PROCEDURE IF EXISTS `next_activity_code`$$
-
-
CREATE PROCEDURE `next_activity_code`()
-
BEGIN
-
Declare ascii_code int(5);
-
declare new_code varchar(10);
-
declare current_code varchar(10);
-
Declare current_code_length int(10);
-
-
SELECT id AS `code` FROM `code` INTO current_code;
-
SET current_code_length = character_length(current_code);
-
SET ascii_code = ord(substring(current_code, -1));
-
case ascii_code
-
when 122 then
-
begin
-
declare maxed bool DEFAULT 1;
-
declare current_ascii_code int;
-
Declare counter int DEFAULT current_code_length - 1;
-
-
find_unmaxed_column: while counter > 0 do
-
SET current_ascii_code = ord(substring(current_code, counter, 1));
-
IF current_ascii_code != 122 then
-
SET maxed = 0;
-
SELECT next_char(current_ascii_code) INTO new_code;
-
SET new_code = concat(substring(current_code, counter - 1, 1), new_code, repeat(char(48), current_code_length - counter));
-
leave find_unmaxed_column;
-
end IF;
-
SET counter = counter - 1;
-
end while;
-
-
IF maxed then
-
SET new_code = repeat(char(48), current_code_length + 1);
-
end IF;
-
end;
-
else
-
begin
-
SELECT next_char(ascii_code) INTO new_code;
-
SET new_code = concat(substring(current_code, 1, current_code_length - 1), new_code);
-
end;
-
end case;
-
-
UPDATE `code` SET id = new_code WHERE id = current_code;
-
SELECT new_code;
-
END$$
-
-
DELIMITER ;
-
-
-
DELIMITER $$
-
-
DROP FUNCTION IF EXISTS `twitrunner`.`next_char`$$
-
-
CREATE FUNCTION `next_char`(current_code int) RETURNS varchar(1) DETERMINISTIC
-
BEGIN
-
case current_code
-
when 57 then
-
begin
-
RETURN char(65);
-
end;
-
when 90 then
-
begin
-
RETURN char(97);
-
end;
-
when 122 then
-
begin
-
RETURN NULL;
-
end;
-
else
-
begin
-
RETURN char(current_code + 1);
-
end;
-
End Case;
-
END$$
-
-
DELIMITER ;
-
The breakdown
The “current code” is kept in a table with one column and one row. We look up the code from the table in order to generate the next code in the sequence. However, since we have a character sequence and not a numerical sequence, we can’t just do an increment and be done. In order to be able to run some math on the code we break down parts of the code into it’s ascii integer code equivalent when needed. The ranges that we chose for our character sequence as [character:ascii] are [0-9:48-57], [A-Z:65-90], and [a-z:97-122].
With that knowledge the rest of the logic should be pretty straight forward. The basic flow is to check the last character to see if it can be incremented. If it can we increment it. If it cannot be incremented (i.e. it’s the 62nd character in our sequence), then we loop backwards through the code until we find a character that can be incremented. If we find a character that can be incremented, it is incremented and then every character following it in the code is reset to the first character in the sequence. If we don’t find a character that can be incremented it means that every column has been maxed out and we need to start a new one. In this case all columns are reset to the first character in the sequence and a new character is added to the end of the code. Like I said earlier, pretty simple, but also powerful.
Hopefully this was useful. If not, I hope it was at least entertaining ;).