The Joomla database schema smells
Back in 2006 I was a member of the standards and guidelines workgroup in Joomla, and during my involvement there I produced the Joomla 1.5 database schema as a DB Designer EER diagram, so people could view the database in a visual way. I recently also set up the Joomla 1.6 database schema, this time using MySQL Workbench which is a better tool for the job, and also GPL and available on multiple platforms. During my work with setting up the 1.6 schema, I came across lots of oddities and lacks of normalization, and a severe lack of naming conventions and guidelines became obvious.
Improvement suggestions
I have listed a few of these below, and please add comments to this post if you find some of your own or you disagree with me, and I’ll adjust this list accordingly.
Naming conventions
I primarily want to address the lack of standards in this schema, which I even reported back as early as in 2006 when creating the 1.5 schema. In coding, the concept of convention over configuration gives predictability, simplicity and reduces the amount code you have to write later on. Same goes for database schemas and Joomla: The better the schemas follow the naming conventions, the less PHP code you’ll have to write later on. Convention also eliminates ambiguity, and developers won’t be tempted into guessing. Just having a convention for column names in the database would be a nice first step, both for Joomla and 3rd party developers alike. Right now, column names for instance are a sweet mix of abbreviations and non-abbreviations, both underscored and camelCased names. Plus, the column id is used to name most INT primary key names but not all.
A prime example of Joomla’s lack of conventions can be seen in the content table, where we have these five foreign key columns: catid, sectionid, language, checked_out and asset_id. The first is abbreviated with no underscore, the second is also without underscore but not abbreviated, the third does not look like a foreign key because it has no “id” in it’s name like the others, but it holds the language code for the article, and thus references language.lang_code (Who would have known?). The fourth contains has no “id”-hint, and no hint to the fact that it refers to users.id. Finally, the fifth column is not abbreviated but with underscores (the way I like it personally). Five different naming conventions in one single table, if we should at all should acknowledge “checked_out” as some sort of readable standard foreign key for user.id.
Also check out and compare content.created_by and categories.created_user_id. They both reference the same column. Weird huh? See the lack of convention? Which format do you prefer?
Table names
For consistency, I would propose changing using plural names on tables. Just a little something I got used to while using Nooku. Most tables are already pluralised, and tables normally contain multiple rows, so it makes most sense to use pluralisation as the standard and thus renaming the following tables:
- content renamed to contents
- content_frontpage renamed to contents_frontpage
- content_rating renamed to contents_ratings
- menu renamed to menus
- modules_menu renamed to modules_menus
- session renamed to sessions
Primary keys
Primary keys are important, as they are used for row identification, and will therefore play a big part in the PHP code we write for our applications and extensions.
Names
Most tables have a primary key column named “id”, but not all tables follow this rule. The languages table has lang_id, update_sites has update_site_id, update_categories has categoryid, etc. In other words: Lots of inconsistensies and no clear naming convention.
Nooku is a good example of a framework that imposes a convention which allows the PHP-code to interpret the database structure automatically, and only requires you to write code for those few special cases that don’t fit the blueprint at all. Nooku uses the following convention for primary key names: componentname_tablename_id. For the articles table in com_content, the Nooku table name would have been content_articles, and the primary key would have been content_article_id. Notice the plural formed table name and the singular column name. Logical, since the table represents a collection of items, and the column identifies a single one.
Next, I would wish for the foreign keys to also use the full name of the column they are referencing. So instead of those abbreviated catid columns all over the place, you use for instance category_id.
Field types
A lot of the tables have an auto increment INT type of primary key, which is to be expected, but there is no clear standard as to what datatype this field should be. Some are unsigned, some are not, some are INT, some are INT(10) and some INT(11). There is no reason why these should not just follow a common standard.
The INT key sickness
Very often you will find yourself using a primary key column for your table named something_id, and it will have the field type in the shape of a large integer. This is typical for relational databases, so don’t be alarmed, but don’t stop thinking either. The primary key in a table is something that can uniquely identify each and every record in that table. If you already have such a column, then there is no need to add another integer.
Consider these examples from the Joomla database:
The language table has a lang_id INT(11) column. This really seems redundant, as the lang_code field, CHAR(7), uniquely identifies the language just fine. In the menu table, and all other tables using multilingual features, you have the column language which is also a CHAR(7), and references lang_code. It’s weird, and bad code, to use a foreign key to reference a column that’s not a primary key, and not even unique. It’s also hard to see the connection from menu.language to languages.lang_code too. As before: Consistent naming would really help; why not call it menu.lang_code instead?
Similarly, in the redirect links table, the old_url-field is what really should identify redirect links. You can’t have multiple redirects for one old url, hence every row in this table is uniquely identified by it’s old_url column, so that should be the table redirect_links’ primary key. Consequently, the id INT field seems redundant here too.
An oddity: ACL
Viewlevels are in the database connected to usergroups via JSON-encoded data in the column viewlevel.rules. The column name is misleading, but it is even more strange to see foreign keys stacked up in a pure JSON-encoded field like this, instead of for instance creating a table for usergroups_viewlevels, to connect the usergroups and viewlevels tables, similar to for instance modules_menu, which connect modules to menu items.
There is probably an explenation here, and it probably got something to do with performance, pragmatism or laziness. I just think it’s bad database design, that’s all. Maybe someone out there, who has been deep into the ACL system, can explain this to me?
Also strange: Extension update sites
This part of the database seems incomplete and unfinished, even though this schema is supposed to be a beta. For instance the update_sites_extensions table allows null values in it’s two fields extension_id and update_site_id, and a unique key is nowhere to be found in the table. This might just not be a feature due for Joomla 1.6, but then I would not see the reason for introducing these tables in the first place.
Other things
There are so many other things, but I can’t be bothered to write them all down. I actually started writing a section in this article about recommendations for normalization, changes in column names, types and usages, but the list already had six elements before I was finished going through two of the ten or twelve large tables in this schema.
This database schema suffers because of legacy issues, and a parts of it probably dates back even as far as the Mambo days. You can really see the same things happening in the user interface. Or should I say not happening? Instead of being a constantly renewed, improved and refactored CMS and framework, Joomla bears both the visual and codewise smell of patchwork and old legacy code.
Because of the obvious lack of progress in the Joomla project, I fear that this will be the last database schema for Joomla I will make; The web world is moving forward, but Joomla seems to be stuck. We strive to provide our clients and ourselves with the best tools of the trade, and that’s what I’ll keep doing. I hope I am proven wrong.

Pingback: Torkil Johnsen » Joomla 1.6 database schema
Pingback: Database schema di Joomla 1.6
Pingback: Torkil Johnsen » What is this Nooku Framework thing really? And why should I care?
Pingback: Nooku Framework, why should you care ? « Nooku Blog
Pingback: O modelo de dados do Joomla | SWX Softwares - Desenvolvimento e Design Web