Torkil Johnsen

My personal piece of cyberspace

The Joomla database schema smells

Posted in Databases,joomla on Jul 12th, 2010

Back in 2006 I was a member of the standards and guidelines workgroup in Joomla, and during my involvement there I produced the Joomla 1.5 database schema as a DB Designer EER diagram, so people could view the database in a visual way. I recently also set up the Joomla 1.6 database schema, this time using MySQL Workbench which is a better tool for the job, and also GPL and available on multiple platforms. During my work with setting up the 1.6 schema, I came across lots of oddities and lacks of normalization, and a severe lack of naming conventions and guidelines became obvious.

Improvement suggestions

I have listed a few of these below, and please add comments to this post if you find some of your own or you disagree with me, and I’ll adjust this list accordingly.

Naming conventions

I primarily want to address the lack of standards in this schema, which I even reported back as early as in 2006 when creating the 1.5 schema. In coding, the concept of convention over configuration gives predictability, simplicity and reduces the amount code you have to write later on. Same goes for database schemas and Joomla: The better the schemas follow the naming conventions, the less PHP code you’ll have to write later on. Convention also eliminates ambiguity, and developers won’t be tempted into guessing. Just having a convention for column names in the database would be a nice first step, both for Joomla and 3rd party developers alike. Right now, column names for instance are a sweet mix of abbreviations and non-abbreviations, both underscored and camelCased names. Plus, the column id is used to name most INT primary key names but not all.

A prime example of Joomla’s lack of conventions can be seen in the content table, where we have these five foreign key columns: catid, sectionid, language, checked_out and asset_id. The first is abbreviated with no underscore, the second is also without underscore but not abbreviated, the third does not look like a foreign key because it has no “id” in it’s name like the others, but it holds the language code for the article, and thus references language.lang_code (Who would have known?). The fourth contains has no “id”-hint, and no hint to the fact that it refers to users.id. Finally, the fifth column is not abbreviated but with underscores (the way I like it personally). Five different naming conventions in one single table, if we should at all should acknowledge “checked_out” as some sort of readable standard foreign key for user.id.

Also check out and compare content.created_by and categories.created_user_id. They both reference the same column. Weird huh? See the lack of convention? Which format do you prefer?

Table names

For consistency, I would propose changing using plural names on tables. Just a little something I got used to while using Nooku. Most tables are already pluralised, and tables normally contain multiple rows, so it makes most sense to use pluralisation as the standard and thus renaming the following tables:

  1. content renamed to contents
  2. content_frontpage renamed to contents_frontpage
  3. content_rating renamed to contents_ratings
  4. menu renamed to menus
  5. modules_menu renamed to modules_menus
  6. session renamed to sessions

Primary keys

Primary keys are important, as they are used for row identification, and will therefore play a big part in the PHP code we write for our applications and extensions.

Names

Most tables have a primary key column named “id”, but not all tables follow this rule. The languages table has lang_id, update_sites has update_site_id, update_categories has categoryid, etc. In other words: Lots of inconsistensies and no clear naming convention.

Nooku is a good example of a framework that imposes a convention which allows the PHP-code to interpret the database structure automatically, and only requires you to write code for those few special cases that don’t fit the blueprint at all. Nooku uses the following convention for primary key names: componentname_tablename_id. For the articles table in com_content, the Nooku table name would have been content_articles, and the primary key would have been content_article_id. Notice the plural formed table name and the singular column name. Logical, since the table represents a collection of items, and the column identifies a single one.

Next, I would wish for the foreign keys to also use the full name of the column they are referencing. So instead of those abbreviated catid columns all over the place, you use for instance category_id.

Field types

A lot of the tables have an auto increment INT type of primary key, which is to be expected, but there is no clear standard as to what datatype this field should be. Some are unsigned, some are not, some are INT, some are INT(10) and some INT(11). There is no reason why these should not just follow a common standard.

The INT key sickness

Very often you will find yourself using a primary key column for your table named something_id, and it will have the field type in the shape of a large integer. This is typical for relational databases, so don’t be alarmed, but don’t stop thinking either. The primary key in a table is something that can uniquely identify each and every record in that table. If you already have such a column, then there is no need to add another integer.

Consider these examples from the Joomla database:

The language table has a lang_id INT(11) column. This really seems redundant, as the lang_code field, CHAR(7), uniquely identifies the language just fine. In the menu table, and all other tables using multilingual features, you have the column language which is also a CHAR(7), and references lang_code. It’s weird, and bad code, to use a foreign key to reference a column that’s not a primary key, and not even unique. It’s also hard to see the connection from menu.language to languages.lang_code too. As before: Consistent naming would really help; why not call it menu.lang_code instead?

Similarly, in the redirect links table, the old_url-field is what really should identify redirect links. You can’t have multiple redirects for one old url, hence every row in this table is uniquely identified by it’s old_url column, so that should be the table redirect_links’ primary key. Consequently, the id INT field seems redundant here too.

An oddity: ACL

Viewlevels are in the database connected to usergroups via JSON-encoded data in the column viewlevel.rules. The column name is misleading, but it is even more strange to see foreign keys stacked up in a pure JSON-encoded field like this, instead of for instance creating a table for usergroups_viewlevels, to connect the usergroups and viewlevels tables, similar to for instance modules_menu, which connect modules to menu items.

There is probably an explenation here, and it probably got something to do with performance, pragmatism or laziness. I just think it’s bad database design, that’s all. Maybe someone out there, who has been deep into the ACL system, can explain this to me?

Also strange: Extension update sites

This part of the database seems incomplete and unfinished, even though this schema is supposed to be a beta. For instance the update_sites_extensions table allows null values in it’s two fields extension_id and update_site_id, and a unique key is nowhere to be found in the table. This might just not be a feature due for Joomla 1.6, but then I would not see the reason for introducing these tables in the first place.

Other things

There are so many other things, but I can’t be bothered to write them all down. I actually started writing a section in this article about recommendations for normalization, changes in column names, types and usages, but the list already had six elements before I was finished going through two of the ten or twelve large tables in this schema.

This database schema suffers because of legacy issues, and a parts of it probably dates back even as far as the Mambo days. You can really see the same things happening in the user interface. Or should I say not happening? Instead of being a constantly renewed, improved and refactored CMS and framework, Joomla bears both the visual and codewise smell of patchwork and old legacy code.

Because of the obvious lack of progress in the Joomla project, I fear that this will be the last database schema for Joomla I will make; The web world is moving forward, but Joomla seems to be stuck. We strive to provide our clients and ourselves with the best tools of the trade, and that’s what I’ll keep doing. I hope I am proven wrong.

  • Pingback: Torkil Johnsen » Joomla 1.6 database schema

  • Pingback: Database schema di Joomla 1.6

  • http://magicthemes.com Israel D. Canasa

    Torkil, I don’t wanna bite the hand that feeds me but I really agree with you that Joomla needs to clean itself up. I think it has something to do with the leadership too. People may get mad at me for saying this, but we need fresh minds in the Joomla Core Dev. Joomla 1.7 should be scrapped and go straight to Joomla 2.0, a total rewrite that should disregard legacy issues.

  • http://magicthemes.com Israel D. Canasa

    Torkil, I don’t wanna bite the hand that feeds me but I really agree with you that Joomla needs to clean itself up. I think it has something to do with the leadership too. People may get mad at me for saying this, but we need fresh minds in the Joomla Core Dev. Joomla 1.7 should be scrapped and go straight to Joomla 2.0, a total rewrite that should disregard legacy issues.

  • http://www.torkiljohnsen.com tj

    I agree Israel. Nooku has built a RAD framework, Emmanuel Danan has built FLEXIcontent and FLEXIaccess, they add more features to 1.5 than 1.6, and they both have done it in less time than the Core Devs, and with less resources.

    Also note that I have only scratched the surface in this blog post. A lot of issues have been left untouched. As I said, I started a list of change recommendations, but I scrapped it as I am writing a blog post, not a novel.

  • http://www.torkiljohnsen.com tj

    I agree Israel. Nooku has built a RAD framework, Emmanuel Danan has built FLEXIcontent and FLEXIaccess, they add more features to 1.5 than 1.6, and they both have done it in less time than the Core Devs, and with less resources.

    Also note that I have only scratched the surface in this blog post. A lot of issues have been left untouched. As I said, I started a list of change recommendations, but I scrapped it as I am writing a blog post, not a novel.

  • Pingback: Torkil Johnsen » What is this Nooku Framework thing really? And why should I care?

  • Hannes Papenberg

    Hi Torkil,
    yes, you are right, we need to clean up the Joomla database schema and there are lots of table that I’d personally would like to get rid of. For example #__messages_cfg. Or the #__core_* tables (we already removed quite a few of those in 1.6) I would be happy if you would help out with cleaning all this stuff up when we are coming to 1.7. :-)

    On the ACL thing: The viewlevels are JSON-encoded, since we only need to load them once per session. This is a list of integers, that you need and instead of doing a join from the content table to the viewlevels and then on a nested set of usergroups again (with an additional mapping table), we are just loading them once per session and then do an “access IN ()” in our SQL instead.

    The ACL-tables might not have a proper naming convention, but it is the most performant and flexible solution that I have seen so far. :-)

    Hannes

  • Hannes Papenberg

    Hi Torkil,
    yes, you are right, we need to clean up the Joomla database schema and there are lots of table that I’d personally would like to get rid of. For example #__messages_cfg. Or the #__core_* tables (we already removed quite a few of those in 1.6) I would be happy if you would help out with cleaning all this stuff up when we are coming to 1.7. :-)

    On the ACL thing: The viewlevels are JSON-encoded, since we only need to load them once per session. This is a list of integers, that you need and instead of doing a join from the content table to the viewlevels and then on a nested set of usergroups again (with an additional mapping table), we are just loading them once per session and then do an “access IN ()” in our SQL instead.

    The ACL-tables might not have a proper naming convention, but it is the most performant and flexible solution that I have seen so far. :-)

    Hannes

  • http://www.torkiljohnsen.com tj

    Hey Hannes, good to hear from you. Are you still working with Joomla development?

    Doing a join over three tables is definitely less performant, but still the cleaner normalized SQL way. The current NoSQL-ish approach is not exactly the convention we have been using in Joomla so far. I will not stand in the way of progress though, just found it to be… odd… together with the rest of the schema :)

    Wouldn’t caching this process at some level be a better solution?

  • http://www.torkiljohnsen.com tj

    Hey Hannes, good to hear from you. Are you still working with Joomla development?

    Doing a join over three tables is definitely less performant, but still the cleaner normalized SQL way. The current NoSQL-ish approach is not exactly the convention we have been using in Joomla so far. I will not stand in the way of progress though, just found it to be… odd… together with the rest of the schema :)

    Wouldn’t caching this process at some level be a better solution?

  • Hannes Papenberg

    Hi Torkil,
    the caching is what we are in fact already doing at this moment. We are loading that viewlevel data just once per session, so normally across several pageloads and a few hundred queries that require this data.

  • http://torkiljohnsen.com Torkil Johnsen

    “We're doing” you say, so you are still in Joomla development?

  • Pingback: Nooku Framework, why should you care ? « Nooku Blog

  • MySQL Workbench

    Hi Hannes,
    Any possibility to replace MyISAM tables with InnoDB? So much benefit just from that simple change. We have made InnoDB default in MySQL (MyISAM used to be default).

    - MySQL Workbench

  • MySQL Workbench

    Hi Torkil,

    Your blog is now aggregated and showing up on PlanetMySQL.

    But, this post “The Joomla database schema smells” was previously published (before aggregation). Can you republish it (so it gets aggregated)? I think it will be of big interest in the MySQL community too.

    - MySQL Workbench

  • http://torkiljohnsen.com Torkil Johnsen

    Hm. Did I provide the wrong RSS URL?

    The correct URL is supposed to be http://www.torkiljohnsen.com/category/databases

    The article is published there, as far as I can tell.

  • http://torkiljohnsen.com Torkil Johnsen

    A nice link concerning reverse engineering MyISAM schemas:
    http://wb.mysql.com/?p=271

    It's a plugin that will auto-create relationships, even though these do not exist.

    Downside to this is that it will only work with databases that have a clear naming convention for their databases, so Joomla is a no-go, but Nooku developers could find this useful.

  • MySQL Workbench

    Hi Torkil,

    You did everything fine. But, only your blogs published *after* the date your blog was aggregated get picked up. The “smells” blog, was published *before*, so it was not pulled in. That is why, I was asking you to republish it (new “posted date” will get it aggregated).

    Thanks.

    - MySQL Workbench

  • http://torkiljohnsen.com Torkil Johnsen

    okay, I changed the posting date to July 12th instead of July 11th, hope that helps!

  • Rosbif56

    Can’t agree more – just one thing (I new to Joomla, but this design makes my hair stand on end…) – I got the impression from the doc that there was a definite hierarchical progression from category to section to content. This sin’t reflected in the DB. Content links directly to Category. Is this a design consideration?

    • http://torkiljohnsen.com Torkil Johnsen

      Joomla 1.0-1.5 uses section-category-content. Meaning that sections hold categories which in turn hold content articles.

      For Joomla 1.6, there are now just categories (multi-levelled too) and content articles, so that’s one of the new high-profile features. Still years behind other systems, even other Joomla extensions, and one article can still belong to only one category.

  • Pingback: O modelo de dados do Joomla | SWX Softwares - Desenvolvimento e Design Web


Creative Commons License
This work by Torkil Johnsen is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.