Database Naming Conventions

2013-02-10

I had no idea when I started making slides for this portion of my talk (Bringing Good Design to the Table) at SunshinePHP that this would be such a contentious issue. I should have known in hindsight, but I didn't expect this to be such a holy war. In the future I'll know better and preface this section quite strongly with "my opinion" caveats.

So that being said, I decided to outline those conventions again here. I'm going to try to give more of my reasoning behind the different choices below, to hopefully stem a few of the objections. Then again, nobody likes being told that the way they're doing things is "wrong" so let me again say that this is my opinion. I didn't create this convention and I'm not the only one that uses it, this just happens to be the one I'm most comfortable with. Maybe this will become part of a series exploring the contents of my talk, who knows.

The value of convention

The one thing that basically everyone agreed with during my talk is that you should have a naming convention and stick to it. It reduces the cognitive load on your users when dealing with your system if it behaves in a predictable way, just the same way a coding standard would. A rigid convention also makes it much simpler to use/write a query builder if you don't want to construct your SQL by hand.

Table Names

  • employee is Bad
  • employees is OK
  • personnel is Better

Table names should be plural nouns. Ideally if there’s a separate noun representing a group of something you want to use that. The goal here is to encourage thinking in sets since that’s how SQL operates. The more you can get used to set-based logic, and get away from the “for each” type of processing mode you get into with most programming languages, the less you’ll be tempted to write a CURSOR statement. A poorly written CURSOR is death to performance. That being said, most people will stick with “employees” and that’s fine too.

Since this was the most contentious part of the conventions section, I'll go into a bit more detail. Tables represent a collection of records, so even if you're only going to return a single record every time you use it (like a table containing configuration metadata about other specific objects), you still want the table itself to be plural. Again this goes back to encouraging set-based thinking. Also, you never know when that usage pattern may change. Another group of people that seem to disagree with this idea is those using the Active Record pattern, because each Active Record model encapsulates both the access to the table and the current record being used, so a singular table name might look more reasonable there. No convention is perfect, just pick one that fits your use cases.

Linking Tables

Linking tables are used to represent a many-to-many relationship. They should be named with the first table name, an underscore, and then the second table name. The first table should be singular and the second should be plural. Simple enough.

  • customer_addresses
  • teacher_courses
  • employee_salaries

Column Names

Yes No
id cust_no
address_id addr
postal_code zipCode
locality customer_name
region Reserved words

Primary keys should be called "id". This assumes you're using auto-incrementing integers for your primary keys. If you're not, keep the column's original name. Foreign keys should be $singularForeignTableName$_id.

Column names shouldn't include their parent table name. A "customers" table should not have a “customer_name” field, just “name” is fine. Don’t Repeat Yourself.

You should also not rely on mixed case field names to make your fields readable, because not all database collations support mixed case names (and most default ones don't). The "_ci" at the end of a database collation stands for "case insensitive". Just separate words in your field names with underscores.

Another thing to avoid is putting the data type in the column name. We don’t need to use Hungarian notation with our code anymore, we should use it in our databases either. If those types change over time, (and they occasionally can) it’s going to be much less work to change the type of a column than it is the name. If you’re using MVC architecture, a column name change can result in changes through all three layers of the code base, plus the database schema itself.

In addition, avoid reserved words in your table names. You never know what types of systems you may have to interface with, so avoid the whole list of reserved words regardless of what database system they come from. You don’t want to always have to wrap your field names in backticks or whatever because you really wanted to name your description column “desc” and now SQL thinks you want an ORDER BY.

Finally, lean toward internationalized versions of field names when available. Don’t use zip code when you can use postal code, swap out city for locality, state for region, that sort of thing.

Closing

The most important thing is to choose a convention that your team can get behind, and then use it faithfully. If you're really torn about what convention to adopt, look at the object/property access syntax in the language that your application is written in. If your data is being used to create Ruby or Python objects, then underscores will look more normal than camelCase. If it's being used for JavaScript, then maybe camelCase is better depending on what convention your front end developers use.

Tags: standard, database

Comments