• 4
name Punditsdkoslkdosdkoskdo

Should i always have an autoincrement integer primary key?

In my databases, I tend to get into the habit of having an auto-incrementing integer primary key with the name id for every table I make so that I have a unique lookup for any particular row.

Is this considered a bad idea? Are there any drawbacks to doing it this way? Sometimes I'll have multiple indices like id, profile_id, subscriptions where id is the unique identifier, profile_id links to the foreign id of a Profile table, etc.

Or are there scenarios where you don't want to add such a field?

Autoincemental keys have mostly advantages.

But some possible drawbacks could be:

  • If you have a business key, you have to add a unique index on that column(s) too in order to enforce business rules.
  • When transfering data between two databases, especially when the data is in more than one table (i.e. master/detail), it's not straight-forward since sequences are not synced between databases, and you'll have to create an equivalence table first using the business key as a match to know which ID from the origin database corresponds with which ID in the target database. That shouldn't be a problem when transfering data from/to isolated tables, though.
  • Many enterprises have ad-hoc, graphical, point-and-click, drag-and-drop reporting tools. Since autoincremental IDs are meaningless, this type of users will find it hard to make sense of the data outside of "the app".
  • If you accidentally modify the business key, chances are you will never recover that row because you no longer have something for humans to identify it. That caused a fault in the BitCoin platform once.
  • Some designers add an ID to a join table between two tables, when the PK should simply be composed of the two foreign IDs. Obviously if the join table is between three or more tables, then an autoincremental ID makes sense, but then you have to add an unique key when it applies on the combination of FKs to enforce business rules.

Here's a Wikipedia article section on the disadvantages of surrogate keys.

  • 0
Reply Report

I disagree with all the answers before. There are many reasons why it is a bad idea to add an auto increment field in all tables.

If you have a table where there are no obvious keys, an auto-increment field seems like a good idea. After all, you don't want to select * from blog where body = '[10000 character string]'. You'd rather select * from blog where id = 42. I'd argue that in most of these cases, what you really want is a unique identifier; not a sequential unique identifier. You probably want to use a universally unique identifier instead.

There are functions in most databases to generate random unique identifiers (uuid in mysql, postgres. newid in mssql). These allow you to generate data into multiple databases, on different machines, at any time, with no network connection between them, and still merge data with zero conflicts. This allows you to more easily setup multiple servers and even data centers, like for example, with microservices.

This also avoids attackers guessing url's to pages they shouldn't have access to. If there's a there's probably a as well. This could allow automation of a security exploit in the user profile page.

There are also a lot of cases where a uuid column is useless or even harmful. Let's say you have a social network. There is a users table and a friends table. The friends table contains two userid columns and an auto-increment field. You want 3 to be friends with 5, so you insert 3,5 into the database. The database adds an auto-increment id and stores 1,3,5. Somehow, user 3 clicks the "add friend"-button again. You insert 3,5 into the database again, the database adds an auto-increment id and inserts 2,3,5. But now 3and 5 are friends with each other twice! That's a waste of space, and if you think about it, so is the auto-increment column. All you need to see if a and b are friends is to select for the row with those two values. They are, together, a unique row identifier. (You would probably want to do write some logic to make sure 3,5 and 5,3 are deduplicated.)

There are still cases where sequential id's can be useful, like when building an url-shortener, but mostly (and even with the url shortener) a randomly generated unique id is what you really want to use instead.

TL;DR: Use UUID's instead of auto-increment, if you don't already have a unique way of identifying each row.

  • 0
Reply Report

It's never a bad idea to have a guaranteed unique row identifier. I guess I shouldn't say never – but let's go with the overwhelming majority of the time it's a good idea.

Theoretical potential downsides include an extra index to maintain and extra storage space used. That's never been enough of a reason to me to not use one.

  • 3
Reply Report