Dec 18, 2021

Choosing an Implementation for Unique Identifiers

Picking the right underlying system for generating unique identifiers (read: the ID generator implementation) throughout your application is an important task early on. Even if it doesn’t seem that important, having to reverse a bad decision made early can require a lot of time and effort you’d gladly spend on a thousand other things.

Getting buy-in from stakeholders for a fix like switching to another ID format will be hard as long as the current format isn’t completely breaking your daily operations, and once it is, there are other fires to put out anyway.

There are a couple of rules that can help you make a future-proof decision in the early stages, so let’s dive right in!

Pick what worked for others

It’s easier to choose a solution that worked well for other companies and keeps serving them well. As with other technologies, picking a proven solution can save you a lot of time and pain, even if it looks boring.

Pick the properties you need

Different implementations of unique identifiers have different properties that suit a variety of use cases. Make sure to pick what works best for you. Here’s what this could include:

k-sortable: Identifiers that sort roughly by generation timestamp can help you implement sorting capabilities and tell you the approximate order in which two entities were created. The ordering is only approximate: identifiers generated very close together may sort in either order.
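To make the idea concrete, here is a minimal sketch of a time-prefixed identifier. The function name new_sortable_id and the layout (48 bits of millisecond timestamp followed by 80 random bits) are assumptions made for illustration, not the format of any particular library.

```python
import os
import time


def new_sortable_id(now_ms=None):
    """Hypothetical generator: 12 hex chars of timestamp + 20 hex chars of randomness."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000  # current time in milliseconds
    # Keep the lower 48 bits and zero-pad so the prefix is always 12 characters.
    ts_part = format(now_ms & 0xFFFFFFFFFFFF, "012x")
    # 10 random bytes -> 20 hex characters.
    rand_part = os.urandom(10).hex()
    return ts_part + rand_part
```

Because the timestamp prefix is zero-padded, plain string comparison orders IDs from different milliseconds by creation time. Two IDs created within the same millisecond differ only in the random part, which is why the order is approximate.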

uniqueness/collision-resistant: Depending on your use case, you might need strong guarantees that your identifiers are unique, while in other cases it doesn’t matter as much. The more important uniqueness is, the more resources it might take to generate an identifier, which can reduce performance in high-load situations.
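For purely random identifiers, you can get a rough feeling for how strong the guarantee is with the birthday approximation. This is a sketch under the assumption that every ID is drawn uniformly at random; collision_probability is a hypothetical helper, and real generators that mix in timestamps or machine IDs behave differently.

```python
import math


def collision_probability(num_ids, random_bits):
    """Approximate chance of at least one collision among num_ids random IDs."""
    space = 2.0 ** random_bits  # number of possible identifiers
    # Birthday approximation: p is roughly 1 - exp(-n * (n - 1) / (2 * N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-num_ids * (num_ids - 1) / (2.0 * space))
```

With 64 random bits, roughly four billion IDs (2**32) already give a collision chance of about 39%, while a million IDs with 122 random bits stay far below any practical concern.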

fixed-size requirements: Some identifiers may grow over time, while others are guaranteed to respect a size constraint regardless of outside factors like the timestamp or other information going into the identifier.
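The difference is easy to see with a counter encoded in base 36. The helpers below (encode_base36 and encode_fixed) are hypothetical and only meant as a sketch: the first one produces strings that grow with the value, the second one pads to a fixed width and refuses values that no longer fit.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols: 0-9, a-z


def encode_base36(value):
    """Variable-length encoding: the result grows as the value grows."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return "0"
    digits = []
    while value:
        value, remainder = divmod(value, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def encode_fixed(value, width):
    """Fixed-length encoding: left-pad with zeros, fail loudly on overflow."""
    raw = encode_base36(value)
    if len(raw) > width:
        raise OverflowError(f"{value} does not fit into {width} characters")
    return raw.rjust(width, "0")


print(encode_base36(35), encode_base36(36))  # z 10
print(encode_fixed(35, 4))                   # 000z
```

Raising an error on overflow is a design choice for this sketch: it turns a silent change of length into a failure you notice right away.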

Know the fine print, never assume anything

When looking at some example IDs a specific library produces, you should ask yourself what parts they’re made of, and if they’ll grow over time. Are they fixed-length strings, padded so they never come out shorter than you expect, or do you get a certain length in most cases with a small chance of getting something different?

The more properties you assume your ID to have, the higher the impact if that’s not the case.

Here are three simple examples to make the previous point more tangible:

  • If you expected your IDs to be 32 characters long and they randomly grew or shrank by a couple of characters
  • If you expected certain characters to be included and at some point they changed
  • If you expected that unlimited unique identifiers could be generated but you used an incrementing number and it simply wrapped around after overflowing
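The last case can be reproduced in a few lines. WrappingCounter is a hypothetical ID source that stores its state in a fixed number of bits, similar to an unsigned integer column or a fixed-width integer type; the tiny 8-bit width is chosen only to keep the example short.

```python
class WrappingCounter:
    """Hypothetical incrementing ID source with a fixed bit width."""

    def __init__(self, bits):
        self._mask = (1 << bits) - 1  # largest value that fits
        self._value = 0

    def next_id(self):
        current = self._value
        # Masking emulates fixed-width overflow: after the maximum comes 0 again.
        self._value = (self._value + 1) & self._mask
        return current


counter = WrappingCounter(bits=8)
ids = [counter.next_id() for _ in range(257)]
print(len(ids), len(set(ids)))  # 257 256
```

The 257th call hands out 0 again, so two different entities end up sharing an identifier without any error being raised.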

Does any of that sound familiar? If you’ve hit one of these cases and your application assumed something that didn’t match reality, you might have run into critical failures in places you would never have imagined could break.

Depending on where you used and exposed your identifiers, it can be hard if not impossible to change certain aspects in the future. If you told your database to expect strings of a certain length for IDs, you have to update that constraint when your IDs grow and hope your database didn’t just truncate longer IDs.
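One way to find out about a mismatch before your storage layer does is to check every identifier against the format you assume before writing it anywhere. The sketch below assumes 32 lowercase hexadecimal characters; both that pattern and the helper name require_valid_id are made up for illustration, so adjust them to whatever your generator actually documents.

```python
import re

# Assumed format for this example: exactly 32 lowercase hex characters.
ID_PATTERN = re.compile(r"[0-9a-f]{32}")


def require_valid_id(candidate):
    """Return the ID unchanged, or raise if it breaks the assumed format."""
    if not ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"unexpected ID format: {candidate!r}")
    return candidate
```

Calling this right before an insert means an ID that grew by a character fails with a clear error instead of being stored in a shortened form.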

That’s mostly it! Be careful which system you go for and how you integrate it to avoid surprises in the future. Only limit ID length (e.g. in database columns) to an arbitrary value if you’re completely sure generated identifiers will never exceed it.