What better to use for primary key. UUID or integer.
Drawbacks of UUID
It takes more memory. There are 128 bits against 64 bits for int8. Index will be bigger too.
They should be generated. It takes processing time.
They are still not unique. Collisions might happen. I need
write more code to catch this error and generate new UUID.
If I use serial I can be sure that value is unique. With
UUID I should keep constraint on production database. It
take more resources and time for insert operations.
UUIDs are not indexed well. Key should be sequential for best performance. Some people try to write own function to generate UUID that contains serial part and random part.
I can't make partitions of a table by UUID. There is not diapason.
It is pain to debug code where objectId='B1E7D34-1DF-4DC-5BA3E3DF456A410'. I can't remember this and look with my eyes. Serial number might be bigger than I can remember too. But it is sequential and I need to remember just some digits from the end.
I can't sort on front by UUID. So I need to save and load more information (ex. date).
Positive about UUID
UUID is good as identifier if I would like to generate id before save entity in DB. I can make it on front. Don't forget to check that it does not exist and handle exception. There is not one more request to get id for new entity.
UUID is easy to find in logs. Grep by number can give not relevant lines.
UUID can be used as unique acrose all tables and entities. If I have files table, and files can be linked with multiple entities, then instead of multiple tables (object_id, file_id) or one table (object_id, object_type_id, file_id), I can just create one table (object_uuid, file_id) to reference the UUID for any type of entity and the file.
UUID is good for file name too. If files are on disk. I can name files with UUIDs and link these UUIDs with my entities. And there is no table with file name in DB.
UUID is good for external objects those ids I can't control after creation. Serial numbers will not be the same forever. If I need to do custom backup (or merging or replication), I can forget to keep id. New one will be generated from sequence. In old data serial numbers can have gaps because of transactions not completed. There are no gaps in new data. With merging keep old ids might be impossible.
Merging databases is easy with UUID. Just take care about possible collisions.
It is possible to share common data between different shards, when you can't have distributed sequence.
Serial ids are predictable. User can try to get access to
data that he does not own. User can analyze application
audience and traffic. User can easy grab data from
site.
With random UUID analysis is impossible. But it is still
worth to check data owner.
Get along without UUID
For offline data I can create temporary id. Then it will be replaced during synchronization.
If I have multiple servers (shards, replicas), I can give each server a ID and make the primary key a combination of the entity ID and the server ID. Primary key would have maximum size up to (n * 64bit), where n is number of servers.
Another strategy is create a range of available IDs for each server.
-- for the first server:
create sequence seq_entity_id start 1;
-- for the second server:
create sequence seq_entity_id start 1000001;
-- for the third sever
create sequence seq_entity_id start 2000001;
It is hard to say that 1000000 is maximum. So it is more
convenient to use INCREMENT
.
-- for the first server:
create sequence seq_entity_id start 1 INCREMENT 100;
-- for the second server:
create sequence seq_entity_id start 2 INCREMENT 100;
-- for the third sever
create sequence seq_entity_id start 3 INCREMENT 100;
There is space to have up to 100 servers.
To do IDs not predictable I can generate them from
sequence unique for each user.
Instead of nextval('seq_entity_id')
I will
use custom next_entity_id(user_id)
.
CREATE TABLE t_user (
user_id integer NOT NULL DEFAULT nextval('seq_t_user_id'),
user_pk_entity_id integer default 0 NOT NULL
);
CREATE OR REPLACE FUNCTION next_entity_id(userid integer)
returns integer AS
$
DECLARE
next_pk integer;
BEGIN
UPDATE t_user
set user_pk_entity_id = user_pk_entity_id + 1
where user_id = userid;
SELECT INTO next_pk user_pk_entity_id
from t_user
where user_id = userid;
RETURN next_pk;
END;
$ LANGUAGE 'plpgsql';
The function next_entity_id
is transactional.
The row from t_user
will be locked. But it
will not block other users. So it would not be a
performance issue.
Positive effect is gapless id for entity table.
Take care and increment id into
UPDATE
statement. If I make
next_pk
before update there will be error.
SELECT INTO next_pk user_pk_entity_id + 1
from t_user
where user_id = userid;
UPDATE t_user
set user_pk_entity_id = next_pk
where user_id = userid;
RETURN next_pk;
With this code user can create simultaneously two entities
with the same id. The second thread waits on update. But
it already has next_pk
based on current
user_pk_entity_id
. And the second thread does
not see the changes from the first thread unless the first
transaction is commited. So
user_pk_entity_id
is still 0
,
like it was before the first request. After two request
user_pk_entity_id
will be 1
. And
two entities will have id = 1
.
In addition, I hide user id from user. Or I can add new UUID column to use on front. But primary key still be serial.