Postgresql -> Cockroach DB


Context:

  • This story comes comes from Storj Labs, a storage provider that boasts resiliency of storage objects and high performance
  • Storj Labs wants to move from Postgresql to CockroachDB.
  • Postgresql is a production grade ACID compliant relational DB that can fit into many use cases such as OLAP, OLTP, GIS, Events Processing.
  • Cockroach DB is a SQL compliant DB that boasts high performance with built in sharding capabilities/horizontally scalable. This was a DB I’ve been looking out for as the use case is scalability while remaining as SQL DB. This is not covered by many existing relational DBs.
  • Storj Labs has exabytes of data on the Storj network and the current architecture stores 90PBs of file meta data. They’re looking for multi region availabilities and keep up with network.
  • Article was written in 2020 and since then, the new Cockroach DB has updated version that has better Postgres Compatibilities

Options for Horizontally scalable databases (to be performant as datasize becomes larger)

  • Shard your current relational DB (custom solution)
  • Use NoSQL DBs like Cassandra or BigTable
  • Use NewSQL DBs like Spanner or CockroachDB

Decided at the time that Cockroach DB was the best solution.

Migration steps required

  • Rewrite incompatible postgresql SQL code to cockroach SQL code
  • Load testing
  • Evaluating analytics for Cockroach DB
  • Migration

Rewriting incompatible SQL code

  • Mentioned these are mostly migration codes, likely INSERT statements
    • Primary keys cannot be changed easily in CockroachDB. If such action is required, need to crate a new table with new PK, move data over, and then drop old table.
  • Again this was in 2020 where Cockroach DB was version 19, now it is 2022, version 22. So things have changed a lot since.

Load testing

  • Found that Cockroach DBs sorts by primary keys, whereas Postgresql doesn’t have a defined order. A surprising performance was queries related to ordered iteration on primary keys.
Speed of ordered iterator: 
CockroachDB 
Took 3.5s for 100,000 records 
Took 18.8s for 1,000,000 records 
Took 14m0.5s for 10,000,000 records 
CloudSQL Postgres 
Took 56.8s for 100,000 records 
Took 4m53.3s for 1000,000 records 
Took 1h48m25.1s for 10,000,000 records
  • CockroachDB has a feature called prefix compression, (sounded more like prefix deduplication however) – this feature saves space by not needed to store prefix shared with previous record. While cockroach db needs to replicate data (3 times by default) the additional bytes was a little bit over 2 times.
    • Cloudsql Postgres ~3658 bytes/row
    • CockroachDB ~2846 bytes/row (before replication)
      • After replication about 5692 bytes/row (about 35-45% increase of postgresql storage)
  • CockroachDB runs everything in transaction either explicit or implicit
    • Explicit, it allows application to decide to retry if needed
    • Implicit, the cockroach db will retry in the background
  • Mentioned there is a downside with Cockroach DB on transaction contentions. Sounds like it cannot handle as many concurrent transactions. To mitigate:
    • Use lower number of connections. This might be OK since certain query patterns are faster in cockroach DB than postgresql
    • Multi row upserts
    • Bulk inserts
    • Blind writes
    • Following other performance guidelines

Migration

  • Had to pull the data from Postgresql, and then sort the data INSERTS statements (because CockroachDB sorts PKs)
  • Cockroach cloud has good automation for backups, this is equivalent to GCP backups that was provided out of the box.

When to use CockroachDB?

  • You want a distributed OLTP database that is fault tolerant, highly available without sharding the DB yourself.
  • The queries cannot be highly complicated (hence OLAP use cases are not suitable for CockroachDB)
  • Write Speed is not priority, it will never achieve the latency performance vs Postgresl from a single region. CockroachDB uses Raft Consensus Algorithm for writes.
  • Uses cases includes Keeping System of record. Eg: Unified User Profile.
  • Only if your DB requires sharding/data distribution, otherwise moving to CockroachDB might incur more storage costs.

Who uses CockroachDB?

Special mentions:

  • YugabyteDB uses the same underlying RocksDB storage, seems to be a competitor in the distributed sql db space.

Ref:

Leave a comment