Engineers are notoriously resourceful and Lob’s core services team is no exception. When faced with a problem (in this case, a failing service due to a database issue) the team put on their deerstalker hats and began to investigate.
Those involved quickly came to the conclusion that no one at Lob really "owns" the database, so true to Lob’s values they made the decision to Level Up their knowledge with an engineering “book club.” Oprah who?
Our first topic to zero in on was PostgreSQL. The outcome was to increase our confidence around how our database works and foster discussion and ideas around design decisions for our architecture with an eye on database optimization.
We thought we’d share the format of our sessions (thank you, Engineering Manager, Ross Martin), along with a specific example so you get the idea. We found this template has worked well to guide our discussions (read: keep us from getting distracted). We created a template in Notion to repurpose for each session; this also serves as our repository to capture learnings.
As Dr. Stephen Covey suggests, “Begin with the end in mind.” As we were having performance issues for specific types of queries in large tables (which were causing requests to time out) we identified two goals up front: 1) Learn about the different indexes provided by Postgres and be able to explain them, and 2) Understand their benefits and provide guidance on when to use a specific index.
The book club launched with a deep dive into indexing. The assigned reading was “The Art of Postgres,” Chapter 8: Indexing Strategy with optional resources also suggested (we have to pander to the overachievers in the group; turns out this is almost everyone). These included the official Postgresql documentation on Indexes, plus Chapters 63-68 on index implementation detail, and this PostgresPro blog post.
Prompts may differ based on goals or content but a few basic ones are helpful to guide the discussion. We like to rotate facilitators across sessions to give each member of the team a chance to level up.
What is one thing you learned today?
Indexes are used to either ensure data consistency OR improve read access.
New (to you) terminology
TID (Tuple Identifier) - A TID is a pair (block number, tuple index within that block) that identifies the physical location of the row within a table. i.e. (10, 5) is the 5th item of the 10th block of a table or index.
What is one thing you were uncertain about or didn't totally understand?
Constraints such as UNIQUE, PRIMARY KEY or EXCLUDE USING are only possible to implement in PostgreSQL with a backing index. (The unique constraints require searching for the specific element and an index prevents the need of a sequential search.)
Was there anything you had a (strong) positive reaction to?
An index cannot alter the result of a query. An index only provides another access method to the data.
Was there anything you had a (strong) negative reaction to?
As a consequence, each index adds write costs to your DML queries: insert, update and delete now have to maintain the indexes too, and in a transactional way.
Is there anywhere in our software (current or future) we could use something mentioned in the reading for today?
Running EXPLAIN on common queries would be a great way to see what indexes are currently in use, and if they can be made better.
This is a prompt to capture, share, and review Github gists or repos as applicable. We also share links to similar or tangentially-related content we discover (are you really a developer if you haven’t combed StackOverflow, documentation, GitHub issues, and so forth looking for more answers or examples?).
Our efforts to level up our db knowledge had an immediate impact. But in order to provide examples, it’s helpful to see what else we covered in our subsequent sessions around Postgres. We spent the next couple of meetups covering other areas of Postgres such as data types and internals; we included the specific reading recommendations in case you wanted to dive in too.
- “The Art of Postgres” Chapter 22 (Postgresql Data Types) and Chapter 23 (Denormalized Data Types)
- Explaining the unexplainable Part 1, Part 2, & Part 3
- Official Postgres Documentation, Chapter 8: Data Types, Appendix F.16: hstore, and Using Explain
- The Internals of Postgresql, Chapter 1: (Database Cluster, Databases, and Tables)
In addition to making us feel like geniuses, the lessons learned from absorbing and discussing this content as a team helped were the driving force behind several Postgres database optimizations.
For example, initially, we assumed that if we limited the data range a customer could export from our (user-facing) dashboard we would improve performance. But in testing we discovered this was not true. After our first session, we were much more comfortable with the details of Postgres indexing (and how it applied to our specific data and requirements). In our case, if we removed range constraints it would use the correct index; on top of that, by adding inflection to a query we were able to ensure it _always _used the right index. This helped empower customers to take better control of their data exports (also freeing up CX time and resources) and resulted in a faster and more responsive API and dashboard—these updates have yielded on average 15x better performance (!).
We’ll definitely be using this template to guide our exploration of another topic (it’s a toss up between a closer look at Git internals or the book “Designing Data Intensive Applications”). Is there a topic your team could level up their knowledge in? Imitation is the best form of flattery, so feel free to snag this template for your own use.
Education never ends, Watson. It is a series of lessons. –Sherlock Holmes