Startup Engineers and Our Mistakes with MongoDB
MongoDB got rave reviews for its usability. But other features mattered too when choosing a database for a growing startup.
This is part two. You can read part one in this three part series here.
Building a startup is like eating nails. It requires insane levels of perseverance, tolerance for ambiguity, and a strong work ethic. There are critical internal debates to ensure the team is on the right track. What feature should we build next? How will we attract more users? How will we get revenue or funding to keep the lights on?
In early 2012, I heard a much different startup debate: should we switch our database to MongoDB? This company had chosen Postgres, a "traditional" relational database. Now, their lead engineer was adamant that switching to MongoDB was critical for success.
A number of top companies (Etsy, Foursquare, and many other startups) were "using"1 it, and his recent meetups and friends all indicated that MongoDB was the future. The developer especially cited future scaling needs, the coming obsolescence of SQL, and engineer recruiting. He proposed to integrate MongoDB at the cost of several weeks of the roadmap for his tiny team.
Sadly, he misestimated how much work the still evolving MongoDB would take to learn and integrate, costing much more time than he realized. And even if the company was tremendously successful, the scale provided by most NoSQL databases would be immaterial, given his company's product. Just as importantly, he didn't know the tradeoffs he was making by choosing MongoDB.
In this case, the cost was relatively painless: simply a few wasted weeks and thousands of investor dollars. On the other hand, I know a number of teams that had much more challenging issues.
In Part 1, we looked at how the NoSQL hype enabled the early success of MongoDB. Now, let's look at MongoDB's benefits and the mistakes some startup developers made when choosing MongoDB in 2012. Understanding past issues can give us context about how to make better engineering decisions in the future.
The Benefits of Mongo in the early 2010s
10gen's key contribution to databases, and to our industry, was its laser focus on four critical things: onboarding, usability, libraries, and support. For startup teams, these were important factors in choosing MongoDB, and a key reason for its powerful word of mouth.
Their support team was smart, responsive and plain nice, with 10gen correctly seeing the quality of their support as a key strength ("Support is the new marketing"). Typifying their support mindset and desire to understand user concerns, MongoDB's CTO and cofounder Eliot Horowitz was gracious in providing perspectives and answering questions as part of this series. And their engineers, technical writers, and designers created documentation and libraries that were a pleasure to read and use.
And yet, when choosing a database for a growing startup, startup engineers should have also weighed other considerations.
Mistakes with Mongo
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Feel free to remix or share it.
The importance of schemas, relationships, and transactions
In 2012, critical needs that startup engineers often ignored included:
- Schemas: Schemaless does not mean no schema; instead, it means an implicit schema in the app (a particularly challenging misnomer for anyone outside our industry)
- Relational Structures: Relational data is common and generally suited for relational databases (MongoDB's CTO disagrees with this statement, arguing that nearly 90% of database installations today would benefit from being replaced with MongoDB)14
- Transactions: A number of use cases, especially financial ones, do benefit from transactions
By far the most consistent mistake was choosing a non-relational database, when your data was strongly relational. Mongoose's ODM made this mistake surprisingly easy to make, which led to issues down the line.
To pick on another, transactions are critical in a number of financial use cases, which caught a number of teams by surprise. In the case of Flexcoin, an attacker issued many concurrent withdrawals against a single balance, bankrupting the company, though there are debates about whether this was due to a mistaken use of MongoDB. UC Berkeley's Professor Joseph Hellerstein suggests that rather than thinking NoSQL, we think of "No transactions" instead.
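To make the concurrency risk concrete, here is a minimal sketch in Python. It does not use MongoDB and makes no claim about how Flexcoin's code actually worked. The `Account` class and the `withdraw_unsafe`, `withdraw_atomic`, and `run` functions are hypothetical names invented for illustration, and the barrier exists only to force the unlucky interleaving every time.

```python
import threading


class Account:
    """Hypothetical in-memory stand-in for a stored balance."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def withdraw_unsafe(account, amount, paid, barrier):
    # Step 1: read the balance with no isolation from other requests.
    current = account.balance
    # Force every request to finish reading before any of them writes.
    barrier.wait()
    if current >= amount:
        # Step 2: write back a value computed from the stale read.
        account.balance = current - amount
        paid.append(amount)


def withdraw_atomic(account, amount, paid, barrier):
    barrier.wait()  # same simultaneous start as the unsafe version
    # The check and the update happen as one indivisible step.
    with account.lock:
        if account.balance >= amount:
            account.balance -= amount
            paid.append(amount)


def run(withdraw, requests=2, balance=100, amount=100):
    """Fire `requests` concurrent withdrawals; return (total paid, final balance)."""
    account = Account(balance)
    paid = []
    barrier = threading.Barrier(requests)
    threads = [
        threading.Thread(target=withdraw, args=(account, amount, paid, barrier))
        for _ in range(requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(paid), account.balance


unsafe_paid, unsafe_balance = run(withdraw_unsafe)
atomic_paid, atomic_balance = run(withdraw_atomic)
print("unsafe paid out:", unsafe_paid)
print("atomic paid out:", atomic_paid)
```

With a starting balance of 100, the unsafe version pays out 200 because both requests pass the check against the same stale read, while the atomic version pays out 100. A transaction, or a single conditional update, plays the role of the lock here.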
MongoDB also had some key constraints in 2012 that many teams were unaware of, such as the global lock, "unsafe" writes, or the challenges with getting sharding right. (MongoDB's CTO acknowledges the challenges with the "unsafe" writes, which would be the key thing he would do differently, but also notes that few teams had issues with them; early 10gen customers explicitly valued this behavior, even though it was different from most other databases.)
For startups, the database decision makers were application developers (not ops veterans) who favored immediate productivity. Unfortunately, short-term productivity was outweighed by many medium-term issues.
Over time, many startups created application code to impose an implicit schema, which would require ever greater maintenance costs. And optimistic visions of switching over to a relational database when the company grew were often shelved due to the challenges of migrating the data store.
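As a rough illustration of what an implicit schema in application code can look like, here is a small Python sketch. The field names, the `REQUIRED_FIELDS` table, and the `validate_user` function are all hypothetical; real teams wrote many variations of this, often spread across several code paths.

```python
# Hypothetical implicit schema: the database enforces nothing,
# so the application has to remember what a "user" looks like.
REQUIRED_FIELDS = {"email": str, "signup_year": int}


def validate_user(doc):
    """Return a list of problems found in a user document (empty if none)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            actual = type(doc[field]).__name__
            problems.append(f"{field} should be {expected.__name__}, got {actual}")
    return problems


# Two documents written by different versions of the same app.
old_doc = {"email": "a@example.com", "signup_year": "2012"}  # year stored as a string
new_doc = {"email": "b@example.com", "signup_year": 2012}

print(validate_user(old_doc))
print(validate_user(new_doc))
```

Every reader of the data needs checks like this (or silently trusts the data), which is one way the maintenance cost grows over time.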
I’m empathetic to the desire to use a single data structure across one’s stack. One 10gen engineer made this point in analogizing SQL to Cobol, arguing that "SQL is Annoying":
Still, this is dangerous, and frankly daft, if it’s the primary decision criterion for a growing company. Some liken MongoDB to the dynamic in static vs dynamic languages. And yet, a dynamic language in beta can be a problematic choice until it matures, especially if it targets a different use case than the one you are facing.2
MongoDB did provide tremendous value to some, but the cost for many early stage startups often outweighed the very real short-term benefits. MongoDB's CTO disagrees, arguing that many startups were only able to survive and grow due to the value MongoDB provided in the early 2010s, and that these successes far outstripped the very few who had issues.
Someone Else’s Solution for Your Problem
Solutions that benefit others may not work for your own, widely different, environment. Startup engineers need to differentiate between what is good for a side project or a company with internet scale versus what’s appropriate for their startup.
Large tech companies have a unique set of challenges, and their solutions — like NoSQL to manage the huge data flows they face — may not apply to you. And yet, much of the excitement in technical circles will be about technology these companies pioneered for their own unique needs.
Professor Hellerstein makes this point in the context of MapReduce, though he could well be speaking about NoSQL 3:
The thing is there are five companies in the world that run jobs that big ... People got ... Google mania in the 2000s: we’ll do everything the way Google does ... The reasoning was ... they’re supposed to be smart, but they had a different problem than most people had. They were optimizing for different things than most people should be optimizing for.
According to 10gen, the majority of MongoDB users still don't have large enough datasets to shard — though some do value the option to.
Even when customers use a tool, marketing testimonials neglect to mention that they are often used in less important systems or are simply an experiment. Often, prestigious tech startups will try new tools in a non-core system — and publicize the experiment partially because it helps recruiting. The broader message that startup engineers can hear is “X company believes strongly in Y tool — and you should consider switching.”
Another common mistake for first-time startup engineers (especially for those who've previously worked at products with massive scale) is to overweight performance or scalability in tool choices. Solving these issues early — such as by using a brand new NoSQL tool with still maturing norms — is often costly and doesn't generally increase the odds of startup success (MongoDB's CTO disagrees with me here, arguing that the JSON-like data store was transformational for their users). By the time they are a huge bottleneck, you often have huge financial and engineering resources to attack them.8 Premature scaling is a common technical trap in startups.
On the flip side, choices made in a prototype have a habit of persisting far beyond their expected life. As such, there's a balance between favoring myopically early productivity and — at the other extreme — trying to prematurely solve problems that might not cause issues for years.
Early usability and later scalability were both key explanations startup teams gave for choosing MongoDB.
Taking the Time to Learn a New Tool
Whenever a new tool appears, we apply old ways of thinking to the new tool. You may try to use NoSQL databases exactly how you used your SQL database, which is a recipe for failure. You can't use new tools without setting aside the time to truly learn them.
For example, a common mistake was not understanding how data models should be different in document databases.
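Here is one hedged sketch of that difference, using plain Python dictionaries rather than a real database. The blog-post-with-comments data and both loader functions are hypothetical. The first half carries a relational habit into a document store and joins in application code; the second stores together what is read together.

```python
# Relational habit carried over: two "collections" linked by id,
# with the join performed in application code.
posts = [{"_id": 1, "title": "Hello"}]
comments = [
    {"_id": 10, "post_id": 1, "text": "First"},
    {"_id": 11, "post_id": 1, "text": "Nice"},
]


def load_post_relational_style(post_id):
    post = next(p for p in posts if p["_id"] == post_id)
    joined = [c["text"] for c in comments if c["post_id"] == post_id]
    return {"title": post["title"], "comments": joined}


# Document-style model: the comments live inside the post document.
posts_embedded = [
    {"_id": 1, "title": "Hello", "comments": [{"text": "First"}, {"text": "Nice"}]}
]


def load_post_document_style(post_id):
    post = next(p for p in posts_embedded if p["_id"] == post_id)
    return {"title": post["title"], "comments": [c["text"] for c in post["comments"]]}


print(load_post_relational_style(1))
print(load_post_document_style(1))
```

Both loaders return the same result, but the embedded version needs a single lookup. Embedding is not automatically better: it suits data that is read as a unit and becomes awkward when the same child data is shared across many parents, which is exactly the kind of tradeoff that takes time to learn.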
To help others, Russell Smith7 created a great overview of other gotchas in late 2012, including:
- Manually turning on authentication and encryption (not on by default) and turning off remote connections
- Ensuring an odd number of replica set members (not an even number)
- Being aware of the asynchronous, "unsafe" writes
- Understanding the 2 GB data limit of the 32-bit MongoDB
- Using the official repository rather than package managers, given how quickly MongoDB was changing
- (and many others)
His list should have been required reading for anyone planning to use MongoDB in production — and yet, many of the teams I saw didn't know these critical details. (MongoDB's CTO mentions that the documentation was always up to date)
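The odd-number advice from the list above is easier to remember with the arithmetic behind it. This is a small sketch, assuming that electing a primary requires a strict majority of voting members; the function names are my own.

```python
def majority(members):
    """Smallest number of votes that is more than half of `members`."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can be down while a majority is still reachable."""
    return members - majority(members)


for n in range(3, 7):
    print(n, "members: majority", majority(n), "tolerates", failures_tolerated(n))
```

Going from three members to four raises the required majority from two to three, so both sizes survive only one failure. The extra member adds cost without adding fault tolerance.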
Server Density’s David Mytton goes through his own long list of mistakes seen in the wild and then highlights the broader issue:
[Many] of the problems cited in these posts seem like basic mistakes in deployment and understanding.
With any product, if you decide to deploy it to production you need to be sure you fully understand its architecture and scaling profile.
He and Russell argued that the community concerns about MongoDB in 2012 were "misguided": after the initial hype, there were a number of angry posts on HN. But it isn't unfair to expect saner defaults and better upfront communication before starting to pitch a product at hackathons or to junior developers as the future of web development.4 It also might cause issues for some junior engineers if you pitch a database lacking transactions at a FinTech hackathon without appropriate caveats.9
Any new foundational technology takes time to learn, which makes it a challenging choice for consistent developer productivity. This was especially true if your startup had limited technical resources and little time to truly learn how to use MongoDB.
Beyond these core failures, there were others:
- Choosing modern technologies simply because that's what engineering leads thought it took to recruit "good" engineers 5
- Spending precious time learning new dev tools, rather than focusing on what technical work is most critical to the success of a startup
- Choosing a database for one use case, and then not changing when the company pivoted to a wholly different use case (see early Bitcoin exchanges)
- Email me (mongodb at this domain) if you'd like to share your own positive or negative story confidentially
While this story is about MongoDB, the broader message is that fashionable choices made without deep reflection can put startups at risk.
Making Better Engineers
I'll highlight a few broader points that underlie these engineering mistakes.
First, it's challenging to make engineering decisions based on what is popular on Hacker News or Reddit, such as the posts around NoSQL, the MEAN stack, or even the later anger at MongoDB. Social networks are a key input into engineering decisions early in your career or when you're working on a tiny team (such as a startup). As engineers, we need to understand the issues with doing this.
On one side, many different types of software engineers congregate together in social media — and what is right for one is not right for the other. We don't consistently discuss our use cases (or our experience/background) when commenting on a technology choice, instead often arguing that a technology choice is good or bad.
Think about how the decision of the "right" database changes if you're using it for just a hackathon/side project, or for logging, or for a webscale product at Google. Or how about if you knew a blog post writer was knocking a technology they had barely used or was opining on a part of the stack where they had little experience or worked at a competing company?
Many of the best engineers I know spend their time on engineering problems, not blogging — which limits the amount of great content available. It's also hard to blog about failures when it impacts your company's reputation or your own job prospects, which further dictates what content is available. Few of the teams I talked to with MongoDB issues were willing to go on record.
On the other side, it's easy to game social media, which dev tool vendors and engineering marketers know. A passionate few can get favored topics upvoted, and vendors will reach out to proxies to write or share their message (as we'll see in detail next time13). Cornell's Professor Emin Gün Sirer, who contributes to his own competing database, explicitly blames NoSQL vendor marketing strategy in engineering social media for the issues with MongoDB: "[Engineers] did what anyone would do after reading one too many astroturf articles on Hacker News."11
Democracy (upvotes or commenting, with everyone's vote treated equally) is not the right choice for many forms of learning, even though it is the standard for Reddit-type networks. Can you imagine, for example, if we taught high school and university engineering students based on student upvotes alone? We'd naturally favor sexy technology and make it easy for organized parties that personally benefit to set our agenda, rather than favoring the content that makes the most thoughtful engineer. And yet, for some, this is a key part of how engineering is learned.
I'll challenge our community: design a Hacker News algorithm (or more likely, a human-assisted system) that favors content and comments that make a better engineer and is less permeable to attack. Our current algorithm is an alpha on the way to a more mature algorithm that understands what engineers need to learn, not just what some want to publicize.
Second, coding bootcamps and online programs increasingly teach a substantial minority of developers, with nearly a third of new software engineers coming from bootcamps in 2016. They play a critical role in determining how engineers think and what technologies succeed, and I'm heartened by the new perspectives that their students bring.
My worry is that some bootcamps themselves favor technology choices (which change quickly), rather than core concepts. Learning concepts — rather than quickly changing DSLs — lets you reason from first principles. They can inoculate you from weak engineering arguments, while future-proofing your career as you can navigate future changes.
For example, there's value in learning about basic build automation concepts generally (and Make specifically), even in this Webpack/Gulp/Grunt era. Anyone who's had the time to understand the basics of build automation and survey the tradeoffs of the various tools will find it much easier to assess future tools. This is also a reason why coding bootcamps should spend at least some time teaching SQL and transactional vs. non-transactional systems, as they are important paradigms that their students will likely be exposed to.
Even though university curriculums are often pilloried for lagging behind, the flip side is that they can more effectively withstand hype — and professors aren't as easily swayed by marketing or focused on placement rate. And even then, the best university programs focus on computer science, without spending adequate time on the art of software engineering.6
There are broader lessons that need to be taught to all software engineers, from thinking in tradeoffs rather than good/bad to valuing boring technology. Cloudflare's CTO John Graham-Cumming notes, "The only useful piece of advice I can give a younger developer is... be careful when drinking the newtech koolaid." Every single senior engineer I know would attest to his statement, and all of us have the scars to show where these lessons came from.
Finally, some commenters on HN were scared to challenge technology decisions because of the risk to their job prospects. In my experience, the best engineering managers10 look for critical thinkers, not those who thoughtlessly extol or hate new technology. I’ll also argue that great engineers make thoughtful assessments, and revisit assumptions when the data changes. Our teams are not well served by either keeping quiet or having strongly held views that can’t be revisited (including a view like "never use MongoDB", once many issues were fixed).
In the final parts of this series, we'll look into a critical part of MongoDB's success in the early 2010s: marketing and testimonials.
This essay is based on several years of informal discussions, interviews with key stakeholders, parsing countless blog posts/presentations, and reading ~3,000 HN comments. To dig in more, you can see select commentary excerpts and my other thoughts. All opinions are solely my own. I welcome feedback (email mongodb at this domain).
Thanks to Mathieu Jouhet for countless hours spent on design and to Shay Maunz for edits. I especially have to thank the many software engineers who shared their experiences and provided feedback.