Quick! Name a technology category that has nearly 400 different options vying for your attention; that pulled in over $80 billion in revenue last year but is actually accelerating in its growth rate; that, decades into its existence, still spawns startups with seemingly bottomless amounts of venture funding; and that drove the most job listings of any programming language last year. If you guessed “database,” you’d be right.
Why is this decades-old market so incredibly hot right now? Even as database incumbents like Oracle see growth slow, the category is booming. As I’ve written, a big reason is cloud, but the bigger reason is simply that data keeps growing in importance to every enterprise, with diverse, unstructured data giving birth to new databases to manage it all.
Evolution meets revolution
This isn’t supposed to be how markets work. Product categories rise and then tend to dwindle over time, replaced by other things. For example, Microsoft minted billions in the operating system (OS) market, but today we don’t really care much about the OS.
On the desktop, things like ChromeOS have made it clear that it’s the browser/web that matters most, and on the server, businesses have increasingly been thinking in terms of serverless.
Or remember when app servers, enterprise resource planning (ERP), and enterprise content management (ECM) were hot new markets? Companies still depend on these products, or some variant of them, but they’re not considered growth markets.
Databases arguably should be the same. Relational databases were born in the early 1970s, and we had Oracle, Microsoft, and IBM spin up massive businesses to sell and support them. We should be seeing this market now entering its end, but we’re not.
While these vendors have seen their database revenue growth slow, the market as a whole has done anything but. Some of their customers are increasingly flirting with PostgreSQL, but even more are turning to cloud databases. Some are even taking on both, with AWS and other cloud giants offering managed PostgreSQL services.
There has also been a profound and sustained rise in so-called “NoSQL” databases. While I like the trend, I don’t particularly like the term, because databases like MongoDB, Apache Cassandra, Neo4j, DynamoDB, Redis, and others aren’t being embraced because of what they aren’t, but rather for what they are: flexible, horizontally scalable, and able to manage the explosion in unstructured data.
Indeed, relational databases, with the prominent exception of PostgreSQL, have declined relative to non-relational databases over the last nine years, including over the past year, as measured by DB-Engines.
That’s not to suggest that SQL/relational use is on the wane. In fact, SQL adoption, as measured by job postings, keeps increasing.
Enterprises are increasing their interest in developers who can query the databases that have been running their enterprises for years using comfortable, widely used SQL. SQL is popular because it’s been a great workhorse for the enterprise.
At the same time, enterprises are just as clearly looking for developers who can help them query new data types and sources, which often won’t involve SQL.
It’s not an either/or decision, in other words. For enterprises of any reasonable size, it’s a matter of “and.” Enterprises are simply trying to make the best use of their data and turn to the right database for the job.
Restructuring the market
Zilliz, the company behind the open source vector database Milvus, just raised $60 million, to add to the $43 million raised in 2020. Never heard of a vector database? You’re not alone. A vector database is intended to manage vector embeddings. According to Zilliz’s Frank Liu:
The increasing ubiquity of unstructured data has led to a steady rise in the use of machine learning models trained to understand [unstructured] data. word2vec, a natural language processing (NLP) algorithm which uses a neural network to learn word associations, is a well-known early example of this. The word2vec model is capable of turning single words (in a variety of languages, not just English) into a list of floating point values, or vectors. Due to the way models are trained, vectors which are close to each other represent words which are similar to each other, hence the term embedding vectors.
As such, vector databases prove useful in things like image search or searching within video, audio, or other forms of unstructured data to understand the content, not the keywords associated with that content.
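To make the idea of "vectors which are close to each other" concrete, here is a minimal Python sketch of a similarity lookup. It is not how Milvus or any particular vector database is implemented. The three-dimensional vectors, the TOY_EMBEDDINGS table, and the nearest() helper are all made up for illustration, and cosine similarity is just one common way to measure closeness. Real embeddings come from a trained model and have far more dimensions.

```python
import math

# Hypothetical, hand-written 3-dimensional "embeddings" (illustration only).
# A real model such as word2vec would produce these values from training data.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "dog": [0.8, 0.3, 0.1],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, embeddings, k=2):
    """Return the k stored words whose vectors are most similar to query.

    This is a brute-force scan over every stored vector; production systems
    typically rely on approximate indexes to avoid comparing against everything.
    """
    scored = [(cosine_similarity(query, vec), word)
              for word, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]


if __name__ == "__main__":
    # Look up the stored words closest to the vector for "cat".
    print(nearest(TOY_EMBEDDINGS["cat"], TOY_EMBEDDINGS, k=2))
```

With these made-up values, the words closest to the "cat" vector are "cat" itself and "kitten," while "car" scores far lower. That is the kind of lookup by meaning, rather than by keyword, that a vector database is built to serve at scale.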
My point isn’t to offer a tutorial in vector databases. Rather, it’s to show that with the continued growth of structured and, especially, unstructured data, the database market will continue to balloon. At the same time, we’ll see new approaches to databases crop up.
Disclosure: I work for MongoDB, but the views expressed herein are mine.