Ancient History: Using A Single Database

Chris
There was a time when people and companies thought they would consolidate their businesses on one database technology, but that’s pretty much history, right? The practice “Polyglot Persistence” describes is common now, whether people use the term or not. Is using multiple database technologies standard?

Jeff
So let’s say fifteen years ago, for argument’s sake, companies actively wanted to standardize on a single database. There were two reasons for that. Well, sort of two and a half. One of them was cost. That’s the half. There was this idea that it would be less expensive: fewer kinds of resources to manage and better license leverage. Another reason was efficiency: it was easier to run your enterprise when you were standardized on a certain database. This was the message Oracle used to become a $50B company. The other reason was that there really weren’t other choices. Your choices back then were relational databases or VSAM files, and nobody was that interested in going back to VSAM files, which were mainframe technology.

Yes, Of Course, Use the Database That Fits the Requirements

What’s changed is that over the last ten to fifteen years, starting in the early 2000s, we’ve seen this explosion of specialized databases that are used for specific types of applications. If you have a social media application that you’re storing event data in, you might use something like Cassandra. If you’re trying to build a complex SaaS application where you don’t want to have to reinvent the data model every time you get a new customer, you might use something like MongoDB or Couchbase. Then, if you’re just trying to do analytics on top of Hadoop, which at its core is a file system, you might use HBase. Graph databases are a direct result of social media, essentially, because you can’t sensibly store a thousand tweets or re-tweets of an original tweet inside of a table.
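To make that SaaS example concrete, here is a minimal sketch in Python using pymongo; it assumes a MongoDB instance on localhost, and the database, collection, and customer records are all hypothetical:

```python
# Minimal sketch of the flexible-data-model point, using pymongo.
# Assumes a MongoDB instance on localhost; all names here are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
accounts = client["saas_demo"]["accounts"]

# Two customers with differently shaped records live in the same collection.
# No schema migration is needed when a new customer brings new fields.
accounts.insert_one({"customer": "acme", "plan": "pro",
                     "contacts": [{"name": "Ada", "role": "admin"}]})
accounts.insert_one({"customer": "globex", "plan": "basic",
                     "billing": {"currency": "EUR", "po_number": "PO-7"}})
```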

Chris
So companies now are realizing that certain types of data belong in a certain type of database, that data shouldn’t be forced into a prescribed container. That’s Polyglot Persistence.

Jeff
Companies pick databases that meet specific requirements. What they’re NOT doing is trying to force data into databases that don’t fit the need or work well with the type of data being created. That’s the piece that changed. I think in the beginning of this, people said, “No, we already made our decision to standardize on SQL Server, so let’s just force SQL Server to somehow work for a graph application.” That’s the piece where they finally have gotten over the hump and said, yeah, that’s just never going to happen.

There was a certain element of fighting the tide, but I think now in 2016, almost 2017, companies have just said, I’m okay with having multiple databases. There are two pieces to it, though.

There’s the technology shift, which occurred and is occurring. And there’s the mental shift, where CIOs said it’s okay. It’s all right to have polyglot.

Chris
So that’s an exciting time: enterprise data and technology gatekeepers opening the door to application development on new database technologies.

Jeff
Yes. It’s definitely something from the last six years; 2010 was when you started to see a really big shift in all that. There were definitely shifts going on before that, and some early adopters in the Hadoop ecosystem, but it really was around 2010 when you started to see companies say, hey, we need to adopt MongoDB and Cassandra, we need broad adoption of Hadoop and all the stuff that lives around it, and hey, we might want to implement a graph database like Neo4j.

Chris
Then the application developers are happy, right? Because they get the tools that they need. But who’s left out? Where are the BI folks in this situation? It’s now even harder, more expensive…

Thanks Polyglot: Exponentially Harder, More Expensive Analytics

Jeff
Right. So with Polyglot, there’s no way to have that consistent, consolidated analytic experience anymore. What drove the success of BI broadly over the last twenty years was this idea that an enterprise could adopt a BI tool, whether it was Business Objects or Cognos or more recent stuff like Tableau or QlikView, and it worked on all of their data, because up until six or seven years ago all of their data was relational, or practically speaking the vast majority of it was.

Now what’s happened is that’s changed.

A lot of enterprises are creating more non-relational data every day than relational data, from SaaS applications and all kinds of new things they’ve built in their business.

They’re creating data lakes using Hadoop and things like that. Now they’ve got all these new data types that are a significant part of their business. It’s no longer some tiny, single-percent fraction.

The Reigning Solution: ETL, Wrangling, Shoehorning, Forcing

They need to be able to do analytics, and they don’t have the tools, because the Tableaus of the world and the QlikViews and the Cognos and the Business Objects simply don’t work well on this data. They make updates; they keep trying bigger, better, badder versions of ODBC and JDBC, and mapping tools and data prep tools.

But what’s been created, essentially, is an entire industry dedicated to trying to fix the data instead of trying to fix the analytics tools.

We have created an entire industry that’s literally saying, Mr. Customer, your data structure is wrong, change it. In reality, their data structure is not wrong; their analytic tooling is wrong, or broken.
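As an illustration of that “fix the data” pattern, here is what the shoehorning looks like in miniature, in plain Python on a hypothetical nested order record:

```python
# A nested document has to be flattened into rows before a purely
# relational BI tool can touch it. Hypothetical record, plain Python.
order = {
    "id": 1,
    "customer": "acme",
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

# One document becomes N rows, and the structure the application
# actually uses is thrown away in the process.
rows = [
    {"order_id": order["id"], "customer": order["customer"],
     "sku": item["sku"], "qty": item["qty"]}
    for item in order["items"]
]
print(rows)
```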

Chris
So then it’s time to think forward a few years and ask, “What’s the roll-up?” What’s the great simplifier that will unleash a lot of this value that’s been created through new database technology?

A Modern Analytics Tool: Built for NoSQL, Yet Fully Backwards Compatible

Jeff
Yes, we built a solution that understands modern data, but one that also understands legacy, relational data. That’s what other companies haven’t done. If you look at the market, what you’ve got today are legacy relational analytics vendors, the Tableaus and the Business Objects of the world, attempting to add support for more complex data structures like JSON and XML. They’re very early in that process, and frankly they’re not doing a very good job.

At the other end of the spectrum, you’ve got all these hyper-specialized tools like Impala for Hadoop. But in the end, this is the reality: there’s no world where you take Impala and it suddenly works on Oracle, or it suddenly works on MongoDB. It doesn’t. That will never happen, or at least not in any future that’s visible.

The exceptional thing that we did was to create an evolutionary approach to this problem, not another silo or another one-off tool. We said we’re going to create something that can work with all the modern data structures people are encountering, like JSON, XML, and other sorts of semi-structured data, but that is also completely compatible with their existing legacy sources, the relational world.

That’s the piece that’s magic.
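As a rough illustration of that idea (this is not SlamData’s implementation, just a sketch of the principle), one path-based accessor can serve both a flat, relational-style row and a nested document:

```python
# Illustrative only: one dotted query path resolves against both a flat,
# relational-style row and a nested JSON-style document.
def get_path(record, path):
    """Resolve a dotted path like 'address.city' against a dict."""
    for key in path.split("."):
        if not isinstance(record, dict) or key not in record:
            return None
        record = record[key]
    return record

flat_row = {"name": "Ada", "city": "London"}               # legacy, flat
document = {"name": "Ada", "address": {"city": "London"}}  # modern, nested

print(get_path(flat_row, "city"))          # London
print(get_path(document, "address.city"))  # London
```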

Chris
Okay, so let’s talk about what impact this has on business. Let’s say you’re a CIO: you acknowledge polyglot, and then you’ve got this killer complexity problem where you need all these tools to do analytics. You say, okay, I’m gonna go with SlamData, I’m gonna have this one query language and visualization platform… what’s the magnitude of that change for a company?

An Analytics Lingua Franca: Clear Value for Companies, the Industry, Too

Jeff
The opportunity is the same opportunity companies saw when they adopted BI standards, and even Tableau as a standard, five or ten or fifteen years ago, only bigger, because we have exponentially more data. The reason a company would adopt Business Objects as their standard for BI is because it made sense. The training curve was shorter. It worked. You had a standard implementation of Business Objects that everybody in the company used to analyze data. People learned the tool, and they used the tool, and you could manage the tool, and it was a very consistent experience.

Companies want that again, but they need it over more kinds of data and more amounts of data. People didn’t talk about big data a decade ago. They didn’t. That term didn’t exist.

That’s the opportunity — it’s that you’ll have this consistent powerful tool that will allow you to query data of any kind in your organization. It’s a single learning curve. It’s less infrastructure to manage. It makes sense. And it works.

Think about the infrastructure requirements these companies have today. They’ve got their legacy tools, whether that’s Tableau or Cognos or Business Objects, which are not trivial to manage. If they’ve got some sort of Hadoop-specific tool, there’s usually a fair amount of work going into that. They might have people doing custom development to try to generate reports from NoSQL sources like Couchbase or Mongo. They may have invested six figures in ETL software to do nothing more than move data back and forth between these different sources so they can try to use the tool. How much money are they spending on Talend or Informatica to move all of their NoSQL or Hadoop data over to Oracle, because that’s what Tableau works on?

Post MongoDB: Spark, Couchbase, MarkLogic, Postgres, Federated Queries, RDBMS

Chris
What’s on the road map for 2016 that builds out this new landscape for analytics?

Jeff
Up next are Spark for Hadoop, Couchbase, MarkLogic, and Postgres; those are all arriving in the next few months. In 2017, we will keep adding NoSQL databases but also focus on additional RDBMS databases: Oracle, MySQL, and others.

Chris
What is novel about the way SlamData adds new connectors? The so-called QScript technology.

Jeff
The way to describe it is that SlamData is the universal engine for computing analytics on any structure of data, and all it needs is the ability to talk to a data source to do what it does. You can create that connection using QScript, and adding a new database can be done in a matter of weeks.
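To give a feel for the connector idea, here is a hypothetical sketch in Python; the names are invented and this is not SlamData’s actual QScript API, just the shape of the contract a new backend would satisfy:

```python
# Hypothetical sketch of a connector contract; names are invented and this
# is not SlamData's actual QScript API. The engine plans a query once, and
# each backend translates that plan into its native query language.
from abc import ABC, abstractmethod

class Connector(ABC):
    @abstractmethod
    def compile(self, plan: dict):
        """Translate an engine-level query plan into a native query."""

    @abstractmethod
    def execute(self, native_query):
        """Run the native query inside the target data source."""

class MongoConnector(Connector):
    def compile(self, plan: dict) -> dict:
        # e.g. {"filter": {"status": "shipped"}} becomes a find() filter
        return plan.get("filter", {})

    def execute(self, native_query: dict):
        ...  # would call collection.find(native_query) on a live database
```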

Chris
Can you define QScript?

Get the Queries Right And…

Jeff
QScript is basically an intermediate layer that sits between the Quasar engine and the target data source, and it allows us to efficiently execute queries against that data source. It’s a standardized approach. Keep in mind what QScript is not: it is not a smart ETL layer. It’s not trying to extract data and map it into the Quasar engine; that’s essentially what existing tools try to do. It’s doing the opposite: it’s compiling and executing the queries against the target data source, in situ. That’s a very different way of solving this problem, one that hasn’t been done before.
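The contrast can be sketched in a few lines of Python with pymongo (hypothetical collection and field names, local MongoDB assumed); the difference is where the filtering work actually happens:

```python
# Extract-and-map versus in-situ execution, in miniature. Assumes a local
# MongoDB; the collection and field names are hypothetical.
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017")["shop"]["orders"]

# Extract-and-map (the "data engine" pattern): pull every document out of
# the database, then filter on the client side.
shipped = [doc for doc in orders.find() if doc.get("status") == "shipped"]

# In-situ execution (the QScript idea): compile the predicate into the
# database's own query language and let the database do the work.
shipped = list(orders.find({"status": "shipped"}))
```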

That’s one of the things: when people hear “intermediate layer,” they think, oh, it’s just some sort of mapping engine that’s sucking data out of whatever the target is and mapping it into your data engine. That’s how Tableau works. The reason I think Tableau has a huge challenge on their hands over the next few years is that everything they do is predicated on this notion of the data engine, and the data engine is basically a database that lives inside of Tableau on your desktop. All data goes there. All. So when they say, oh, we work on [insert NoSQL database here], what they really mean is they will allow you to extract some data from that NoSQL database and map it into their little data engine. They call it the data engine, but it’s just a database embedded inside their product, and the problem is that approach doesn’t work when you’ve got mountains of complex data.