Business Intelligence and the New Analytical Databases

Business intelligence software has evolved to fit an environment dominated by relational databases. This mostly meant overcoming the inherent inefficiencies of executing analytical queries against normalized data models, typically by reorganizing the data into star schemas, cubes, indexes or other specialized structures. Sometimes the restructured data itself resides in a relational database, and sometimes it doesn't.

Because so many enterprises have invested heavily in these BI applications, new database products must be compatible with them. But products built specifically for analytical use (ParAccel, Vertica, Netezza, Sybase IQ, etc.) often don't face the constraints that the BI applications were designed to overcome. If you attach one of those engines to standard BI software, you get better performance but are not taking full advantage of the technology.

So let's imagine a BI system designed with analytical databases in mind. Probably the most important feature of these databases is that they are relatively insensitive to data structure. (Caveat: Each product is different; some are more sensitive to structure than others.) In many cases, the system can efficiently execute queries against data in its original table structure, without the denormalization, indexing and aggregation used by conventional BI tools. This offers tremendous savings in time and effort, since much of the work in conventional BI systems is designing and populating these alternative structures. What's worse, at least from an end-user viewpoint, is that this restructuring must be done by technical specialists. Any project must go through a time-consuming cycle of specification, development, review and modification as users struggle to understand their needs and the technicians struggle to understand the users.
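To make the point concrete, here is a minimal sketch in Python, using the pandas library as a stand-in for an analytical engine. The tables and column names are hypothetical; the point is that the query runs directly against the normalized tables, with no star schema, cube or pre-built aggregate in between.

    import pandas as pd

    # Two normalized tables, much as they might arrive from a source system.
    customers = pd.DataFrame({"cust_id": [1, 2],
                              "region": ["East", "West"]})
    orders = pd.DataFrame({"order_id": [10, 11, 12],
                           "cust_id": [1, 1, 2],
                           "amount": [250.0, 100.0, 75.0]})

    # Join and aggregate on the fly, against the original structure.
    sales_by_region = (orders.merge(customers, on="cust_id")
                             .groupby("region")["amount"].sum())
    print(sales_by_region)  # East 350.0, West 75.0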

But just because the analytical systems can query complex data structures doesn't mean that end users can understand those structures. Only a sadist would copy hundreds of database tables from an enterprise system and ask an end user to query them directly. To take advantage of the new databases, a BI system needs a metadata layer that presents the underlying tables in a way that makes sense to nonspecialists. Indeed, one of the best reasons to retain existing BI software is that at least some of those products already contain extensive facilities for decoupling the underlying physical data structure from the structures presented to users. To the extent that existing BI systems are not dependent on particular physical structures, they can provide excellent models for features that make end users more productive.

Adding Databases

One area where existing BI systems won't provide much help is letting end users connect to new data sources. In current systems, this is a task for technicians, so the interfaces are designed with technical users in mind. But a major advantage of the new databases is how easily they can integrate new data with existing tables. A BI system designed to take advantage of this would let end users do it for themselves. That implies simple features for physically loading the data, exploring it, defining relationships between tables and setting rules for how key values are matched.
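As a rough illustration, here is what the loading and first-look steps might reduce to in Python with pandas; the file name and its fields are hypothetical.

    import pandas as pd

    # Load a new source directly, with no prior modeling step.
    new_data = pd.read_csv("campaign_responses.csv")  # hypothetical file

    # First look: how big is it, and what types were inferred?
    print(new_data.shape)   # row and column counts
    print(new_data.dtypes)  # inferred type of each field
    print(new_data.head())  # preview a few records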

Data exploration is a particularly important opportunity, because the analytical databases let end users examine the new inputs immediately. This means users can quickly judge the data in each field, uncovering good and bad surprises that might otherwise remain hidden until weeks of technical effort had been invested. A BI interface has to support this by providing appropriate data profiling and analysis tools. It must also support the next step, relationship definition, by making it simple to define trial matches between different tables and see how closely they correspond.
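A sketch of what the profiling and trial-match steps could look like, again in Python with pandas and made-up data:

    import pandas as pd

    # Hypothetical newly loaded file, plus an existing table to test it against.
    new_data = pd.DataFrame({
        "cust_id": ["1001", "1002", None, "9999"],
        "signup": ["2009-01-05", "01/07/2009", "2009-02-11", None],
    })
    customers = pd.DataFrame({"cust_id": ["1001", "1002", "1003"]})

    # Profile every field: share of missing values and number of distinct values.
    profile = pd.DataFrame({"pct_null": new_data.isna().mean(),
                            "n_distinct": new_data.nunique()})
    print(profile)

    # Trial match: what share of the new keys correspond to existing customers?
    match_rate = new_data["cust_id"].isin(customers["cust_id"]).mean()
    print(f"{match_rate:.0%} of new rows match an existing customer")  # 50%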

Value matching is particularly important as systems incorporate increasingly heterogeneous data sources. Exact matches between two columns are easy, but something as simple as a customer number stored as a numeric field in one system and a text field in another can cause trouble with conventional databases. Alternative date formats are another trivial difference that can still be difficult to overcome. Fuzzy matches, such as variant forms of names and addresses, are much more challenging but very important as users incorporate a wider range of inputs. In the conventional BI world, removing these discrepancies was another task handled by the technical experts as they set up the database. If the technical experts themselves are removed from the process, this matching must be handled automatically by the database or the BI system, or end users must have tools to manage it for themselves.
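Each of those cases reduces to a few lines of logic, which is exactly what a BI system could automate or put in front of end users. A minimal Python sketch, with hypothetical values:

    import difflib
    import pandas as pd

    # The same key stored as a number in one system and as text in another.
    a = pd.DataFrame({"cust_id": [1001, 1002]})       # numeric field
    b = pd.DataFrame({"cust_id": ["1001 ", "1002"]})  # text field, stray space

    # Normalize both sides to clean strings before joining.
    a["key"] = a["cust_id"].astype(str).str.strip()
    b["key"] = b["cust_id"].astype(str).str.strip()
    print(a.merge(b, on="key"))   # both rows now match

    # Alternative date formats: parse each layout explicitly.
    us_style = pd.to_datetime("03/14/2009", format="%m/%d/%Y")
    iso_style = pd.to_datetime("2009-03-14", format="%Y-%m-%d")
    print(us_style == iso_style)  # True once both are real dates

    # Fuzzy matching: score similarity instead of testing equality.
    score = difflib.SequenceMatcher(None, "J. Smith", "John Smith").ratio()
    print(score > 0.6)            # accept matches above a chosen threshold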

Let's add one more requirement. Although some analytical databases allow only a single logical data structure for each physical set of tables, most resemble SQL in letting users define table relationships within each query. Databases that allow this flexibility need interfaces that make it available to end users, either when they are formulating queries or when they are setting up the presentation layer.
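For illustration, the sketch below joins the same two hypothetical tables in two different ways, the kind of per-query relationship such an interface would need to expose:

    import pandas as pd

    # One person table plays two roles; the role is chosen at query time.
    orders = pd.DataFrame({"order_id": [1, 2],
                           "cust_id": [10, 11],
                           "rep_id": [7, 8]})
    people = pd.DataFrame({"person_id": [10, 11, 7, 8],
                           "name": ["Ann", "Bo", "Cal", "Dee"]})

    # Query 1: relate orders to people as customers.
    by_customer = orders.merge(people, left_on="cust_id", right_on="person_id")

    # Query 2: the same tables, related through the sales rep instead.
    by_rep = orders.merge(people, left_on="rep_id", right_on="person_id")
    print(by_customer["name"].tolist())  # ['Ann', 'Bo']
    print(by_rep["name"].tolist())       # ['Cal', 'Dee']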

All of these features support the fundamental goal of letting end users ask unanticipated questions without needing a technician to build a new cube, reorganize the data tables or expose details that were lost in a previous aggregation. Today's analytical databases make this possible, but new BI systems are needed for them to deliver.

It wasn't very hard to come up with this list of features for next-generation BI systems, because such systems already exist. The pioneers are vendors such as QlikTech, illuminate Solutions, ADVIZOR Solutions, Tableau Software, TIBCO Spotfire, Lyzasoft, smartFOCUS and Alterian, which did not originally constrain themselves to be compatible with SQL. Instead, they developed interfaces that let them take advantage of their proprietary database engines in ways that SQL wouldn't allow. Some of these vendors provide APIs that let external systems access their data, and some of those APIs support SQL queries. But a SQL front end cannot take full advantage of these engines' features.

The main advantage these systems have over conventional BI software is the power they provide to business users. Adding a new source or building a new report doesn't require a technician to modify the data structure or generate a new cube. Most of these products also provide at least some data preparation capability through a scripting language or process flow. This won't replace the integration needed to build an enterprise data warehouse, but it lets users outside of IT create applications that go well beyond simple reporting. Some technical skill is needed, but it's on par with an Excel power user, not a database architect.

Of the systems listed, only one can really be considered a major force in the industry. The others have much smaller installed bases, even though several have been around for more than a decade. This raises an obvious question: if these systems are so great, why haven't they been more widely adopted?

One constraint is scalability. Many of these products originally ran on 32-bit architectures, which limited the in-memory systems to about 2GB of data and the disk-based systems to roughly 2 billion rows. Load speeds were also originally measured in tens of gigabytes per hour. Those specs are more than adequate for many purposes and are probably larger than the data cubes underlying many conventional BI systems. Many of those constraints have since been removed by 64-bit architectures and parallel processing, but handling the hundreds of terabytes in even a small enterprise data warehouse is still an issue, so these systems must co-exist with a conventional database rather than replace it.
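For reference, the 2GB and 2-billion-row ceilings follow directly from 32-bit arithmetic; a quick Python check:

    # Rough arithmetic behind the 32-bit limits cited above (not a product spec).
    address_space = 2 ** 31           # user-addressable bytes per process
    max_row_count = 2 ** 31 - 1       # largest signed 32-bit row counter
    print(address_space / 1024 ** 3)  # 2.0 -- about 2GB of in-memory data
    print(f"{max_row_count:,}")       # 2,147,483,647 -- about 2 billion rows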

The more important reason is probably IT department resistance. IT departments are generally conservative when it comes to unfamiliar vendors. Beyond that, these vendors' decision not to match SQL standards makes their databases seem dangerously isolated from the rest of the corporate infrastructure. Any data loaded into the systems can be accessed only through the vendors' own tools or APIs, which raises a red flag for IT departments wary of reliance on a single product. In addition, the unique technology of these tools means that special training is needed to use them, an investment that IT departments seek to avoid. The small installed bases of these products also mean that IT can't plan to hire experienced staff whom someone else has already trained.

Benefits of Change

Of course, IT departments do sometimes embrace new technologies, particularly when those technologies appear to solve a pressing problem of their own. But remember that the main advantage of these tools is that they empower business users to do more for themselves. In theory, this should appeal to IT departments, because it would reduce their workload or let them shift resources to other tasks. In practice, if an IT department has already invested in conventional databases and BI applications, these systems offer few new capabilities. Asked to examine them, IT people often honestly shrug, "I can already do that." Giving similar capabilities to other people doesn't strike them as particularly important, and no one likes to make his or her existing skills obsolete.

As a result, adoption of these new tools has largely been restricted to situations where IT isn't involved or IT itself lacks an incumbent alternative. For the lower-cost systems, this often means they are purchased by business departments for their own use, outside the IT budget and with minimal IT support. Another sweet spot is small to midsized companies with few IT resources and little existing BI infrastructure. On the relatively rare occasions when a large IT shop purchases one of these tools directly, it is usually because the technology solves a particular problem that has proven impervious to conventional methods.

The SQL-compatible analytical databases avoid most of these obstacles because they mesh more easily with existing infrastructure. But, precisely because they continue to rely on conventional, IT-centric BI applications, these systems offer fewer benefits to business users than the more radical alternatives. The vendors selling these systems and the businesspeople eager to use them share the challenge of convincing IT departments and senior management to approve them.

This article has been reprinted from Information Management.

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. Raab is also the author of the new book The Marketing Performance Measurement Toolkit (Racom Communications). He can be reached at draab@raabassociates.com.
