4 Commonsense Strategies for Data Testing

Without thorough testing, databases often have hidden problems that eventually give insurers big headaches. In a previous blog, I laid out the reasons that data applications don’t get the same level of testing as traditional software. Here are some steps IT managers and data architects can take to remedy that situation.

 

1. Get smart testers

IT leadership should be wary of staffing data projects with "black-box" testers whose core strength is comparing the output of a batch process or report against a given spec or legacy report. While such mechanical analysis is certainly needed, we should also look for testers with more advanced skills, such as:

• White-box testing. This requires the ability to understand complex SQL code (SQL and its variants, such as PL/SQL and T-SQL, are notoriously hard to document) and to create test data and scenarios that go beyond the “happy” path. A minimal sketch of such a test follows this list.

• Coding against the spec. Testers should be able to independently write code against the specification and confirm that its output matches that of the code being tested. This puts them in much the same league as the developers, and IT managers should not underestimate this competency: finding and retaining people of this caliber makes organizing the testing group all the more complex.

• Data analysis. In BI projects, testers who can perform rudimentary data analysis, such as basic trending, will do a great deal to boost end users’ confidence in IT before the data ever reaches them.
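
To make the white-box point concrete, here is a minimal sketch in Python, using the standard sqlite3 module so it runs anywhere. The policy_premium table, its columns, and the expected total are hypothetical stand-ins for whatever the spec actually defines.

    import sqlite3

    # In-memory database stands in for the real test environment.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE policy_premium (policy_id TEXT, premium NUMERIC)")

    # Deliberately awkward test data: duplicates, zero, NULL and negative
    # values -- the rows a happy-path spec rarely mentions but production
    # always delivers.
    rows = [("P1", 100.00), ("P1", 100.00),   # duplicate rows
            ("P2", 0),                        # zero premium
            ("P3", None),                     # missing premium
            ("P4", -50.25)]                   # reversal / negative amount
    conn.executemany("INSERT INTO policy_premium VALUES (?, ?)", rows)

    # The query under test: does it double-count duplicates or choke on NULLs?
    total = conn.execute("SELECT SUM(premium) FROM policy_premium").fetchone()[0]
    assert total == 149.75, f"unexpected total: {total}"

A tester who can read the SQL under test will know which of these edge cases the code actually guards against, and which ones to probe.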

 

2. Invest in configuration management

Since configuration management is still not as cut and dried in data projects as it is in the application development arena, it merits special attention upfront. One shouldn’t be deterred by the general lack of built-in features like revision management in most modern database platforms, as there are some decent third-party tools emerging in this space.
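
In the meantime, a do-it-yourself approach can go a long way: keep every DDL change as a numbered script in ordinary source control, and let the database itself record which scripts have been applied. The sketch below (again Python with sqlite3) illustrates the idea; the schema_version table and the sample migrations are hypothetical.

    import sqlite3

    # In practice these would be numbered .sql files checked into source control.
    MIGRATIONS = [
        (1, "CREATE TABLE claims (claim_id TEXT PRIMARY KEY, amount NUMERIC)"),
        (2, "ALTER TABLE claims ADD COLUMN status TEXT"),
    ]

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE IF NOT EXISTS schema_version
                    (version INTEGER PRIMARY KEY, applied_at TEXT)""")

    # Apply only the scripts the database hasn't seen yet.
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_version")}
    for version, ddl in MIGRATIONS:
        if version not in applied:
            conn.execute(ddl)
            conn.execute("INSERT INTO schema_version VALUES (?, datetime('now'))",
                         (version,))
    conn.commit()

This is essentially what dedicated migration tools do under the hood, so adopting one later is a smooth upgrade path.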

 

3. Automate routine testing

The idea here is to recycle your test cases into automated “smoke tests” that can be incorporated into production jobs. This eliminates the need for manual testing of routine production processes, such as incoming/outgoing feeds and batch processes.

It’s a win-win for everyone, since smart testers typically don’t enjoy the mundanity of such routine checks anyway. Smoke tests can be written purely at the database level and don’t require any specialized tool to execute; all we really need is some sort of framework that orchestrates them. Consider the following scenarios:

• As an incoming file is loaded into staging, the ETL invokes automated tests that balance control totals against the loaded data. If the totals don’t balance, an email goes out to the IT admins reporting a problem in the feed.

• As a number-crunching process finishes, it kicks off an automated test that runs a number of sniff tests using ‘acceptable’ thresholds (as specified by the business), and reports if it finds any data failing to meet them.
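
Here is a sketch of both scenarios in Python with sqlite3. The staging table, the control totals, and the threshold are hypothetical; a real job would read the totals from the feed’s control record and the thresholds from business-maintained configuration.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stg_claims (claim_id TEXT, amount NUMERIC)")
    conn.executemany("INSERT INTO stg_claims VALUES (?, ?)",
                     [("C1", 1200.0), ("C2", 800.0)])

    def balance_check(expected_count, expected_total):
        """Scenario 1: balance loaded rows against the feed's control totals."""
        count, total = conn.execute(
            "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM stg_claims").fetchone()
        return count == expected_count and abs(total - expected_total) < 0.01

    def sniff_check(max_avg_claim):
        """Scenario 2: flag data outside a business-specified threshold."""
        (avg,) = conn.execute("SELECT AVG(amount) FROM stg_claims").fetchone()
        return avg is not None and avg <= max_avg_claim

    failures = [name for name, ok in [("balance", balance_check(2, 2000.0)),
                                      ("sniff", sniff_check(5000.0))] if not ok]
    if failures:
        print("ALERT:", failures)   # in production, e-mail the IT admins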

As these scenarios show, it doesn’t matter which ETL or process-automation tool we employ; any testing framework with the following broad capabilities will let us automate our testing. It should:

• enable developers/testers to define and activate/deactivate test cases and suites;

• enable a variety of test types: boundary cases, control totals, expected outcome, negative tests, exception scenarios, user-defined, etc.;

• withstand failures caused by incorrectly written or obsolete tests;

• communicate the results using some collaboration mechanism such as e-mail.
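
A minimal orchestration sketch covering these four features follows; the test names are invented for illustration, and the e-mail step is stubbed out with print().

    import traceback

    TESTS = {}  # test name -> (function, active flag)

    def test_case(name, active=True):
        """Register a test so it can be activated or deactivated by name."""
        def wrap(func):
            TESTS[name] = (func, active)
            return func
        return wrap

    @test_case("control_totals")
    def check_totals():
        return True  # placeholder: balance feed trailer vs. loaded rows

    @test_case("stale_check", active=False)   # deactivated, not deleted
    def stale_check():
        return True

    @test_case("broken_test")
    def broken_test():
        raise RuntimeError("written against a column that no longer exists")

    def run_suite():
        results = {}
        for name, (func, active) in TESTS.items():
            if not active:
                continue
            try:   # withstand failures caused by broken or obsolete tests
                results[name] = "PASS" if func() else "FAIL"
            except Exception:
                results[name] = "ERROR: " + traceback.format_exc(limit=1)
        print(results)   # stand-in for the e-mail notification

    run_suite()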

 

4. Focus on regression testing

As new features and improvements are introduced into the database, regression testing deserves serious attention. The best way to avoid last-minute surprises is to keep the regression-test suite up to date as development continues. The main decision we face is whether to kick off a full regression every time or to target it more narrowly.

The answer lies in our ability to track dependencies. We cannot fully count on the RDBMS platform’s built-in ability to report them; all too often, a broken dependency is first reported by end users, in the most unexpected of places, after new features have been promoted.

There are third-party solutions that track dependencies more effectively from outside the database. Where one can be employed, we can run a partial regression against the potentially affected objects; in its absence, our best bet is a full regression.
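
Where a dependency map is available, targeted regression is straightforward. The sketch below assumes some external tool has produced such a map; the object names, test names, and the map itself are hypothetical.

    from collections import deque

    DEPENDS_ON = {   # dependent object -> objects it references
        "rpt_loss_ratio": ["vw_claims", "vw_premium"],
        "vw_claims": ["tbl_claims"],
        "vw_premium": ["tbl_premium"],
    }
    TESTS_FOR = {"rpt_loss_ratio": ["t_loss_ratio"], "vw_claims": ["t_claims"]}

    def affected_by(changed):
        """All objects directly or transitively dependent on the changed ones."""
        hit, queue = set(changed), deque(changed)
        while queue:
            obj = queue.popleft()
            for dependent, refs in DEPENDS_ON.items():
                if obj in refs and dependent not in hit:
                    hit.add(dependent)
                    queue.append(dependent)
        return hit

    # Changing tbl_claims pulls in vw_claims and rpt_loss_ratio, so only
    # their tests run instead of the full suite.
    to_run = {t for obj in affected_by({"tbl_claims"})
              for t in TESTS_FOR.get(obj, [])}
    print(sorted(to_run))   # ['t_claims', 't_loss_ratio']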

This list of approaches is by no means exhaustive, but it does capture the big-ticket items. With luck, some of these strategies will become unnecessary as new developments make life simpler for database developers and testers.

Syed Haider is an architect with X by 2, a technology company in Farmington Hills, Mich., specializing in software and data architecture and transformation projects for the insurance industry.

Syed can be reached at shaider@xby2.com.

