
Meaningless data - your one-way ticket to disaster!

by Vijayanathan Naganathan


In this blog we take a look at meaningful data and why it matters so much to enterprises today.


From executive dashboards to the simplest of web applications, nothing moves without data. In a world where more and more decisions are taken based on data and information, it is absolutely important to acknowledge the value of meaningful data. Without data, the world would come to a standstill. From sales teams to quality to training, everyone needs not just data, but the right, meaningful data. Today's business requirements and application architectures, along with advances in technology, exponentially increase the complexity of applications. Data is one of the key factors giving enterprises nightmares, leaving behind the impressions below:

[A] Application quality is perceived as poor because of the poor quality of the data being used.

[B] Project deadlines are missed in many instances because data is not readily available, and the cost of testing rises because large-scale data provisioning / synthetic data generation needs a separate exercise.

[C] Enterprises need to cater for diverse data across development, integration testing, performance testing, acceptance testing, end-user demos and end-user training, each serving a different objective.


According to the World Quality Report 2013-2014, "About 40% of testing budgets are allocated to environments that include hardware and tools. Organizations are finding it difficult to provide test data that is representative, consistent and sufficiently comprehensive for today's complex and multi-vendor application landscape. The lack of the appropriate test platform and test data can quickly erase efficiency gains realized elsewhere, such as from investing in structured testing processes or automation tools. Analysis of the test data generation methods reveals that organizations prefer to create new test data, including data 'created on the go', as opposed to reusing existing/production datasets. A majority of organizations find it difficult to synchronize the right sets of test data with versions of applications under test, especially in test scenarios requiring frequent repetitions and multiple test environments."

The above clearly sends out the message that "Significant investment in Test Environments and Test Data are undermined by a lack of specialist expertise." Traditional data offerings hover around three strategies:


1. Production Data Extraction, Subsetting & Anonymization,

2. Data Population via Stored Procedures & Scripts and

3. Data Population via GUI.


By and large, we have seen the former two strategies at play. The production extraction, subsetting and anonymization exercise leads to issues around access to sensitive production data.


The key challenge in leveraging existing production data is data privacy. Getting access to production data, if not properly assessed, is asking for trouble with regulators and legal entities. Enterprises often avoid this route because the risk of non-compliance, and the resulting damage to brand and business, is too large. Synthetic data creation is by and large the easier route for enterprises, as it carries no compliance risk and customized data can be generated quickly. Even so, data intelligence remains a sorely missed point, whether due to a lack of enterprise-level vision, a lack of experience or a lack of lateral thinking. All said and done, neither strategy is foolproof.


The need of the hour for enterprises is to bring comprehensive intelligence into building high-quality data: instil higher confidence in the business through zero-risk compliance with regulations, deliver efficiencies in data provisioning, and drive large savings through reduced cycle times while delivering high cost savings for the customer. From our interactions with various clients, we see a strong need to revitalize data provisioning engagements with business intelligence: the ability to elicit detailed data requirements from business requirements, and toolkits to provision data through either the synthetic data route or the production extraction route.


Today's enterprises' top data cravings are: high-quality data, data confidentiality, data privacy, data integrity, data intelligence, realistic data distribution and data anonymization. Let us take a quick look at how to address these compelling data cravings. We are sharing some of our thoughts on addressing these market needs.


Data collection, analysis & intelligence - Eliciting detailed data requirements from business requirements, discussions with business teams on business scenarios, and resources with domain knowledge all help in data collection. Project teams staffed with domain knowledge, coupled with an ability to think through what-if scenarios, bring intelligence to the data collection exercise. This in turn helps ensure high quality of the data in use.

Data engineering - Once data requirements are collected and an appropriate data management strategy is in place, the focus moves to data engineering: the exercise of using existing data, or generating new data, to understand the overall business process. The data engineering exercise helps improve both the understanding of the business process and the overall data quality.
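As a concrete illustration of the data-generation side of this exercise, here is a minimal Python sketch (the field names, value ranges and name lists are hypothetical, not drawn from any particular engagement): a seeded generator produces reproducible customer records, so the same data set can be recreated identically in every test environment.

```python
import random
import string

def make_customer(rng):
    """Build one synthetic customer record with realistic-looking fields."""
    first = rng.choice(["Asha", "Ravi", "Meena", "Kiran", "Priya"])
    last = rng.choice(["Iyer", "Sharma", "Nair", "Gupta"])
    account = "".join(rng.choice(string.digits) for _ in range(10))
    balance = round(rng.uniform(0, 100000), 2)
    return {"name": first + " " + last, "account": account, "balance": balance}

def generate_customers(n, seed=42):
    # A fixed seed makes the data set reproducible across environments and runs.
    rng = random.Random(seed)
    return [make_customer(rng) for _ in range(n)]

customers = generate_customers(100)
```

Because the generator is seeded, development, QA and training environments can each rebuild the exact same 100 records on demand instead of copying data between them.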

Data mining - Wherever production databases of the application, or of a system of a similar nature, already exist, data mining is the exercise of extracting information from a data set and transforming it into an understandable structure for further use. Data mining exercises are very helpful for interpreting and identifying business rules, identifying patterns in data groups, identifying anomalies, and classifying data. This exercise helps in achieving complete data quality.
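The anomaly-identification step, for instance, can be sketched with a simple standard-deviation rule (a deliberately minimal illustration, not a full mining toolkit; the three-sigma threshold and the sample amounts are assumptions):

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) > threshold * sigma]

# Twenty routine transaction amounts and one outlier.
amounts = [100.0] * 20 + [10000.0]
print(find_anomalies(amounts))  # [10000.0]
```

Records flagged this way are candidates for review: they may reveal an undocumented business rule, or simply bad data that belongs in the cleansing exercise below.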

Data cleansing - At times databases contain inconsistent data which, if used, can lead to inaccurate business operations. Setting this right requires a cleansing exercise: identifying the inconsistent data and replacing, modifying or removing it. Cleansed data brings consistency back to the databases and supports more meaningful business validations.
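To make the modify/remove steps concrete, here is a minimal cleansing sketch in Python (the field names, the mandatory-field rule and the country synonym map are illustrative assumptions):

```python
def cleanse(records):
    """Drop records missing a mandatory field and normalise inconsistent country values."""
    country_map = {"usa": "US", "u.s.a.": "US", "united states": "US", "us": "US"}
    cleaned = []
    for rec in records:
        if not rec.get("customer_id"):   # remove: unusable without its key field
            continue
        raw = rec.get("country", "").strip().lower()
        fixed = dict(rec)
        fixed["country"] = country_map.get(raw, raw.upper())  # modify: one canonical spelling
        cleaned.append(fixed)
    return cleaned

dirty = [
    {"customer_id": "C1", "country": "  USA "},
    {"customer_id": "", "country": "IN"},            # missing mandatory key
    {"customer_id": "C2", "country": "united states"},
]
print(cleanse(dirty))
```

After cleansing, every surviving record has a key and a single canonical country spelling, so downstream validations compare like with like.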

Confidentiality, privacy and anonymization - Many enterprises have been plagued with lawsuits through ignorance of the confidentiality and privacy of customer data. Confidentiality limits unauthorized access to, or disclosure of, data by unintended requestors, by leveraging authentication mechanisms, access control systems and the like. Data privacy guards against disclosure of personally identifiable information or other sensitive information (such as bank account numbers, social security numbers, health records, and residential/geographical records), as there are very strict regulations and compliance acts in place that must be adhered to. Wherever personally identifiable customer data needs to be accessed by enterprise support personnel, it needs to be anonymized across all instances, right from production to development to QA to training.
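A toy illustration of the anonymization step (the masking style, salt value and token format are assumptions; a real engagement would use a vetted masking tool): account numbers are masked except for the last four digits, and names are replaced with deterministic pseudonyms so the same customer maps to the same token in every environment.

```python
import hashlib

def mask_account(account):
    """Mask all but the last four digits, preserving field length for UI testing."""
    return "*" * (len(account) - 4) + account[-4:]

def pseudonymise(name, salt="demo-salt"):
    # Deterministic: the same name always yields the same token, so referential
    # integrity survives across the production -> dev -> QA -> training copies.
    digest = hashlib.sha256((salt + name).encode("utf-8")).hexdigest()
    return "cust_" + digest[:8]

print(mask_account("1234567890"))   # ******7890
print(pseudonymise("Asha Iyer"))
```

The salt should be kept secret and rotated per engagement; without it, a hashed pseudonym of a known name could be recomputed and reversed.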

Data governance - Managing data has multiple aspects: processes, people, roles & responsibilities, tools, technologies, policies and quality, all of which need to converge so that the enterprise can not only exercise control over its data but also have adequate confidence that things are in place. This puts a lot of accountability into the usage of data within the enterprise, in turn making the enterprise more efficient.

This is what we believe needs to be put in place to address customer needs around data. Is this exhaustive? Is there something else you have seen in your experience that is not listed above? If yes, we would like to hear your perspective.
