Master Data Management is the South African partner for Datameer, which has just announced new data governance capabilities for its native Hadoop environment. As a pioneer in big data analytics, Datameer is helping solidify Hadoop as a mature and transparent platform for production-ready, mission-critical and regulatory-compliant analytics uses.
“Our customers need solutions that can be easily deployed, to deliver insight at business speed, without exposure to governance risks such as inaccurate reporting or data breaches. We partnered with Datameer due to their big data insights platform going beyond simple business intelligence (BI) by providing comprehensive management tools to make Hadoop enterprise ready. This announcement enhances that position,” says Gary Allemann, MD of Master Data Management.
While big data analytics is enabling significant new uses, it’s also becoming increasingly complex. Analysts and administrators alike need an easier way to navigate and manage data pipelines that have been developed by multiple departments and participants, and involve multiple data sources.
Some users may incorporate sensitive data sets, such as personal information, sensitive transactional data, or payments data (PCI). Big data is increasingly being used to support the complex integration demands of legislation such as BCBS 239 and Dodd-Frank. Data consistency and audit trails from source to target, data security and privacy, and data retention and archiving are some of the many “must have” capabilities that Datameer provides natively in Hadoop.
Users no longer have to choose between a robust, governable data regime often associated with traditional data warehouse implementations and the ease-of-use of a self-service Hadoop platform. Now, by adding Datameer’s premium module, businesses have complete transparency into their data pipelines, and can provide IT with the appropriate tools to audit diligently for compliance with internal and external regulations.
“The world of big data, which includes Hadoop, needs to take data governance more seriously in order to become ready for enterprise-grade deployments,” says John L. Myers, managing research director of BI at Enterprise Management Associates. “As more technologies join next-generation data management environments, open architectures such as Datameer’s are going to be critical in meeting both internal and external data governance requirements to make those solutions enterprise ready.”
“Hadoop has been seen as the Wild West in which vendors have been developing different products for the ecosystem without really thinking about data governance and sophisticated security protocols,” says Stefan Groschupf, CEO of Datameer. “With these new features, we’re driving home the point that we’re serious about helping enterprises transform their business into data-driven organisations.”
Quality and consistency
Data quality and consistency are imperative when it comes to ultimately extracting value from big data. If at any point in the data pipeline there is a question about data validity, the overall value of the resulting insights is in question. Datameer’s data profiling tools enable you to check and remediate issues like dirty, inconsistent or invalid data at any stage in a complex analytics pipeline, and provide transparency into every change, from the original dataset all the way through to the final visualisation. Datameer’s capabilities include data profiling, data statistics monitoring, metadata management and impact analysis.
Data policies and standards
Data access policies are the first line of defence against risk for businesses. For IT, the goal is to implement policies that allow them to manage risk appropriately, while still meeting business needs. Specifically, Datameer supports secure data views and multi-stage analytics pipelines.
Data security and privacy
True big data security needs to go beyond the built-in capabilities of the Hadoop Distributed File System. Datameer provides LDAP / Active Directory integration, role-based access control, permissions and sharing, integration with Apache Sentry 1.4, and column and row security/anonymisation functions.
Regulatory compliance
Across several industries, there are legal imperatives for big data governance, such as Sarbanes-Oxley, Basel, HIPAA and PCI compliance. Data lineage features include artifact- and file-level dependency graphs, a dependencies REST API and worksheet lineage. Auditing functionality includes a user action log and allows external systems to be notified of user and system audit events as they happen.
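To illustrate how an artifact-level dependency graph supports lineage and impact analysis, here is a minimal Python sketch. It is not Datameer's API: the function name, the dictionary layout and the artifact names are all hypothetical, chosen only to show how the downstream consumers of a changed data set can be found by walking such a graph.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_artifacts(dependencies: Dict[str, List[str]], changed: str) -> Set[str]:
    """Return every artifact that directly or indirectly reads from `changed`.

    `dependencies` maps each artifact to the artifacts it reads from
    (a hypothetical layout, not a Datameer format).
    """
    # Invert the graph: for each source, list the artifacts that consume it.
    consumers: Dict[str, List[str]] = {}
    for artifact, sources in dependencies.items():
        for source in sources:
            consumers.setdefault(source, []).append(artifact)

    # Breadth-first walk from the changed artifact through its consumers.
    affected: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, []):
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


# Hypothetical artifacts: two imports feeding workbooks and a dashboard.
example_dependencies = {
    "sales_workbook": ["crm_import"],
    "dashboard": ["sales_workbook"],
    "hr_workbook": ["hr_import"],
}
impacted = downstream_artifacts(example_dependencies, "crm_import")
```

In this example a change to the hypothetical crm_import data set affects sales_workbook and dashboard, but not hr_workbook.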
Retention and archiving
In Datameer, flexible retention rules allow each imported data set’s retention policy to be configured by an individual set of rules. It is easy to configure Datameer to keep data permanently, or to purge records that are older than a specific time window. Independent of time, retention rules can also be configured based on the number of runs of ELT ingests or analytics workbook executions. Security rules allow retired data to be either instantly removed, retained until a specified time, or manually removed after system administrator approval.
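The two kinds of retention rule described above, one based on a time window and one based on the number of runs, can be sketched in Python as follows. This is an illustration under stated assumptions, not Datameer's implementation: the ImportRun and RetentionRule classes and the runs_to_retire function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ImportRun:
    """One completed ingest or workbook execution (hypothetical model)."""
    run_id: int
    finished_at: datetime


@dataclass
class RetentionRule:
    """Hypothetical rule: leave both limits as None to keep data permanently."""
    max_age: Optional[timedelta] = None  # purge runs older than this window
    max_runs: Optional[int] = None       # keep only this many newest runs


def runs_to_retire(runs: List[ImportRun], rule: RetentionRule, now: datetime) -> List[ImportRun]:
    """Return the runs that fall outside the rule, oldest first."""
    newest_first = sorted(runs, key=lambda r: r.finished_at, reverse=True)
    retired = []
    for position, run in enumerate(newest_first):
        too_old = rule.max_age is not None and now - run.finished_at > rule.max_age
        too_many = rule.max_runs is not None and position >= rule.max_runs
        if too_old or too_many:
            retired.append(run)
    return sorted(retired, key=lambda r: r.finished_at)


# Five daily runs; run_id i finished i days before `now`.
now = datetime(2024, 1, 10)
runs = [ImportRun(run_id=i, finished_at=now - timedelta(days=i)) for i in range(1, 6)]

by_count = runs_to_retire(runs, RetentionRule(max_runs=2), now)
by_age = runs_to_retire(runs, RetentionRule(max_age=timedelta(days=3, hours=12)), now)
keep_all = runs_to_retire(runs, RetentionRule(), now)
```

The sketch only selects which runs are retired. Whether retired data is removed instantly, held until a specified time, or removed after administrator approval would be a separate step.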