Everyone is talking about ‘big data’, a critical element in the shift towards digitalisation. But what exactly is it, what does it mean for business, and how can software help organisations overcome big data challenges so that they are not overwhelmed by its sheer scale and complexity?

By Modeen Malick, senior systems engineer: MESAT at Commvault

Big data for business

As a business embarks on, or evolves, its digital transformation efforts, more and more technologies come into play. The Internet of Things (IoT) with its sensors and smart devices, cloud-based applications, robotics, automation and other emerging technologies are all creating massive amounts of data over and above a business’s traditional data.

As data grows and is analysed, it delivers invaluable business insights at a granular level. These findings help the business to define its next operational or sales models, transform processes, identify target markets and develop products and solutions which cater specifically to a market need. In other words, big data helps businesses to quantify their digital efforts.

How is it different from ‘regular’ data?

Big data differs from traditional data in that it is based on a distributed cluster, which brings supercomputing levels of compute power to the data and is resilient to local failures. It uncovers the ever-changing relationships between data from multiple individual data sources. However, because that resilience is typically achieved by replicating data across nodes, big data is not very storage efficient.

Big data sources include operational data coming from transactional systems, streaming data and sensors; ‘dark data’ that businesses already own but aren’t actively using; commercial data; social media data; and public data which is available from government and independent open data initiatives on a multitude of topics which may be relevant to the business.

When big data projects go into production, they become similar to conventional applications in that they need service levels. This is where software comes into play.

Big data challenges

The most obvious challenge businesses face with their big data projects is the sheer volume of data. Traditional data protection solutions that struggle to scan and process regular data fast enough will struggle even more to manage big data quantities at a rate fast enough to be effective. The scale, complexity and highly varied nature of big data make it tricky to manage.

Big data is also relatively new and somewhat immature. As a result, many native data tools require manual customisation and scripting, which can be a counterproductively long and tedious process.

While big data comes with standard resilience built in, it cannot protect itself against logical and user errors. Hardware resilience does not automatically allow data to revert to an earlier point in time at which it was logically consistent, so the potential for failure remains.
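
To illustrate the distinction, here is a minimal Python sketch (a hypothetical toy store, not any vendor’s implementation) of why point-in-time snapshots, rather than replication alone, are what make it possible to revert to a logically consistent earlier state:

```python
import copy
import time

class SnapshotStore:
    """Toy store: replication faithfully copies every write, including
    bad ones, to all replicas; only a snapshot taken before a logical
    error lets us revert to a consistent earlier state."""

    def __init__(self):
        self.data = {}       # live data set (assume it is fully replicated)
        self.snapshots = []  # (timestamp, frozen copy) pairs

    def write(self, key, value):
        self.data[key] = value  # a mistaken write replicates everywhere

    def snapshot(self):
        # Freeze a logically consistent point in time.
        self.snapshots.append((time.time(), copy.deepcopy(self.data)))

    def restore_before(self, cutoff):
        """Revert to the most recent snapshot taken at or before 'cutoff'."""
        for ts, frozen in reversed(self.snapshots):
            if ts <= cutoff:
                self.data = copy.deepcopy(frozen)
                return ts
        raise LookupError("no snapshot before that time")
```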

Finally, it can be complicated to manage big data in terms of capacity. Big data arrives in large surges, and businesses need to be prepared with enough capacity to absorb them.

Simplifying big data

Software can simplify big data management for businesses and eradicate most of these challenges. It can apply policies to newly created data globally, across a business’s entire infrastructure, detecting any changes to the environment and making data management much simpler.

Key software functions include indexing, collection and recovery, which can integrate directly with an application via an application programming interface (API). This simplifies the process, as the software engages the big data name node to identify and understand each new data set.
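
As a concrete illustration, Hadoop’s HDFS name node exposes a REST interface called WebHDFS. A minimal sketch, assuming a Hadoop 3.x cluster, of how a management tool might query the name node to discover newly modified data sets (the host name and path below are hypothetical):

```python
import requests

# Hypothetical NameNode address; WebHDFS listens on port 9870 in Hadoop 3.x.
NAMENODE = "http://namenode.example.com:9870"

def list_new_datasets(path, since_ms):
    """Ask the HDFS NameNode (via its WebHDFS REST API) for entries under
    'path' modified after 'since_ms' (epoch milliseconds), so a data
    management tool can discover and index new data sets."""
    resp = requests.get(f"{NAMENODE}/webhdfs/v1{path}",
                        params={"op": "LISTSTATUS"})
    resp.raise_for_status()
    statuses = resp.json()["FileStatuses"]["FileStatus"]
    return [s["pathSuffix"] for s in statuses
            if s["modificationTime"] > since_ms]

# Example: discover data sets modified since a given timestamp.
# new = list_new_datasets("/data/ingest", since_ms=1700000000000)
```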

Big data management software also helps to manage performance. Backups take full advantage of the distributed architecture through load balancing: once multiple media agents have been deployed, the software will intelligently use the fastest nodes and switch over to another if one fails.
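
A minimal sketch of that ‘use the fastest node, fail over on error’ logic, assuming a monitoring feed of per-agent throughput (the agent names and the send() stub are hypothetical):

```python
# Hypothetical media agents with measured throughput in MB/s; in practice
# these figures would come from live performance monitoring.
AGENTS = {"ma-01": 850, "ma-02": 1200, "ma-03": 640}

def send(chunk, agent):
    """Stub transfer; a real implementation would stream the chunk to the agent."""
    print(f"sent {chunk} via {agent}")

def backup(chunks):
    healthy = set(AGENTS)
    for chunk in chunks:
        while healthy:
            # Prefer the fastest media agent that is still healthy.
            agent = max(healthy, key=AGENTS.get)
            try:
                send(chunk, agent)
                break
            except ConnectionError:
                healthy.discard(agent)  # fail over to the next-fastest agent
        else:
            raise RuntimeError("all media agents are down")

backup(["block-001", "block-002", "block-003"])
```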

Intelligent big data

It’s important to choose intelligent software which understands data structure and reacts accordingly, enabling fast data restoration when necessary. A fast and intelligent way to perform restores with very large data sets is to provide native access to the data. This delivers immediate availability, avoiding the need to copy data back before it can be used. It also means limited workloads can be run immediately.

Software which integrates seamlessly with the cloud allows for the provisioning of both data and compute. This enables cloud-based data recovery, as well as cloud-based development and testing use cases, with pre-built workflows.

Big data performance and protection

Bottlenecks are common with big data and need to be avoided to ensure good data performance. Built-in software-defined storage (SDS) ensures data can be received and deduplicated at massive scale, eliminating bottlenecks. The source can be scaled by adding multiple media agents to increase parallel streams, which ensures parallel I/O all the way through.
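
To make the deduplication idea concrete, here is a minimal sketch of hash-based block deduplication, using fixed-size chunks for brevity (production systems often use variable, content-defined chunking; this illustrates the generic technique, not any particular SDS product):

```python
import hashlib
import io

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB fixed-size blocks

def dedupe(stream, store):
    """Split a backup stream into blocks and keep each unique block once,
    keyed by its SHA-256 digest; repeated blocks cost only a reference."""
    refs = []
    while True:
        block = stream.read(CHUNK_SIZE)
        if not block:
            break
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block  # first occurrence: store the bytes
        refs.append(digest)        # every occurrence: store a reference
    return refs

# Example: two identical 4 MiB blocks are stored only once.
store = {}
refs = dedupe(io.BytesIO(b"x" * CHUNK_SIZE * 2), store)
print(len(refs), "references,", len(store), "stored block(s)")
```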

Businesses should opt for a big data management software solution which leverages distributed architecture for resilience, continuing to operate in the event of a failure at either the drive or node level.

Big data control

Effective big data software is capable of controlling the entire data lifecycle and leverages policy-driven services, such as migrating and tiering data, to better manage production environments. This is vital with big data as growth can be both rapid and unpredictable.
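
A minimal sketch of the kind of age-based tiering rule such a policy engine might evaluate (the tier names and thresholds are illustrative assumptions, not recommendations):

```python
from datetime import datetime, timedelta

# Hypothetical tiers, ordered hot to cold; thresholds are illustrative.
TIERS = [
    (timedelta(days=30), "performance"),  # recently used: fast, costly storage
    (timedelta(days=365), "capacity"),    # warm: cheaper disk
]
ARCHIVE = "archive"                       # cold: object storage or tape

def tier_for(last_accessed, now=None):
    """Return the target tier for a data set based on time since last access."""
    age = (now or datetime.now()) - last_accessed
    for threshold, tier in TIERS:
        if age <= threshold:
            return tier
    return ARCHIVE

print(tier_for(datetime.now() - timedelta(days=90)))  # -> capacity
```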

Uncontrolled growth can also be very costly in big data environments as, at some point, facility resources such as power and cooling come under pressure, and those constraints can be difficult and costly to resolve.

Overarching software

Big data is one of the most important elements of digital business and will be a major contributor to the future competitiveness of many companies. As such, big data deserves a software solution that helps overcome common challenges, adds intelligence and makes it far easier to protect and control.