The term Big Data refers to data sets so large and complex that they are difficult to work with using conventional tools. Although there is no exact definition, big data usually involves petabytes or exabytes of data processed by large companies and organizations. Such data require large server capacities, which is why they are often associated with cloud solutions.
It is reported that 90% of the world's data was created in the last two years. What's more, it is predicted that this year alone will generate around 1,200 exabytes of new data. However, it is not only volume that counts – velocity and variety are said to be equally important big data measures. These three Vs were introduced by Doug Laney in his 2001 report. Variety refers to the type of data, such as video, audio, text, or blog posts; this matters because each of these kinds of data is managed differently. Velocity represents the speed at which new data is created and modified: the data have to be analyzed as soon as they appear or change.
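Velocity in particular implies handling records one at a time as they arrive, rather than waiting for a complete batch. A minimal sketch of that idea in Python (the event stream and its fields are hypothetical, not tied to any specific framework):

```python
from collections import Counter

def process_stream(events):
    """Consume events one at a time, updating running counts
    instead of waiting for a complete batch (the velocity idea)."""
    counts = Counter()
    for event in events:  # each record is handled as soon as it appears
        counts[event["type"]] += 1
    return counts

# Simulated incoming events of varying kinds (the variety dimension)
stream = [{"type": "video"}, {"type": "text"}, {"type": "video"}]
print(process_stream(stream))  # Counter({'video': 2, 'text': 1})
```

A real pipeline would read from a network socket or message queue instead of a list, but the shape is the same: state is updated incrementally, so results are available the moment data changes.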
Big data and multimedia
Big data is commonly associated with the explosion of social networks and the rise of multimedia. Considering that there are more than 900 million Facebook users, each with a profile loaded with images, statuses, notes and videos, it is not difficult to guess that Facebook deals with huge amounts of data. But Facebook is just one of the giants, and the trend of data growth is still on the rise. This is why Facebook and other companies need ever more storage space. Those who can afford it build their own data centers, while others rent cloud space. Besides providing excellent and relatively cheap storage, cloud computing enables faster processing of exabytes of unstructured data, which is impossible with a traditional relational database management system (RDBMS). The key term here is unstructured, as Big Data may refer to various kinds of files – tweets, Facebook statuses, web pages, etc.
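The contrast with a relational schema can be shown with a toy example: a tweet, a status and a web page all have different fields, so they don't fit one fixed-column table without many empty cells, but they can be kept as schemaless documents, the approach NoSQL document stores take. A minimal sketch with hypothetical field names:

```python
# Heterogeneous, unstructured records: each has different fields,
# so no single fixed relational schema fits them all cleanly.
documents = [
    {"kind": "tweet",  "text": "hello",              "retweets": 3},
    {"kind": "status", "text": "new job!",           "likes": 42},
    {"kind": "page",   "url": "http://example.com",  "html": "<p>hi</p>"},
]

# Query by the presence of a field rather than by a fixed schema
with_text = [d for d in documents if "text" in d]
print(len(with_text))  # 2
```

A document database applies the same principle at scale: each record carries its own structure, and queries adapt to whatever fields happen to be present.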
Cloud solutions for Big Data
Before the rise of cloud computing and IaaS, transferring and storing Big Data required large hardware investments. Now it is possible to get cheap resources on demand, which is why cloud computing and Big Data are so often mentioned together. One of the solutions almost exclusively associated with big data is Hadoop – an open-source software framework that supports processing large data sets across clusters of computers. Besides Hadoop, there are two other classes of technologies suited to managing Big Data – NoSQL databases and massively parallel processing (MPP).
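Hadoop's core programming model, MapReduce, splits a job into a map phase (emit key–value pairs), a shuffle (group values by key), and a reduce phase (aggregate each group); because each phase works on independent pieces, the phases can run in parallel across many machines. A single-process Python sketch of the classic word count, illustrating the model rather than actual Hadoop code:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into a single result
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big storage", "big cloud"]
result = reduce_phase(shuffle(map_phase(lines)))
print(result["big"])  # 3
```

In a real cluster, the map and reduce functions run on many nodes at once and the framework handles the shuffle over the network; the programmer supplies only the two functions.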
The problem of bandwidth and transfer
Even with processing and storage solutions in place, it is often necessary to move files from one location to another, and transferring petabytes of data demands a network that can deliver them at an acceptable speed. Aspera has built high-speed file-transfer software (fasp) designed to deliver fast and secure file transport regardless of the distance between endpoints or network conditions.
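Some back-of-the-envelope arithmetic shows why transfer speed matters at this scale: even a fully utilized 1 Gbps link takes months to move a single petabyte, and that is before protocol overhead or TCP's sensitivity to latency and packet loss, which is the very problem purpose-built transport like fasp targets. A sketch of the ideal-case calculation:

```python
def transfer_days(size_bytes, bandwidth_bps):
    """Ideal transfer time in days, assuming the link is fully
    utilized with no protocol overhead or retransmissions."""
    seconds = size_bytes * 8 / bandwidth_bps
    return seconds / 86_400  # seconds per day

petabyte = 10**15  # bytes
print(round(transfer_days(petabyte, 1e9), 1))   # ~92.6 days at 1 Gbps
print(round(transfer_days(petabyte, 1e10), 1))  # ~9.3 days at 10 Gbps
```

Real links rarely sustain their nominal rate over long distances, so the practical numbers are worse, which is why high-latency, lossy wide-area paths need transport designed for them.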
With all these technical solutions, it may seem that big data is not such a big issue after all. However, data volumes keep growing rapidly and becoming more complex and unstructured. The development of cloud computing offered a potential answer to problems like data storage and transfer; now cloud technology must keep developing at a similar rate, so that processing and storage systems always remain at the required level.