If you are remotely connected to the world of enterprise tech and you haven’t heard of big data and what it entails, you must have been living under a rock, a fairly huge one! Simply put, big data was born because of all the structured and unstructured data that the planet is churning out every single minute. All this data has made big data the epicentre of technology development, solutions and consequently, careers!
The next obvious question, then, is: what is big data analytics? Take a simple example. If you are in the business of selling shoes, some fundamental questions you’d ask are: “Is there a growing trend behind a product? Or is there a declining trend? Will a customer buy ‘Stockings’ when they purchase ‘Shoes’?” These are business problem-solving questions. And the answer lies in data, loads of it. The answer lies in big data. Big data can help you analyse trends and customer behaviour. It can help you arrive at a decisive plan of action that is backed by evidence rather than guesswork.
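To make the ‘Shoes’ and ‘Stockings’ question concrete, here is a minimal Python sketch. The baskets are invented purely for illustration, and the `confidence` helper is a hypothetical name, not part of any library. It simply measures what share of shoe buyers also bought stockings, which is one simple way (among many) to put a number on that question.

```python
# Hypothetical sample data: each set is one customer's basket.
baskets = [
    {"Shoes", "Stockings"},
    {"Shoes"},
    {"Shoes", "Stockings", "Laces"},
    {"Stockings"},
    {"Shoes", "Laces"},
]


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0  # nobody bought the antecedent, so there is nothing to measure
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


# How often do shoe buyers also pick up stockings?
print(confidence(baskets, "Shoes", "Stockings"))
```

On a handful of baskets this runs instantly on a laptop. The same question asked over billions of transactions is where big data tooling comes in.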
- Big data plays a vital part at CERN, home of the Large Hadron Collider.
- It collects an unbelievable amount of data by taking 40 million pictures per second with its 100-megapixel cameras, which produces roughly 1 petabyte of data per second.
- The data from these cameras needs to be analysed. The lab is experimenting with ways to place more data from its experiments in both relational databases and data stores based on NoSQL technologies, such as Hadoop and Dynamo on Amazon’s S3 cloud storage service.
So, is Big Data completely about Analytics? Not completely, but Analytics is the Ultimate Prize.
Other major streams in Big Data are Storage and Management. This is where you as a professional can contribute. Out of the lot, these two job profiles stand out from the rest:
i. Big Data Engineer
Big Data Engineers develop, maintain, test and evaluate big data solutions within organisations. Most of the time they are also involved in the design of big data solutions, because of the experience they have with Hadoop-based technologies such as MapReduce and Hive, and with NoSQL databases such as MongoDB or Cassandra. A big data engineer builds large-scale data processing systems, is an expert in data warehousing solutions and should be able to work with the latest (NoSQL) database technologies.
ii. Big Data Solutions Architect
Big Data Solutions Architecture is an architecture domain that aims to address specific big data problems and requirements. Big data solutions architects are trained to describe the structure and behaviour of a big data solution and how it can be delivered using big data technology such as Hadoop. They need hands-on experience with Hadoop applications (e.g. administration, configuration management, monitoring, debugging, and performance tuning).
Other profiles, such as Big Data Researcher, Data Warehouse Manager, Data Warehouse Analyst, Data Analyst and Chief Data Officer, are also big in the job market.
Hadoop – Big Data’s partner in crime
So, where is big data stored? Not in an Excel sheet. We obviously need something bigger. Say hello to Hadoop.
Hadoop is a project of the Apache Software Foundation, an American non-profit organisation that supports the development of open-source software.
Hadoop is defined as an open-source Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment.
So, what can Hadoop do that others struggle with? Process and understand unstructured data at scale! Structured data, in tabular format or otherwise, can be dealt with easily. Excel can do it, and so can any RDBMS.
But when the data is unstructured and harder to read, that is where big data tools like Hadoop have a competitive edge.
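The processing model Hadoop is best known for is MapReduce: a map step turns raw input into key-value pairs, the pairs are grouped by key, and a reduce step combines each group. The sketch below imitates that idea in plain Python on a single machine, counting words in free-form text. It does not use Hadoop’s actual API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of raw text."""
    for token in line.lower().split():
        word = token.strip(".,!?;:\"'")
        if word:  # skip tokens that were only punctuation
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["Big data is big.", "Data is everywhere!"]))
```

In a real cluster the map and reduce steps run in parallel on many machines, each working on its own slice of the data. Here everything runs in one process, purely to show the shape of the computation.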
Now that we are done with the tech behind big data and Hadoop, let’s get to know the rest of the gang (Hint: here’s where the money lies. Each of the technologies you’re going to hear about next can shape your big data career):
- If you want to avoid writing low-level Java MapReduce code, you can use Pig, a platform for analysing large data sets by representing them as data flows in its scripting language, Pig Latin.
- If you want to run SQL-like queries on big data, then Hive can be used. Hive is a data warehouse software project built on top of Apache Hadoop for providing data summarization, query and analysis.
- If you want to store and use data in a NoSQL database that runs on top of Hadoop, then HBase can be used.
- For performing fast, near real-time analytics, you can use Spark.
These are big data tools that go hand-in-hand with Hadoop, yet they do not replace it. Think of them as add-ons in the Hadoop ecosystem.
Big Data is a treasure trove of opportunities. Dive into this new-age technology and learn Big Data to future-proof your career. Have a happy big data career!