Skip to main content

BIG Data, Hadoop – Chapter 1 - Understanding Big Data & Hadoop


Understanding Big Data


We all in recent time, came across the word ‘Big Data’. So the question is what exactly is Big Data? How much TB or GB or data is called a Big Data?

Well, there is no standard size definition for Big Data. If current system when not able to handle the data, then, we call such data as Big Data. (Big Data is just a terminology used in IT)

As an example, if I take a text file of 50 GB, Processing a text file of 50 GB size on our Laptop or computer is not a huge task but if we take a smart phone, processing 10 GB of data is huge task. That means, for mobile phone, that 50 GB of data is Big Data.

Understanding Hadoop

Our current systems such as ETL tools, reporting tools, programming environment all have capability of handling few petabyte of Data.

And the growth of data annually is shown below in chart





And also the growth of unstructured, Semi structured data are increasingly every day.



So there is a need of more advanced tool which are capable of processing and storing these data. The system is also needed to have the four V’s (Velocity, Variety, Volume, and Veracity)

Hadoop is one of such tool which helps us in handing BIG Data both to process and as well as Store it effectively. Hadoop can handle all the data types, such as Images, Videos, Database files, File Systems etc.



Comments

  1. I really appreciate information shared above. It’s of great help. If someone want to learn Online (Virtual) instructor lead live training in MSBI, kindly contact us http://www.maxmunus.com/contact
    MaxMunus Offer World Class Virtual Instructor led training on in MSBI. We have industry expert trainer. We provide Training Material and Software Support. MaxMunus has successfully conducted 100000+ trainings in India, USA, UK, Australlia, Switzerland, Qatar, Saudi Arabia, Bangladesh, Bahrain and UAE etc.
    For Demo Contact us.
    Nitesh Kumar
    MaxMunus
    E-mail: nitesh@maxmunus.com
    Skype id: nitesh_maxmunus
    Ph:(+91) 8553912023
    http://www.maxmunus.com/


    ReplyDelete
  2. Thank you your words! I wanted to write lot of my blogs but time is not sufficient. I will try to find some time to share my knowledge.

    -Dhinakaran

    ReplyDelete
  3. Thank you! I would like to share knowledge to help others to understand the concepts in simple terms.

    ReplyDelete


  4. Thank you for sharing the article. The data that you provided in the blog is informative and effective. The information which you have provided is very good. It is very useful who is looking for
    Best Devops Training Institute

    ReplyDelete
  5. Good post and informative. Thank you very much for sharing this good article, it was so good to read and useful to improve my knowledge as updated, keep blogging. Thank you for sharing any good knowledge and thanks for fantastic efforts.
    Salesforce Training in Chennai

    Salesforce Online Training in Chennai

    Salesforce Training in Bangalore

    Salesforce Training in Hyderabad

    Salesforce training in ameerpet

    Salesforce Training in Pune

    Salesforce Online Training

    Salesforce Training

    ReplyDelete
  6. Very nice Blog,keep sharing more information with us.
    thank you.....

    big data hadoop training

    hadoop admin training

    ReplyDelete

Post a Comment

Popular posts from this blog

Zip/Unzip multiple files and also include password for zipped file using SSIS

We have many scenario that we need to Zip many files which we come across and then so some operations like either sending it as a email or just moving zipped file to some other destinations etc. But we were using manual method to zip multiple files. In this post, I tried to create a package which will zip multiple files using SSIS. Here for Zipping files purpose, I'm using 7-ZIP which is free software available in google sites. Download files and install onto your system. First let me show how to Zip on file and later I will show how to zip multiple files using SSIS and 7Zip tool. Compressing Single file. Here I'm trying to Zip one single flat file which is of 40MB size. I kept this file in C:\Documents and Settings\\Desktop\test\source folder. Now to compress this file, I will open my SSIS and I'm dragging and dropping EXECUTE PROCESS TASK from Control Flow. Now right click on Execute Process task and go for edit and select Process option. In process tab,

SSIS: The Value Was Too Large To Fit In The Output Column

I had a SSIS package where I was calling a stored procedure in OLEDB Source and it was returning a “The Value Was Too Large to Fit in the Output Column” error. Well, My Datatype in OLEDB source was matching with my OLEDB Destination table. However, when I googled, we got solutions like to increase the output of OLEDB Source using Advanced Editor option . I was not at all comfortable with their solution as my source, destination and my intermediate transformation all are having same length and data type and I don’t want to change. Then I found that I was missing SET NOCOUNT ON option was missing in Stored Procedure. Once I added it, my data flow task ran successfully. 

How to move multiple files in ssis and also rename simultaneously

There are two ways to achieve this. 1) We can move the flat files and then rename it. 2) While moving files itself, automatic rename should be done. We will do the second type. The criteria is to rename the files while moving from source to destination. So for that, we need FILE SYSTEM TASK to be included. Secondly since we need to move many files, we will use FOR EACH LOOP CONTAINER. To fetch all the files, we can use FOR EACH LOOP task in SSIS. In collection tab, we can select FOREACH FILE enumerator option for fetching files and we can change enumerator configuration Folder option: Points to source where we need to fetch files. Files: will give us idea whether we need to fetch all the files (*.*) or if we give extension like *.txt, it is going to fetch only  .txt files . Once I give Source name in FOR EACH LOOP container, It is going to fetch all the files corresponding to that path. Retrieve file name: This option is used to let the variables mentioned in VARIA