

Showing posts from December, 2015

BIG Data, Hadoop – Chapter 4 - Hadoop Daemons

The back-end components of a Hadoop (1.x) system are a set of daemons. The Name Node and Data Node will be explained in detail in my next blog post. All of these daemons are simply programs written in Java that run in the background. Java code needs a JVM to run, so each daemon runs in its own JVM.

Job Tracker: a job is a complete unit of work submitted to the cluster, for example a MapReduce program that reads a text file and counts the words in it. The Job Tracker accepts jobs, splits each one into tasks and schedules those tasks on the worker nodes.

Task Tracker: a job is made up of many tasks, typically one map task for each input split (usually one HDFS block) of the input file, plus one or more reduce tasks. Each worker node runs a Task Tracker, which executes the tasks assigned to it and reports their progress back to the Job Tracker.
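To make the Job Tracker / Task Tracker relationship more concrete, here is a toy Python model. It is not Hadoop's actual scheduler: the class names mirror the daemons, but the round-robin policy, the task names and the two-node cluster are assumptions for illustration. The real Job Tracker also tries to run a map task on a node that already holds the data block (data locality), which this sketch ignores.

```python
# Toy model of Hadoop 1.x scheduling: a JobTracker splits a job into tasks
# and hands them to TaskTrackers. Heavily simplified: plain round-robin
# assignment, ignoring data locality, task slots, heartbeats and failures.
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class TaskTracker:
    host: str
    tasks: list = field(default_factory=list)  # tasks this node will run


class JobTracker:
    def __init__(self, trackers):
        self.trackers = trackers

    def submit(self, job_name, input_splits, num_reducers=1):
        """Create one map task per input split plus num_reducers reduce
        tasks, then assign them to TaskTrackers in round-robin order."""
        map_tasks = [f"{job_name}-map-{i}" for i in range(len(input_splits))]
        reduce_tasks = [f"{job_name}-reduce-{i}" for i in range(num_reducers)]
        for task, tracker in zip(map_tasks + reduce_tasks, cycle(self.trackers)):
            tracker.tasks.append(task)
        return map_tasks, reduce_tasks


# Hypothetical cluster: two worker nodes and a file stored as three blocks.
trackers = [TaskTracker("node1"), TaskTracker("node2")]
job_tracker = JobTracker(trackers)
maps, reduces = job_tracker.submit("wordcount", ["block0", "block1", "block2"])
for tracker in trackers:
    print(tracker.host, tracker.tasks)
```

Here node1 receives two map tasks, while node2 receives one map task and the single reduce task.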

BIG Data, Hadoop – Chapter 3 - Hadoop Eco Systems

The YARN layer of the Hadoop ecosystem was not present in the first generation of Hadoop (the Hadoop 1.x versions). Remember, Hadoop 1.x has no YARN cluster resource management system. This was a disadvantage, because any processing of data on HDFS had to be converted into MR (MapReduce) code before the data could be processed. With YARN (Yet Another Resource Negotiator) in place, other processing frameworks such as Spark and Giraph can run on the cluster and process HDFS files directly, without converting the work into MR code.

BIG Data, Hadoop – Chapter 2 - Data Life Cycle

Data Life Cycle

In our current systems, we capture (extract) the data, then store it, and later process it for reporting and analytics. With big data, the problem lies in storing the data and then processing it fast enough. This is the part Hadoop takes over: it stores the data efficiently in HDFS (Hadoop Distributed File System) and processes it with its MapReduce engine. Since the MapReduce (Hadoop) engine needs the data to be in HDFS before it can process it, there are tools on the market for moving data into HDFS. For example, Sqoop imports data from relational databases (RDBMS) into HDFS. Likewise, SAP BusinessObjects Data Services (BODS) can load SAP system data into HDFS.
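To see how HDFS stores a large file, here is a small Python calculation. It assumes the Hadoop 2.x default block size of 128 MB (`dfs.blocksize`; Hadoop 1.x used 64 MB) and the default replication factor of 3 (`dfs.replication`). Both values are configurable per cluster. With the default input format, MapReduce usually starts about one map task per block, so the block count also gives a rough idea of how many map tasks can run in parallel.

```python
import math

# Assumed cluster defaults (configurable): Hadoop 2.x block size of 128 MB
# (dfs.blocksize; Hadoop 1.x used 64 MB) and 3 replicas (dfs.replication).
BLOCK_SIZE_MB = 128
REPLICATION = 3


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # HDFS does not pad the last block, so raw storage is the file size
    # times the replication factor, not blocks * block size.
    return blocks, blocks * replication, file_size_mb * replication


# A 50 GB file, the same size as the example in Chapter 1.
blocks, replicas, raw_mb = hdfs_footprint(50 * 1024)
print(f"{blocks} blocks, {replicas} block replicas, {raw_mb} MB of raw disk")
```

For the 50 GB file, this prints 400 blocks, 1200 block replicas and 153600 MB of raw disk.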

BIG Data, Hadoop – Chapter 1 - Understanding Big Data & Hadoop

Understanding Big Data

We have all come across the term 'Big Data' recently. So what exactly is Big Data? How many GB or TB of data make it Big Data? There is no standard size definition for Big Data. When our current system is not able to handle the data, we call that data Big Data (Big Data is just a terminology used in IT). For example, processing a 50 GB text file on a laptop or desktop computer is not a huge task, but on a smartphone, processing the same 50 GB is a huge task. That means that, for a mobile phone, that 50 GB of data is Big Data.

Understanding Hadoop

Our current systems, such as ETL tools, reporting tools and programming environments, can only handle up to a few petabytes of data. Meanwhile, the volume of data keeps growing every year, and unstructured and semi-structured data are increasing every day. So there is a need for more adv

Comma-Separated Values and Grouping the Data - SQL

Hi All,

Recently I got into a situation where I had to group a set of data into comma-delimited lists. Here is the requirement: I have a table with an ID column and a Name column, and all the Name values for each ID should be shown as a single comma-separated value, one row per ID.

So let us create a sample data set to achieve this:

    CREATE TABLE #t1 (ID INT, Name VARCHAR(10));

    INSERT INTO #t1 (ID, Name)
    VALUES (1, 'a'), (1, 'b'), (1, 'c'),
           (2, 'e'), (2, 'f'), (2, 'a'), (2, 'H'),
           (3, 'X');

And here is the query for the output:

    SELECT ID,
           STUFF((SELECT ', ' + CAST(Name AS VARCHAR(10))
                  FROM #t1
                  WHERE ID = t.ID
                  ORDER BY Name  -- deterministic order inside each list
                  FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'),
                 1, 2, '') AS Merge_output  -- drop the leading ', '
    FROM #t1 t
    GROUP BY ID;
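To cross-check what the query should return, the same grouping can be sketched in Python on the sample rows. This is only a verification aid, not part of the SQL solution. It sorts names case-insensitively, which should match the order a case-insensitive collation gives. As a side note, SQL Server 2017 and later provide `STRING_AGG(Name, ', ') WITHIN GROUP (ORDER BY Name)`, which builds this kind of list without the `FOR XML PATH` trick.

```python
from itertools import groupby
from operator import itemgetter

# The same sample rows that are inserted into #t1: (ID, Name).
rows = [(1, "a"), (1, "b"), (1, "c"),
        (2, "e"), (2, "f"), (2, "a"), (2, "H"),
        (3, "X")]


def merge_names(rows, sep=", "):
    """Group (id, name) pairs by id and join each group's names with sep."""
    merged = {}
    # groupby only groups adjacent items, so sort by ID first.
    for row_id, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        # Sort names case-insensitively so the output order is deterministic.
        names = sorted((name for _, name in group), key=str.lower)
        merged[row_id] = sep.join(names)
    return merged


for row_id, merge_output in merge_names(rows).items():
    print(row_id, merge_output)
```

For the sample data, this gives "a, b, c" for ID 1, "a, e, f, H" for ID 2 and "X" for ID 3.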

SSIS Excel Error: Unexpected error from external database driver () when importing data from Excel

Recently I copied an Excel file for analysis from an SFTP site to the development server using my account, and asked my teammate to analyze it by loading it into a table. He logged in to the server and used a simple Data Flow Task in SSIS with an Excel source to load the data. When he connected to the Excel file using the Excel Connection Manager, he got a weird error:

Unexpected error from external database driver ()

He was unsure why he was getting the error when selecting the sheet names. When he reported it, we quickly narrowed it down to a likely security issue. Because I had copied the file from SFTP to the development machine using my account, access for other users was limited. When he tried to load the file, he got the error because it was read-only for him. As soon as I granted him full permission in the Excel file's security settings, it worked for him. There are various solutions on the internet, but initially none of them worked for us when we were trying to resolve this. Hope this soluti
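Because the root cause here was file permissions for my teammate's account, one quick way to confirm it is to try opening the workbook from that account before looking at driver settings. Below is a minimal Python sketch, assuming Python is available on the development server. The function name and the path in the usage comment are hypothetical. The sketch only reports whether the file can be opened for reading and for writing. It does not change permissions; that still has to be done in the file's security settings.

```python
import os
import tempfile


def can_open(path):
    """Return (readable, writable) for the account running this script.

    The file is actually opened instead of calling os.access because, on
    Windows, os.access does not evaluate NTFS permissions (ACLs).
    """
    result = []
    for mode in ("rb", "r+b"):  # "r+b" asks for write access but writes nothing
        try:
            with open(path, mode):
                result.append(True)
        except OSError:  # e.g. PermissionError or FileNotFoundError
            result.append(False)
    return tuple(result)


# Self-check on a temporary file that this account has just created.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    sample_path = tmp.name
own_file_access = can_open(sample_path)
os.remove(sample_path)
print(own_file_access)

# Usage on the server (hypothetical path):
# print(can_open(r"C:\Data\copied_from_sftp.xlsx"))
```

If the copied workbook returns (True, False) for the loading account, the file is effectively read-only for that account, which matches the situation described above.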