Spark programming

Spark: map(func) returns a new distributed dataset formed by passing each element of the source through a function func. filter(func), for example val filterWord = sparkIn.filter(line => line.contains("what")), returns a new dataset formed (more…)
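As a rough illustration of the two transformations named in this excerpt, here is a minimal sketch that uses an ordinary Scala `Seq` in place of a Spark RDD. That substitution is an assumption made so the example runs without a cluster; `map` and `filter` on an RDD accept the same kind of function argument. The sample lines and the object name `MapFilterSketch` are hypothetical.

```scala
object MapFilterSketch {
  // Hypothetical sample data standing in for the lines of an input file.
  val lines: Seq[String] = Seq("what is spark", "hello world", "so what")

  // map: apply a function to every element, producing a new collection.
  val lengths: Seq[Int] = lines.map(line => line.length)

  // filter: keep only the elements for which the predicate returns true.
  val filterWord: Seq[String] = lines.filter(line => line.contains("what"))

  def main(args: Array[String]): Unit = {
    println(lengths)
    println(filterWord)
  }
}
```

Note that the predicate is written as a function literal, `line => line.contains("what")`. A bare `line.contains("what")` would not compile, because `line` is not bound to anything.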

Hadoop Installation Guide

Step 1: Copy Hadoop installation files from local to virtual machine
Command: hadoop@hadoop-VirtualBox:~$ sudo cp -r /media/sf_Dee/ /home/hadoop/Desktop/
Step 2: Give permission to folder on Desktop
Command: hadoop@hadoop (more…)

Data Warehousing

A data warehouse is a repository (collection of resources that can be accessed to retrieve information) of an organization’s electronically stored data, designed to facilitate (more…)

Big Data Hadoop Interview Questions Part 2

Hadoop Interview Questions: Tell us about yourself. What is the biggest challenge you have ever faced, and how did you resolve it? What is the biggest challenge you faced (more…)

Hadoop & The Hadoop Distributed File System (HDFS)

What is Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using (more…)


Hadoop YARN: YARN stands for Yet Another Resource Negotiator. It’s the next-gen MapReduce. YARN is one of the key features in the second-generation Hadoop 2 version of the Apache Software (more…)

Handout MapReduce

Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) (more…)

BigData / Hadoop Interview Questions

Are you looking for Hadoop interview questions that employers frequently ask? Here are Hadoop interview questions and answers (more…)