Spark: map(func) Returns a new distributed dataset formed by passing each element of the source through a function func. filter(func) val filterWord = sparkIn.filter(line => line.contains("what")) Returns a new dataset formed (more…)
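To make the map and filter descriptions above concrete, here is a minimal sketch. It assumes a plain Scala Seq standing in for the Spark dataset, so it runs without a cluster. The sample lines and the name lineLengths are made up for illustration; on a real RDD, map and filter take functions of the same shape.

```scala
// Local stand-in for a Spark dataset of text lines. `sparkIn` reuses the name
// from the excerpt, but here it is an ordinary Seq (an assumption for this sketch).
object MapFilterSketch {
  val sparkIn: Seq[String] = Seq(
    "what is spark",
    "hadoop and yarn",
    "what is an rdd"
  )

  // map(func): apply func to every element and collect the results.
  val lineLengths: Seq[Int] = sparkIn.map(line => line.length)

  // filter(func): keep only the elements for which func returns true.
  val filterWord: Seq[String] = sparkIn.filter(line => line.contains("what"))

  def main(args: Array[String]): Unit = {
    println(lineLengths)
    println(filterWord)
  }
}
```

Note that filter needs a function (line => ...), not a bare expression, because it is evaluated once per element.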

Step 1: Copy the Hadoop installation files from the local machine to the virtual machine. Command: hadoop@hadoop-VirtualBox:~$ sudo cp -r /media/sf_Dee/ /home/hadoop/Desktop/ Step 2: Give permission to the folder on the Desktop. Command: hadoop@hadoop (more…)

A data warehouse is a repository (collection of resources that can be accessed to retrieve information) of an organization’s electronically stored data, designed to facilitate (more…)

Hadoop Interview Questions: Tell us about yourself. What is the biggest challenge you have ever faced, and how did you resolve it? What is the biggest challenge you faced (more…)

What is Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using (more…)


Hadoop YARN: YARN stands for Yet Another Resource Negotiator. It is the next generation of MapReduce. YARN is one of the key features in the second-generation Hadoop 2 version of the Apache Software (more…)

Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) (more…)
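As a rough illustration of the programming model behind this description, the sketch below runs a word count in memory with plain Scala collections. It does not use the Hadoop API. The names mapPhase, shuffle, reducePhase and wordCount are hypothetical, chosen only to mirror the usual map, shuffle and reduce stages.

```scala
object WordCountSketch {
  // Map phase: emit a (word, 1) pair for every word in a line.
  def mapPhase(line: String): Seq[(String, Int)] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty).map(word => (word, 1)).toSeq

  // Shuffle: group the emitted counts by word, as the framework would do by key.
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2)) }

  // Reduce phase: sum the counts collected for each word.
  def reducePhase(grouped: Map[String, Seq[Int]]): Map[String, Int] =
    grouped.map { case (word, counts) => (word, counts.sum) }

  // Full pipeline over a small in-memory input.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    reducePhase(shuffle(lines.flatMap(mapPhase)))
}
```

In a real cluster the map and reduce functions run on many nodes at once and the shuffle moves data between them; here everything runs in one process to keep the shape of the computation visible.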

Big Data / Hadoop Interview Questions: Are you looking for Hadoop interview questions that are frequently asked by employers? Here are Hadoop interview questions and answers (more…)