MCQs | Data Analytics – Preprocessing and Basics of Big Data
Data Analytics – Preprocessing and Basics of Big Data MCQs: This section contains the Multiple-Choice Questions & Answers on Data Analytics – Preprocessing and Basics of Big Data with explanations.
Submitted by IncludeHelp, on December 25, 2021
Data Analytics Preprocessing and Basics of Big Data MCQs
1. Data, whether unprocessed or processed, consists of observations or measurements that can be expressed as text, numbers, or other types of media.
- True
- False
Answer: A) True
Explanation:
Data are observations or measurements (unprocessed or processed) represented as text, numbers, or multimedia. Information that has been transformed into a form that is more efficient for movement or processing is referred to as data in computing.
2. With reference to computing, ___ is a symbolic representation of facts or concepts from which information may be obtained with a reasonable degree of confidence.
- Program
- Knowledge
- Data
- Flowchart
Answer: C) Data
Explanation:
With reference to computing, data is a symbolic representation of facts or concepts from which information may be obtained with a reasonable degree of confidence.
3. Which of the following can be considered to be the primary source of unstructured data among the others?
- Facebook
- Twitter
- Web pages
- All of the mentioned above
Answer: D) All of the mentioned above
Explanation:
Facebook, Twitter, and web pages can all be considered primary sources of unstructured data.
4. Which of the following is an example of structured data?
- Videos
- Employee's name, employee's id, employee's age
- Audio files
- All of the mentioned above
Answer: B) Employee's name, employee's id, employee's age
Explanation:
Structured data is highly specific and is stored in a predefined format, whereas unstructured data is a mix of many different forms of data, each kept in its original format. A record consisting of an employee's name, id, and age follows a fixed schema, so it is an example of structured data.
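For illustration, such a record fits naturally into a tabular schema. A minimal sketch, assuming the pandas library is installed; the column names and values are hypothetical:

```python
# Structured data: every field sits in a named column with a fixed type.
import pandas as pd

employees = pd.DataFrame({
    "employee_id": [101, 102, 103],      # numeric identifier
    "name": ["Asha", "Ravi", "Meena"],   # text in a fixed column
    "age": [29, 34, 41],                 # numeric attribute
})

print(employees.dtypes)  # every column has a well-defined type
```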
5. Which of the following steps is performed by a data scientist after acquiring the data?
- Deletion
- Data Replication
- Data Integration
- Data Cleansing
Answer: D) Data Cleansing
Explanation:
After acquiring the data, data scientists perform data cleansing. Data cleansing is a critical step in preparing data for use in subsequent operations, whether in operational activities or in downstream analysis and reporting, and it is most effectively accomplished with the use of data quality tools. Depending on their purpose, these tools can perform a number of tasks, ranging from checking basic typographical errors to validating values against a known true reference set.
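As a rough illustration of those two tasks, here is a minimal cleansing sketch, assuming pandas; all names and values are hypothetical illustrations, not a specific data quality tool's API:

```python
# A minimal data-cleansing sketch: fix typography, then validate values.
import pandas as pd

raw = pd.DataFrame({
    "country": ["India", "india ", "Inida", "USA"],
    "age": [29, -3, 41, 35],
})

VALID_COUNTRIES = {"India", "USA"}  # a known true reference set

clean = raw.copy()
# Fix basic typographical problems: stray whitespace and misspellings.
clean["country"] = clean["country"].str.strip()
clean["country"] = clean["country"].replace({"india": "India", "Inida": "India"})
# Validate values against the reference set and plausible ranges.
clean = clean[clean["country"].isin(VALID_COUNTRIES)]
clean = clean[clean["age"].between(0, 120)]

print(clean)
```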
6. Quantitative data mainly deals with ______.
- Audio data
- Images data
- Numeric data
- Videos
Answer: C) Numeric data
Explanation:
Quantitative data mainly deals with numeric data. Quantitative data is defined as data in the form of counts or numbers, where each data set has a unique numerical value associated with it.
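Because quantitative data is numeric, counts and summary statistics apply to it directly. A tiny sketch using only Python's standard library; the sales figures are hypothetical:

```python
# Quantitative data: numeric observations support direct computation.
import statistics

daily_sales = [120, 135, 128, 142, 130]  # hypothetical numeric observations

print("count:", len(daily_sales))
print("mean:", statistics.mean(daily_sales))
print("stdev:", statistics.stdev(daily_sales))
```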
7. Big Data is a term that refers to data that is too massive and complex to be stored in _____.
- Traditional databases
- Big Databases
- SQL Databases
- All of the mentioned above
Answer: A) Traditional databases
Explanation:
Big Data is a term that refers to data that is too massive and complex to be stored in traditional databases. Data here means the quantities, characters, or symbols on which computer operations are performed, which may be stored and conveyed in the form of electrical signals and recorded on magnetic, optical, or mechanical storage media.
8. Big Data is a field dedicated to,
- Storage of large collections of data
- Processing
- Analysis
- All of the mentioned above
Answer: D) All of the mentioned above
Explanation:
Big Data is a field dedicated to the storage, processing, and analysis of large collections of data. Big data is data that is so massive, fast, or complex that it is difficult or impossible to process using traditional methods. Accessing and keeping massive amounts of data for analytics has been possible for quite some time, but the concept of big data only gained traction in the early 2000s.
9. Data that is less than 10 GB in size can be considered ____ data.
- Small
- Medium
- Big
- All of the mentioned above
Answer: A) Small
Explanation:
Data that is less than 10 GB in size can be considered small data. Small data is data that is 'small' enough for a human being to comprehend: information in a volume and format that makes it accessible, informative, and actionable for the intended audience.
10. Which of the following are benefits of Data Processing?
- Cost Reduction
- Time Reductions
- Smarter Business Decisions
- All of the mentioned above
Answer: D) All of the mentioned above
Explanation:
When data is collected and transformed into useful information, this is referred to as data processing. Data processing is typically undertaken by a data scientist or team of data scientists, and it is critical that it is done correctly in order to avoid having a negative impact on the final product, or data output.
Rather than starting with unstructured data in its raw form, data processing transforms information into a more understandable format (graphs, documents, etc.), giving it the form and context required for it to be interpreted by computers and used by personnel throughout an organization.
11. Which is the process of examining large and varied data sets?
- Machine learning
- Cloud computing
- Big data analytics
- All of the mentioned above
Answer: C) Big data analytics
Explanation:
Big data analytics is the process of examining large and varied data sets. It describes the application of advanced analytic techniques to very large, heterogeneous data sets that contain structured, semi-structured, and unstructured data from a variety of sources, in sizes ranging from terabytes to zettabytes.
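As a sketch of how such examination looks in practice, the following assumes PySpark is installed and that a semi-structured JSON file named events.json with an event_type column exists; both the file and the column are hypothetical:

```python
# A minimal big data analytics sketch with PySpark.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bigdata-analytics-sketch").getOrCreate()

# Semi-structured JSON is read into a structured DataFrame for analysis.
events = spark.read.json("events.json")

(events.groupBy("event_type")
       .agg(F.count("*").alias("n"))
       .orderBy(F.desc("n"))
       .show())

spark.stop()
```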
12. Data Identification → Data Acquisition & Filtering → Data Extraction → Data Validation & Cleansing are phases of which of the following?
- Data Analytics Lifecycle
- System Analysis and Design
- Software Development and Life Cycle
- None of the mentioned above
Answer: A) Data Analytics Lifecycle
Explanation:
Data Identification, Data Acquisition & Filtering, Data Extraction, and Data Validation & Cleansing are phases of the Data Analytics Lifecycle, which maps out these steps for professionals involved in data analytics projects. The phases are organized in a systematic manner, and each phase has its own significance and its own set of characteristics.
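A schematic sketch of those four phases as plain Python functions; the bodies are placeholders meant only to show the order of the stages, not a real pipeline:

```python
# The listed lifecycle phases, expressed as a chain of functions.
def identify(sources):                     # Data Identification
    return [s for s in sources if s["relevant"]]

def acquire_and_filter(sources):           # Data Acquisition & Filtering
    return [s["records"] for s in sources]

def extract(batches):                      # Data Extraction
    return [row for batch in batches for row in batch]

def validate_and_cleanse(rows):            # Data Validation & Cleansing
    return [r for r in rows if r is not None]

sources = [{"relevant": True, "records": [1, None, 3]}]
print(validate_and_cleanse(extract(acquire_and_filter(identify(sources)))))
# -> [1, 3]
```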
13. Hadoop is a framework that is free and open source.
- True
- False
Answer: A) True
Explanation:
Hadoop is an open-source framework. The Apache Hadoop software library enables the distributed processing of massive data sets across clusters of computers using simple programming models. It is designed to scale from a handful of servers to thousands of machines, each offering local computation and storage.
14. The Hadoop File System is constantly required to deal with enormous ____.
- Network
- Clusters
- Data sets
- None of the mentioned above
Answer: C) Data sets
Explanation:
The Hadoop File System is constantly required to deal with enormous data sets. HDFS is a distributed file system that handles large data volumes and is designed to run on low-cost commodity hardware. It is used to scale a single Apache Hadoop cluster to hundreds (or even thousands) of nodes using a distributed computing model. HDFS is one of the three key components of Apache Hadoop, alongside MapReduce and YARN, and is used to store and organize data.
15. Hadoop is a framework that is used to work with _____.
- MapReduce, Hive and HBase
- MapReduce, MySQL and Google Apps
- MapReduce, Hummer and Iguana
- MapReduce, Heron and Trumpet
Answer: A) MapReduce, Hive and HBase
Explanation:
Hadoop is a framework that is used to work with MapReduce, Hive, and HBase. Hadoop is an open-source framework that can store and process enormous datasets, ranging in size from gigabytes to petabytes, in a scalable and efficient manner. Instead of one huge computer storing and analyzing all of the data, Hadoop clusters numerous computers so they can analyze enormous datasets in parallel, allowing for faster analysis.
16. Which of the following does NOT accurately describe Hadoop?
- Open-source
- Real-time
- Java-based
- Distributed computing approach
Answer: B) Real-time
Explanation:
Hadoop is not a real-time data processing framework; it is open-source, Java-based, and built around a distributed computing approach. Hadoop was originally intended for batch processing: take a large dataset as input, process it all at once, and produce a large output dataset. The very concept of MapReduce is geared toward batch processing rather than real-time processing. This was true from the beginning of Hadoop's existence; today, however, there are numerous options for using Hadoop in a more real-time manner.
17. ___ has the world's largest Hadoop cluster.
- Apple
- Datamatics
- Facebook
- None of the mentioned above
Answer: C) Facebook
Explanation:
Facebook has the world’s largest Hadoop cluster.
18. Which of the following is a correct statement?
- Machine learning emphasizes prediction, based on known properties learned from the training data
- Data Cleaning emphasizes prediction, based on known properties learned from the training data
- Both a and b
- None of the mentioned above
Answer: A) Machine learning emphasizes prediction, based on known properties learned from the training data
Explanation:
Machine learning emphasizes prediction, based on known properties learned from the training data. Machine learning is the study of computer algorithms that improve automatically through experience and the use of data, and it is considered a component of AI. Machine learning algorithms build a model from sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so.
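A minimal train-then-predict sketch, assuming scikit-learn is installed; the tiny dataset is hypothetical and exists only to show the flow from training data to prediction:

```python
# Learn a model from training data, then predict for unseen inputs.
from sklearn.linear_model import LogisticRegression

X_train = [[1.0], [2.0], [3.0], [4.0]]  # known properties (features)
y_train = [0, 0, 1, 1]                  # labels observed in the training data

model = LogisticRegression()
model.fit(X_train, y_train)             # learn from the training data

print(model.predict([[1.5], [3.5]]))    # predictions for new inputs
```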
19. Which characteristic of big data is, in terms of importance, of the most concern to data science?
- Variety
- Velocity
- Volume
- None of the mentioned above
Answer: A) Variety
Explanation:
Variety is the characteristic of big data that is of the most concern to data science.
20. In which of the following areas do information management firms specialize in analytical capabilities?
- Stream Computing
- Content Management
- Information Integration
- All of the mentioned above
Answer: D) All of the mentioned above
Explanation:
Stream Computing, Content Management, and Information Integration are all areas in which information management firms specialize their analytical capabilities.
21. The use of reporting and visualization features in Data Analytics refers to,
- Processing of data
- User friendly representation
- Both A and B
- None of the mentioned above
Answer: C) Both A and B
Explanation:
The use of reporting and visualization features in Data Analytics refers to both the processing of data and its user-friendly representation. The graphical display of information and data is referred to as data visualization. Data visualization tools, which make use of visual components such as charts, graphs, and maps, make it easier to detect and analyze trends, outliers, and patterns in large amounts of information.
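A minimal sketch of the user-friendly representation side, assuming matplotlib is installed; the monthly figures are hypothetical:

```python
# Turn processed numbers into a chart a reader can interpret at a glance.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [10, 14, 9, 17]

plt.bar(months, revenue)        # user-friendly representation of the data
plt.title("Revenue by month")
plt.ylabel("Revenue (units)")
plt.savefig("revenue.png")      # export the chart for a report
```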
22. BI stands for ____.
- Business Information
- Business Initiation
- Business Intelligence
- Business Insider
Answer: C) Business Intelligence
Explanation:
BI stands for Business Intelligence. Business Intelligence (BI) is concerned with complicated techniques and technology that assist end-users in analyzing data and performing decision-making activities in order to expand their businesses. Business intelligence is essential in the management of business data and the management of performance.
23. The primary introduction of Power BI was based on,
- Microsoft Word
- Microsoft Excel
- Microsoft Outlook
- Microsoft PowerPoint
Answer: B) Microsoft Excel
Explanation:
The primary introduction of Power BI was based on Microsoft Excel. It is possible to consolidate self-service and enterprise data into a single view with Power BI, even when the data comes from multiple sources.
24. To consolidate queries in Power BI, what method do you employ?
- Join Queries
- Union Queries
- Both A & B
- None of the above
Answer: A) Join Queries
Explanation:
To consolidate queries in Power BI, the Join Queries method is employed. When we combine data, we connect to two or more data sources, shape them as needed, and then consolidate them into a single query that is useful to the end user. The Power Query Editor in Power BI Desktop makes extensive use of right-click menus as well as the Transform ribbon to perform complex transformations; the majority of the options available through the ribbon can also be accessed by right-clicking an object, such as a column, and selecting from the menu that appears.
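Outside Power BI itself, the same consolidation idea can be sketched with a pandas merge; this is only an analogy to joining queries, not Power BI's API, and the tables and key column are hypothetical:

```python
# Two sources consolidated on a shared key into one result.
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2], "amount": [250, 90]})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Asha", "Ravi"]})

# Join the two sources into one consolidated result for the end user.
consolidated = orders.merge(customers, on="customer_id", how="inner")
print(consolidated)
```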
25. What is the most effective method of preparing your data for Power BI?
- Use of a star schema
- Load all tables
- Include multiple objects
- None of the above
Answer: A) Use of a star schema
Explanation:
The most effective method of preparing data for Power BI is the use of a star schema. Among relational data warehouses, the star schema is a mature modeling approach that has been widely adopted. It requires modelers to classify their model tables as either dimensions or facts.
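A star schema can be sketched in pandas as one central fact table joined out to dimension tables on surrogate keys; all table and column names below are hypothetical:

```python
# Star schema sketch: fact table in the middle, dimensions around it.
import pandas as pd

dim_product = pd.DataFrame({"product_key": [1, 2],
                            "product": ["Pen", "Book"]})
dim_date = pd.DataFrame({"date_key": [20240101, 20240102],
                         "month": ["Jan", "Jan"]})
fact_sales = pd.DataFrame({"product_key": [1, 2, 1],
                           "date_key": [20240101, 20240101, 20240102],
                           "units": [3, 1, 5]})

# Analysis queries join the fact table to its dimensions, then aggregate.
report = (fact_sales
          .merge(dim_product, on="product_key")
          .merge(dim_date, on="date_key")
          .groupby(["month", "product"])["units"].sum())
print(report)
```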
26. Access to Streaming Data is associated with _____.
- System administrator
- HDFS
- Network System
- None of the mentioned above
Answer: B) HDFS
Explanation:
Access to streaming data is associated with HDFS. The Hadoop distribution includes a utility called Hadoop Streaming that can be used to stream data. Using the utility, you can create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.
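A classic Hadoop Streaming mapper in Python, reading raw text from stdin and emitting tab-separated key/value pairs on stdout; this is a sketch only, and it assumes a configured cluster. A matching reducer would read the sorted pairs and sum the counts per word, with both scripts passed to the hadoop-streaming jar as the mapper and reducer executables:

```python
# mapper.py - a word-count mapper for Hadoop Streaming: reads raw text
# from stdin and emits tab-separated (word, 1) pairs on stdout.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```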
27. Power BI connects to a variety of services, including Facebook, Twilio, GitHub, and MailChimp, as,
- Online services
- Database data sources
- File data sources
- None of the mentioned above
Answer: A) Online services
Explanation:
Power BI connects to services such as Facebook, Twilio, GitHub, and MailChimp as online services. In any organization, systems generate a large amount of data, which can be measured in terabytes, petabytes, or even exabytes in some instances.
Businesses use Business Intelligence to evaluate this data and turn it into actionable information (decisions). It is undeniable that the success of a firm depends on the decisions made as a result of business intelligence.
28. When it comes to Power BI Desktop, which of the following might be regarded as the most important feature?
- Data
- Report
- Dashboard
- All of the mentioned above
Answer: D) All of the mentioned above
Explanation:
Data, Report, and Dashboard are all among the most important features. Power BI Desktop is used to gather, organize, transform, and visualize data in various ways. With Power BI Desktop, we can connect to a variety of different data sources and combine them (a process known as modeling) into a single data model for analysis.
29. Which of the following is a must before using any technology to evaluate your data?
- Study the dataset
- Organize dataset
- Remove impurities from data set
- All of the mentioned above
Answer: D) All of the mentioned above
Explanation:
Before using any technology to evaluate your data, you must study the dataset, organize it, and remove impurities from it. Before collecting data, we must develop a detailed analysis strategy that will guide us through the various steps of the research process, from summarizing and characterizing the data to testing our hypotheses.
30. Power BI modelling refers to the relationships that exist between your data sources.
- True
- False
Answer: A) True
Explanation:
Data Modeling is one of the features in a business intelligence tool that is used to connect multiple data sources through relationships. A relationship describes how data sources are connected to one another, and we can use relationships to generate interesting data visualizations across a variety of data sets. In Power BI, we can also view the "Relationship" between two tables in a data model.