
Cloudera CCB-400 Exam - Cloudera Certified Specialist in Apache HBase

Questions & Answers for Cloudera CCB-400

Showing 1-15 of 45 Questions

Question #1

Under default settings, which feature of HBase ensures that data won't be lost in the event
of a RegionServer failure?

A. All HBase activity is written to the WAL, which is stored in HDFS

B. All operations are logged on the HMaster.

C. HBase is ACID compliant, which guarantees that it is Durable.

D. Data is stored on the local filesystem of the RegionServer.

Explanation: HBase data updates are first stored in an in-memory area called the MemStore for fast writes. In the event of a RegionServer failure, the contents of the MemStore would be lost because they have not yet been saved to disk. To prevent data loss in such a scenario, each update is persisted to a WAL file before it is stored in the MemStore. After a RegionServer failure, the lost contents of the MemStore can be regenerated by replaying the updates (also called edits) from the WAL file.
Reference: HBase Log Splitting, http://tm.durusau.net/?p=27674 (see the second paragraph of the post)

Question #2

Your client is writing to a region when the RegionServer crashes. At what point in the write
is your data secure?

A. From the moment the RegionServer wrote to the WAL (write-ahead log)

B. From the moment the RegionServer returned the call

C. From the moment the RegionServer received the call

D. From the moment the RegionServer wrote to the MemStore

Explanation: Each RegionServer adds updates (Puts, Deletes) to its write-ahead log (WAL) first, and then to the MemStore for the affected Store. This ensures that HBase has durable writes. Without the WAL, there is the possibility of data loss in the case of a RegionServer failure before each MemStore is flushed and new StoreFiles are written. HLog is the HBase WAL implementation, and there is one HLog instance per RegionServer.
Note:
In computer science, write-ahead logging (WAL) is a family of techniques for providing
atomicity and durability (two of the ACID properties) in database systems.
In a system using WAL, all modifications are written to a log before they are applied.
Usually both redo and undo information is stored in the log.
The purpose of this can be illustrated by an example. Imagine a program that is in the
middle of performing some operation when the machine it is running on loses power. Upon
restart, that program might well need to know whether the operation it was performing
succeeded, half-succeeded, or failed. If a write-ahead log were used, the program could
check this log and compare what it was supposed to be doing when it unexpectedly lost
power to what was actually done. On the basis of this comparison, the program could
decide to undo what it had started, complete what it had started, or keep things as they are.
WAL allows updates of a database to be done in-place. Another way to implement atomic
updates is with shadow paging, which is not in-place. The main advantage of doing
updates in-place is that it reduces the need to modify indexes and block lists.
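The WAL-then-MemStore ordering described above can be sketched in a few lines of plain Java. This is a toy model, not the HBase implementation; the class and member names are invented for illustration:

```java
import java.util.*;

// Toy model of the write path: persist each edit to a log first, then
// apply it to the in-memory store. None of these names are HBase API.
public class WalSketch {
    final List<String> wal = new ArrayList<>();           // durable edit log (in HBase this lives in HDFS)
    final Map<String, String> memStore = new HashMap<>(); // volatile in-memory store

    void put(String row, String value) {
        wal.add(row + "=" + value); // step 1: the edit is durable from this point
        memStore.put(row, value);   // step 2: only then update the MemStore
    }

    // After a crash wipes the MemStore, replaying the WAL rebuilds it.
    Map<String, String> replay() {
        Map<String, String> rebuilt = new HashMap<>();
        for (String edit : wal) {
            String[] kv = edit.split("=", 2);
            rebuilt.put(kv[0], kv[1]);
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        WalSketch rs = new WalSketch();
        rs.put("row1", "a");
        rs.put("row2", "b");
        rs.memStore.clear();            // simulate a RegionServer crash
        System.out.println(rs.replay()); // the edits survive in the log
    }
}
```

This is why answer A in Question #2 marks the WAL write as the point of durability: anything logged before the crash can be replayed, while anything only in the MemStore cannot.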

Question #3

You have a table with the following rowkeys:
r1, r2, r3, r10, r15, r20, r25, r30, r35
In which order will these rows be retrieved from a scan?

A. r35, r30, r3, r25, r20, r2, r15, r10, r1

B. r1, r2, r3, r10, r15, r20, r25, r30, r35

C. r1, r10, r15, r2, r20, r25, r3, r30, r35

D. r35, r30, r25, r20, r15, r10, r3, r2, r1

Explanation: HBase tables are always sorted by row key, and row keys are compared as raw bytes rather than as numbers. The comparison proceeds byte by byte, so "r10" sorts before "r2" (because "1" < "2"), and the scan returns r1, r10, r15, r2, r20, r25, r3, r30, r35.
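Since ASCII row keys compare the same way as Java Strings, the scan order can be checked with a plain sort (illustrative only; HBase itself compares raw byte arrays):

```java
import java.util.*;

// Lexicographic (byte-by-byte) ordering, as HBase applies to row keys.
public class RowKeyOrder {
    static List<String> scanOrder(List<String> rowKeys) {
        List<String> sorted = new ArrayList<>(rowKeys);
        Collections.sort(sorted); // String order matches byte order for ASCII keys
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("r1", "r2", "r3", "r10", "r15", "r20", "r25", "r30", "r35");
        System.out.println(scanOrder(keys));
        // [r1, r10, r15, r2, r20, r25, r3, r30, r35]
    }
}
```

To get numeric ordering in a real schema, keys are typically zero-padded (r01, r02, ..., r35) or encoded as fixed-width binary.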

Question #4

You are storing page view data for a large number of Web sites, each of which has many subdomains (www.example.com, archive.example.com, beta.example.com, etc.). Your reporting tool needs to retrieve the total number of page views for a given subdomain of a Web site. Which of the following rowkeys should you use?

A. The reverse domain name (e.g., com.example.beta)

B. The domain name followed by the URL

C. The URL

D. The URL followed by the reverse domain name

Explanation: Consider a table whose keys are domain names. It makes the most sense to
list them in reverse notation (so "com.jimbojw.www" rather than "www.jimbojw.com") so
that rows about a subdomain will be near the parent domain row.
Continuing the domain example, the row for "mail.jimbojw.com" would be right next to the row for "www.jimbojw.com" rather than, say, "mail.xyz.com", which is what would happen if the keys used regular domain notation.
Reference: Understanding HBase and BigTable
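The reverse-notation trick can be sketched with a small helper (illustrative only, not part of any HBase API):

```java
// Illustrative helper: build a reverse-notation rowkey so that
// subdomain rows sort next to their parent domain's rows.
public class ReverseDomain {
    static String rowKey(String domain) {
        String[] parts = domain.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {
            sb.append(parts[i]);
            if (i > 0) sb.append('.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rowKey("beta.example.com")); // com.example.beta
        System.out.println(rowKey("mail.jimbojw.com")); // com.jimbojw.mail
    }
}
```

With keys like these, all rows for *.example.com share the prefix "com.example" and are therefore contiguous, so a single prefix scan retrieves every subdomain's counts.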

Question #5

Data is written to the HLog in which of the following orders?

A. In order of writes

B. In order of writes, separated by region

C. Ascending first by region and second by row key

D. Descending first by region and second by row key

Question #6

You have a table where keys range from "A" to "Z", and you want to scan from "D" to "H."
Which of the following is true?

A. A MultiGet must be issued for rows D, E, F, G, H.

B. The scan class supports ranges via the stop and start rows.

C. All scans are full table scans, the client must implement filtering.

D. In order to range scan, raw scan mode must be enabled.

Explanation: Rather than specifying a single row, an optional startRow and stopRow may
be defined. If rows are not specified, the Scanner will iterate over all rows.
Reference: org.apache.hadoop.hbase.client, Class Scan
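The half-open [startRow, stopRow) semantics of a Scan can be mimicked with a sorted map. This is an analogy in plain Java, not the HBase client API:

```java
import java.util.*;

// A sorted map stands in for a sorted HBase table; subMap gives the same
// startRow-inclusive, stopRow-exclusive range that a Scan uses.
public class RangeScanSketch {
    static SortedMap<String, String> scan(TreeMap<String, String> table,
                                          String startRow, String stopRow) {
        return table.subMap(startRow, stopRow); // [startRow, stopRow)
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (char c = 'A'; c <= 'Z'; c++) table.put(String.valueOf(c), "value");
        // stopRow is exclusive, so to include "H" the stop must sort just past it.
        System.out.println(scan(table, "D", "I").keySet()); // [D, E, F, G, H]
    }
}
```

The exclusive stop row is the detail that trips people up: scanning "D" to "H" with stopRow "H" would return D through G only.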

Question #7

You have a total of three tables stored in HBase. Excluding catalog regions, how many regions will your RegionServers have?

A. Exactly three

B. Exactly one

C. At least one

D. At least three

Question #8

Given the following HBase table schema:
Row Key, colFam_A:a, colFam_A:b, colFam_B:2, colFam_B:10
A table scan will return the column data in which of the following sorted orders:

A. Row Key, colFam_A:a, colFam_A:b, colFam_B:10, colFam_B:2

B. Row Key, colFam_A:a, colFam_A:b, colFam_B:2, colFam_B:10

C. Row Key, colFam_A:a, colFam_B:2, colFam_A:b, colFam_B:10

D. Row Key, colFam_A:a, colFam_B:10, colFam_A:b, colFam_B:2

Explanation: Everything in HBase is sorted: first by row key, then by column family, then by column qualifier, then by type, and finally by timestamp. Timestamps are sorted in reverse, so you see the newest records first. Qualifiers are compared byte by byte, so "10" sorts before "2".
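The qualifier ordering can be verified with a plain String sort, which matches HBase's byte-by-byte comparison for ASCII names (illustrative only):

```java
import java.util.*;

// Column qualifiers are compared byte by byte, so "10" sorts before "2".
public class ColumnOrder {
    static List<String> sorted(List<String> columns) {
        List<String> out = new ArrayList<>(columns);
        Collections.sort(out); // lexicographic, like HBase's byte comparison
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList(
            "colFam_A:a", "colFam_A:b", "colFam_B:2", "colFam_B:10")));
        // [colFam_A:a, colFam_A:b, colFam_B:10, colFam_B:2]
    }
}
```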

Question #9

You need to create a "WebLogs" table in HBase. The table will consist of a single Column
Family called "Errors" and two column qualifiers, "IP" and "URL". The shell command you
should use to create the table is:

A. create 'WebLogs', {NAME => 'Errors:IP', NAME =>'Errors:URL'}

B. create 'WebLogs', 'Errors' {NAME => 'IP', NAME => 'URL'}

C. create 'WebLogs', 'Errors:IP', 'Errors:URL'

D. create 'WebLogs', 'Errors'

Explanation: Columns in Apache HBase are grouped into column families. All column
members of a column family have the same prefix. For example, the columns
courses:history and courses:math are both members of the courses column family. The
colon character (:) delimits the column family from the column qualifier. The column family
prefix must be composed of printable characters. The qualifying tail, the column family
qualifier, can be made of any arbitrary bytes. Column families must be declared up front at
schema definition time, whereas columns do not need to be defined at schema time but can
be conjured on the fly while the table is up and running.
Physically, all column family members are stored together on the filesystem. Because
tunings and storage specifications are done at the column family level, it is advised that all
column family members have the same general access pattern and size characteristics.

Question #10

From within an HBase application, you would like to create a new table named weblogs.
You have started with the following Java code:
HBaseAdmin admin = new HBaseAdmin (conf);
HTableDescriptor t = new HTableDescriptor("weblogs");
Which of the following method(s) would you use next?

A. admin.createTable(t);admin.enable.Table(t);

B. admin.createTable(t);

C. HTable.createTable(t);HTable.enableTable(t);

D. HTable.createTable(t);

Explanation: See the admin.createTable(tabledescriptor) call in the example below.
Creating a table in HBase:
public void createTable(String tablename, String familyname) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor tabledescriptor = new HTableDescriptor(Bytes.toBytes(tablename));
    tabledescriptor.addFamily(new HColumnDescriptor(familyname));
    admin.createTable(tabledescriptor);
}
Reference: HBase Administration Using the Java API, Using Code Examples, http://linuxjunkies.wordpress.com/2011/12/03/hbase-administration-using-the-java-api-using-code-examples/ (creating a table in HBase; see the code)

Question #11

From within an HBase application, you want to retrieve two versions of a row, if they exist. Where should your application configure the maximum number of versions to be retrieved?

A. HTableDescriptor

B. HTable

C. Get or scan

D. HColumnDescriptor

Explanation: maxVersions - maximum number of versions to keep.
Note:
public HColumnDescriptor(byte[] familyName,
                         int maxVersions,
                         String compression,
                         boolean inMemory,
                         boolean blockCacheEnabled,
                         int timeToLive,
                         String bloomFilter)
An HColumnDescriptor contains information about a column family such as the number of
versions, compression settings, etc. It is used as input when creating a table or adding a
column. Once set, the parameters that specify a column cannot be changed without
deleting the column and recreating it. If there is data stored in the column, it will be deleted
when the column is deleted.
Reference: org.apache.hadoop.hbase, Class HColumnDescriptor
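The idea of a per-read version limit can be sketched without a cluster. The names below are invented for illustration and are not the HBase Get/Scan API:

```java
import java.util.*;

// Versions of a single cell keyed by timestamp; a read walks them
// newest-first and stops after maxVersions entries.
public class VersionSketch {
    static List<String> read(NavigableMap<Long, String> versionsByTs, int maxVersions) {
        List<String> result = new ArrayList<>();
        for (String value : versionsByTs.descendingMap().values()) {
            if (result.size() == maxVersions) break;
            result.add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> cell = new TreeMap<>();
        cell.put(100L, "oldest");
        cell.put(200L, "middle");
        cell.put(300L, "newest");
        System.out.println(read(cell, 2)); // [newest, middle]
    }
}
```

Note the two distinct knobs: the column family's retention setting bounds how many versions are kept at all, while the per-request limit bounds how many of those a single read returns.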

Question #12

Your HBase cluster has hit a performance wall and doesn't seem to be getting faster as you add RegionServers. Adding an additional HMaster will:

A. Have no effect on performance.

B. Improve the performance of region writes but decrease the performance of metadata changes.

C. Improve the performance of metadata changes, but decrease the performance of region writes.

D. Make the performance problem even worse, as operations will have to be replicated to multiple masters.

Explanation: You can add multiple HBase master nodes; however, only one HBase
master node is active at a time. The active HBase master node changes only when the
current active HBase master node is shut down or fails.

Question #13

You have a key-value pair size of 100 bytes. You increase your HFile block size from its default of 64KB. What results from this change?

A. scan throughput increases and random-access latency decreases

B. scan throughput decreases and random-access latency increases

C. scan throughput decreases and random-access latency decreases

D. scan throughput increases and random-access latency increases

Explanation: A larger block size is preferred if files are primarily meant for sequential access. Smaller blocks are good for random access, but they require more memory to hold the block index and may be slower to create.
Reference: Could I improve HBase performance by reducing the HDFS block size?
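The tradeoff is easy to see with back-of-the-envelope arithmetic (the numbers here are illustrative only):

```java
// With small key-value pairs, every random Get must read and decode an
// entire block, so a bigger block means more wasted bytes per random read
// but fewer block boundaries (and fewer index entries) per sequential scan.
public class BlockMath {
    static int kvsPerBlock(int blockSizeBytes, int kvSizeBytes) {
        return blockSizeBytes / kvSizeBytes;
    }

    public static void main(String[] args) {
        System.out.println(kvsPerBlock(64 * 1024, 100));  // ~655 KVs read per random Get at 64KB
        System.out.println(kvsPerBlock(256 * 1024, 100)); // ~2621 at 256KB: 4x the read amplification
    }
}
```

Quadrupling the block size means each random Get drags roughly four times as many unwanted 100-byte KVs through the I/O path and cache, which is why scan throughput increases while random-access latency increases.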

Question #14

You want to do mostly full table scans on your data. In order to improve performance you
increase your block size. Why does this improve your scan performance?

A. It does not. Increasing block size does not improve scan performance.

B. It does not. Increasing block size means that fewer blocks fit into your block cache. This requires HBase to read each block from disk rather than cache for each scan, thereby decreasing scan performance.

C. Increasing block size requires HBase to read from disk fewer times, thereby increasing scan performance.

D. Increasing block size means fewer block indexes that need to be read from disk, thereby increasing scan performance.

Explanation: Change the HFile block size to something bigger to improve scan performance (at the cost of random reads).
Reference: Testing HBase Scan Performance

Question #15

You have an average key-value pair size of 100 bytes. Your primary access is random reads on the table. Which of the following actions will speed up random read performance on your cluster?

A. Turn off WAL on puts

B. Increase the number of versions kept

C. Decrease the block size

D. Increase the block size

Explanation: A larger block size is preferred if files are primarily meant for sequential access. Smaller blocks are good for random access, but they require more memory to hold the block index and may be slower to create.
Reference: Could I improve HBase performance by reducing the HDFS block size?
