IBM Netezza is a powerful, highly parallelized data warehousing system that is
simple to administer and maintain. It is commonly referred to as a data
warehouse appliance: a system purpose-built for running complex data
warehousing workloads. The appliance concept is realized by integrating the
database, server, and storage into a single system that is easy to deploy and
manage.
In most database systems the main bottleneck is I/O. IBM Netezza reduces this
bottleneck by using commodity FPGAs (Field-Programmable Gate Arrays) to push
SQL processing closer to silicon, which helps improve I/O performance. This
core component of the appliance is referred to as the Database Accelerator.
The workshop began with a brief presentation giving a high-level overview of
the architecture, covering the Database Accelerator along with the other
components of the IBM Netezza appliance. The presentation also covered the
basics of administering and maintaining a Netezza database. These concepts were
then reinforced through hands-on experience. Instead of an actual IBM Netezza
appliance, a virtualized environment was provided, together with a lab manual
that outlined the steps and commands to run and explained each step of the
exercises.
The agenda for the topics covered in the Hands-on-Lab exercises was:
1. Create Netezza Database Users and Groups (and set privileges)
2. Create the Workshop database
3. Create tables in the Workshop database
4. Load data into the Netezza Appliance with the nzload utility using the
External Table framework
The workshop showed how simple it is to set up an IBM Netezza appliance after
it has been delivered and configured. A factory-configured and installed IBM
Netezza appliance includes, among other things, the following components:
§ A preconfigured Linux operating system (with Netezza modifications)
§ Several preconfigured Linux users and groups
§ An IBM Netezza database user named ADMIN. The ADMIN user is the database
super-user and has full access to all system functions and objects
The IBM Netezza appliance also includes a SQL dialect called
Netezza Structured Query Language (NZSQL). You can use SQL commands to create
and manage your Netezza databases, user access, and permissions for the
databases, as well as to query and modify the contents of the databases.
On a new IBM Netezza appliance, there is one main database,
SYSTEM, and a database template, MASTER_DB. IBM Netezza uses the MASTER_DB as a
template for all other user databases that are created on the system.
Before the databases and tables were created, a brief explanation of the
virtualized environment used in the workshop was provided, including how to
connect to the Netezza appliance, which is done through the Netezza SMP Host.
Once the connection to the Netezza appliance was established, a set of new
users was created and used for the remainder of the workshop. The concept of
users and privileges was explored further when the database and tables were
created. This involved setting up a basic Security Access Model, which
restricted or permitted certain actions on objects within the Netezza
Appliance.
After the Netezza database users were created, the database and tables for the
workshop were created. The next step, as with any data warehouse environment,
was to load data into the tables. This was made easy by the Netezza utility
nzload, which uses the External Table framework to load data efficiently into
a Netezza database. This framework consists of several components, including:
§ External Tables -- These are tables stored as flat files on the host or
client systems and registered like tables in the Netezza catalog. They can be
used to load data into the Netezza appliance or unload data to the file system.
§ nzload -- This is a command-line wrapper around external tables that provides
an easy method of loading data into the Netezza appliance.
§ Format Options -- These are options for formatting the data loaded to and
from external tables.
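As a rough illustration of how these pieces fit together, the following Python
sketch writes a small pipe-delimited flat file and assembles an nzload command
line for it. The database name (LABDB), table name (REGION), file name, and
sample rows are hypothetical and not taken from the lab manual. The options
-db, -t, -df, and -delim are shown as they are commonly documented for nzload;
verify them against the version installed on your host. The sketch only builds
and prints the command, it does not run it.

```python
import csv
import shlex
import tempfile
from pathlib import Path


def write_flat_file(rows, path, delimiter="|"):
    """Write rows to a delimited flat file of the kind nzload reads."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        writer.writerows(rows)


def build_nzload_command(database, table, data_file, delimiter="|"):
    """Assemble an nzload argument list (illustrative names, never executed here)."""
    return [
        "nzload",
        "-db", database,        # target database
        "-t", table,            # target table
        "-df", str(data_file),  # flat file holding the rows to load
        "-delim", delimiter,    # field delimiter used in the flat file
    ]


# Hypothetical sample data, invented for this sketch.
sample_rows = [(1, "AFRICA"), (2, "AMERICA"), (3, "ASIA")]
data_file = Path(tempfile.gettempdir()) / "region.del"

write_flat_file(sample_rows, data_file)
command = build_nzload_command("LABDB", "REGION", data_file)

# Print a shell-quoted version of the command for inspection.
print(shlex.join(command))
```

Building the command as an argument list rather than a single string keeps the
delimiter and file path from being misread by a shell if the list is later
passed to a process launcher.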
With a good understanding of how to create and populate tables in a Netezza
database, the discussion moved on to the importance of Data Distribution.
Because IBM Netezza is built on a massively parallel architecture that
distributes data and workloads over a large number of processing and data
nodes, the single most important tuning factor is choosing the right
distribution key. The distribution key governs which rows of a table are sent
to which data slice, so it is very important to choose a key that avoids data
skew and processing skew and makes joins co-located whenever possible. The
topic was important enough to receive its own section of the workshop. The
exercises examined how to pick the best Hash Key for distribution for each of
the tables created in this workshop. During this set of exercises, CTAS (CREATE
TABLE AS SELECT) tables were used to show how easy it is to change the Hash Key
for a table without having to manually recreate the table and reload its data.
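To make the effect of the distribution key concrete, here is a small Python
simulation; it is not Netezza code. It assumes a hypothetical system with 8
data slices and uses CRC32 as a stand-in hash, since the hash function Netezza
actually uses is internal to the appliance. The column names and row counts are
invented for illustration. The sketch compares a high-cardinality key with a
low-cardinality one.

```python
import zlib
from collections import Counter

NUM_SLICES = 8  # assumption: a hypothetical number of data slices


def slice_for(key, num_slices=NUM_SLICES):
    """Map a distribution key value to a data slice using a stand-in hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def rows_per_slice(rows, key_column, num_slices=NUM_SLICES):
    """Count how many rows land on each slice for the chosen key column."""
    counts = Counter(slice_for(row[key_column], num_slices) for row in rows)
    return [counts.get(s, 0) for s in range(num_slices)]


def skew(counts):
    """Ratio of the fullest slice to the average; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


# Invented sample: 10,000 orders with a unique id and only 3 status values.
orders = [{"order_id": i, "status": ("O", "F", "P")[i % 3]} for i in range(10_000)]

by_id = rows_per_slice(orders, "order_id")
by_status = rows_per_slice(orders, "status")

print("order_id:", by_id, "skew", round(skew(by_id), 2))
print("status:  ", by_status, "skew", round(skew(by_status), 2))
```

Because status has only three distinct values, its rows can reach at most
three of the eight slices, leaving the others idle no matter which hash is
used. A unique key such as order_id has no such limit, which is why
high-cardinality columns tend to make better distribution keys.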