Software Architecture

The Netezza hardware components and intelligent system software are closely intertwined.
The software is designed to fully exploit the hardware capabilities of the
appliance and incorporates numerous innovations to deliver large performance gains,
whether for simple inquiries, complex ad-hoc queries, or deep analytics. In this section, we
examine the intelligence built into the system every step of the way.

Netezza software components include:

  • A sophisticated parallel optimizer that transforms queries to run more efficiently and ensures that each component in every processing node is fully utilized
  • An intelligent scheduler that keeps the system running at its peak throughput, regardless of workload
  • Turbocharged Snippet Processors that efficiently execute multiple queries and complex analytics functions concurrently
  • A smart network that makes moving large amounts of data through the Netezza system a breeze

Let's see how these elements work together, starting when a user submits a query.
Technology-savvy readers will see that the Netezza system processes queries very differently
from other data warehouse systems.

Make an optimized query plan

The host compiles the query and creates a query execution plan optimized for the Netezza
AMPP architecture. The intelligence of the Netezza optimizer is one of the system's greatest
strengths. The optimizer makes use of all the MPP nodes in the system to gather detailed,
up-to-date statistics on every database table referenced in a query. A majority of these
metrics are captured during query execution with very low overhead, yielding just-in-time
statistics that are individualized per query. The appliance nature of the Netezza system, with
integrated components able to communicate with each other, allows the cost-based optimizer
to more accurately measure disk, processing, and network costs associated with an
operation. By relying on accurate data rather than heuristics alone, the optimizer is able to
generate query plans that utilize all components with extreme efficiency.

Intelligence in the optimizer (calculating join order): One example of optimizer
intelligence is the ability to determine the best join order in a complex join. For example,
when joining multiple small tables to a large fact table, the optimizer can choose to
broadcast the small tables in their entirety to each of the S-Blades, while keeping the large
table distributed across all Snippet Processors. This approach minimizes data movement
while taking advantage of the AMPP architecture to parallelize the join.
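The broadcast decision described above can be illustrated with a small cost comparison. This is only a sketch under stated assumptions, not the Netezza optimizer's actual cost model: the `Table` class, the byte-count formulas, and the function names are all hypothetical, and the only cost considered is data moved over the network.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    rows: int        # estimated row count (assumed to come from statistics)
    row_bytes: int   # estimated average row width in bytes


def bytes_moved_by_broadcast(small: Table, num_nodes: int) -> int:
    # Every node receives a full copy of the small table.
    return small.rows * small.row_bytes * num_nodes


def bytes_moved_by_redistribute(large: Table) -> int:
    # Pessimistic assumption: every row of the large table crosses the network once.
    return large.rows * large.row_bytes


def choose_join_strategy(small: Table, large: Table, num_nodes: int) -> str:
    # Pick whichever option moves fewer bytes.
    broadcast = bytes_moved_by_broadcast(small, num_nodes)
    redistribute = bytes_moved_by_redistribute(large)
    return "broadcast_small" if broadcast <= redistribute else "redistribute_large"


region = Table("region", rows=1_000, row_bytes=100)
sales = Table("sales", rows=1_000_000_000, row_bytes=200)
print(choose_join_strategy(region, sales, num_nodes=100))
```

With a small dimension table and a very large fact table, copying the small table to every node moves far less data than reshuffling the fact table, which is the situation the paragraph above describes.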

By using these statistics to transform queries before processing begins, the optimizer
minimizes disk I/O and data movement, the two factors that most limit performance in a data
warehouse system. Transforming operations performed by the optimizer include:
  • Determining correct join order
  • Rewriting expressions
  • Removing redundancy from SQL operations

Convert it to snippets
The compiler converts the query plan into executable code segments, called snippets, which
are query segments executed by Snippet Processors in parallel across all the data streams in
the appliance. Each snippet has two elements: compiled code executed by individual CPU
cores and a set of FPGA parameters to customize the FAST engines' filtering for that
particular snippet. This snippet-by-snippet customization allows the Netezza platform to
provide, in effect, a hardware configuration optimized on the fly for individual queries.
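To make the two-part structure of a snippet concrete, here is a minimal data-structure sketch. The class and field names (`FpgaParams`, `Snippet`, `project_columns`, `restrict`) are invented for illustration and do not reflect the real snippet format; the example assumes a query of the shape "sum the amount column for week 42".

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class FpgaParams:
    """Hypothetical filter settings handed to the FAST engines."""
    project_columns: tuple[str, ...]   # columns to keep
    restrict: tuple[str, str, int]     # (column, operator, value) row filter


@dataclass(frozen=True)
class Snippet:
    """One unit of work: filter parameters plus code for the CPU core."""
    snippet_id: int
    fpga_params: FpgaParams
    cpu_code: Callable[[Iterable[dict]], int]


def sum_amount(rows: Iterable[dict]) -> int:
    # Stand-in for the compiled code that runs on the CPU core.
    return sum(row["amount"] for row in rows)


snippet = Snippet(
    snippet_id=1,
    fpga_params=FpgaParams(project_columns=("amount",), restrict=("week", "=", 42)),
    cpu_code=sum_amount,
)
```

Keeping the filter parameters separate from the CPU code mirrors the idea that the same hardware is reconfigured for each snippet rather than rebuilt.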

Schedule them to run at just the right moment
The Netezza scheduler balances execution across complex workloads to meet the objectives of different users, while maintaining maximum utilization and throughput.

It considers a number of factors, including query priority, size, and resource availability, in
determining when to execute snippets on the S-Blades. The scheduler uses the appliance
architecture to gather up-to-date and accurate metrics about resource availability from each
component of the system. Using sophisticated algorithms, the scheduler maximizes system
throughput by utilizing close to 100% of the disk bandwidth and ensuring that memory and
network resources are not overloaded, a common cause of thrashing for other, less efficient
systems. This is an important characteristic of the Netezza platform, ensuring the system
keeps performing at peak throughput even under very heavy loads.

When the scheduler gives the green light, the snippet is broadcast to all Snippet Processors
through the intelligent network fabric.
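A toy admission policy shows how priority, size, and resource availability might interact. It is an assumption-laden sketch and not the Netezza scheduler's algorithm: memory is the only resource modeled, and the `admit` function and the tuple layout are hypothetical.

```python
import heapq


def admit(pending, free_memory_mb):
    """Return the names of snippets to start now, in dispatch order.

    pending: iterable of (name, priority, memory_mb); higher priority runs first.
    A snippet that does not fit in the remaining memory is skipped for this
    round, so smaller, lower-priority work can still use the leftover capacity.
    """
    # heapq is a min-heap, so negate the priority; ties go to the smaller snippet.
    heap = [(-priority, memory_mb, name) for name, priority, memory_mb in pending]
    heapq.heapify(heap)

    started = []
    while heap:
        _, memory_mb, name = heapq.heappop(heap)
        if memory_mb <= free_memory_mb:
            free_memory_mb -= memory_mb
            started.append(name)
    return started


queue = [("report", 1, 600), ("dashboard", 3, 300), ("etl", 2, 500)]
print(admit(queue, free_memory_mb=1000))
```

Refusing to start work that would exceed the available memory is one simple way to avoid the thrashing mentioned above, at the cost of delaying large snippets until capacity frees up.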

Execute them in parallel

Each Snippet Processor on every S-Blade now has the instructions it needs to execute its
portion of the snippet. In addition to the host scheduler, the Snippet Processors have their
own smart preemptive scheduler that allows snippets from multiple queries to execute
simultaneously. The scheduler takes into account the priority of the query and the resources
set aside for the user or group that issued it to decide when and for how long to schedule a
particular snippet for execution. When that instant arrives, it's show time:

1. The processor core on each Snippet Processor configures the FAST engines with
parameters contained in the query snippet and sets up a data stream.

2. The Snippet Processor reads table data from the disk array into memory, utilizing a
Netezza innovation called ZoneMap™ acceleration to reduce disk scans. The Snippet
Processor also interrogates the cache before accessing the disk for a data block, avoiding
a scan if the data is already in memory.

3. The FPGA then acts on the data stream. It first decompresses the data at wire speed,
which boosts the effective data rate by a factor of 4 to 8.

4. The FAST engines then filter out any data not relevant to the query. The remaining data
streams back to memory for concurrent processing by the CPU core. This data is typically
a tiny fraction (2–5%) of the original stream, greatly reducing the execution time required
by the processor core.

5. The processor core picks up the data stream and performs core database operations such
as sorts, joins, and aggregations. It also applies complex algorithms embedded in the
Snippet Processor for advanced analytics processing.

6. Results from each Snippet Processor are assembled in memory to produce a sub-result
for the entire snippet. This process is repeated simultaneously across more than a
thousand Snippet Processors, with hundreds or thousands of query snippets executing in
parallel.
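The decompress, filter, and aggregate stages in the steps above can be mimicked in software as a chain of generators. This is purely illustrative: zlib and JSON stand in for the on-disk format, the function names are made up, and in the appliance the first two stages run in the FPGA rather than in code like this.

```python
import json
import zlib


def make_block(rows):
    # Helper for the example: pack rows into one compressed "disk block".
    return zlib.compress("\n".join(json.dumps(r) for r in rows).encode())


def decompress(blocks):
    # Stage 1: expand each compressed block back into rows.
    for block in blocks:
        for line in zlib.decompress(block).decode().splitlines():
            yield json.loads(line)


def filter_and_project(rows, predicate, columns):
    # Stage 2: drop rows and columns the query does not need.
    for row in rows:
        if predicate(row):
            yield {c: row[c] for c in columns}


def aggregate(rows, column):
    # Stage 3: the CPU core only sees the reduced stream.
    return sum(row[column] for row in rows)


def run_snippet(blocks, predicate, columns, sum_column):
    rows = filter_and_project(decompress(blocks), predicate, columns)
    return aggregate(rows, sum_column)


blocks = [
    make_block([{"week": 1, "amount": 10}, {"week": 2, "amount": 20}]),
    make_block([{"week": 2, "amount": 5}]),
]
print(run_snippet(blocks, lambda r: r["week"] == 2, ["amount"], "amount"))
```

Because each stage is a generator, rows stream through one at a time and the aggregation never materializes the unfiltered data.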


ZoneMap acceleration (the Netezza anti-index): ZoneMap acceleration exploits the
natural ordering of rows in a data warehouse to accelerate performance by orders of
magnitude. The technique avoids scanning rows with column values outside the start and
end range of a query. For example, if a table contains two years of weekly records (~100
weeks) and a query is looking for data for only one week, ZoneMap acceleration can
improve performance up to 100 times. Unlike indexes, ZoneMaps are automatically
created and updated for each database table, without incurring any administrative
overhead.
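The idea behind the 100-week example can be sketched with per-zone minimum and maximum values. The `Zone` class, the zone size, and the function names below are assumptions chosen for illustration; the actual ZoneMap layout is not described in this text.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    min_value: int
    max_value: int
    rows: list


def build_zones(values, zone_size):
    # Record the min and max of each fixed-size run of rows.
    zones = []
    for i in range(0, len(values), zone_size):
        chunk = values[i:i + zone_size]
        zones.append(Zone(min(chunk), max(chunk), chunk))
    return zones


def scan(zones, lo, hi):
    # Skip any zone whose [min, max] range cannot overlap [lo, hi].
    matches, zones_read = [], 0
    for zone in zones:
        if zone.max_value < lo or zone.min_value > hi:
            continue
        zones_read += 1
        matches.extend(v for v in zone.rows if lo <= v <= hi)
    return matches, zones_read


# 100 weeks of data, 10 rows per week, stored in arrival (week) order.
weeks = [week for week in range(1, 101) for _ in range(10)]
zones = build_zones(weeks, zone_size=10)
matches, zones_read = scan(zones, lo=42, hi=42)
print(len(matches), zones_read, len(zones))
```

In this idealized layout a one-week query reads 1 of 100 zones, which corresponds to the "up to 100 times" figure; the benefit shrinks when rows are not stored in roughly sorted order, since zone ranges then overlap.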

And return the results!

All Snippet Processors now have snippet results that must be assembled. The Snippet
Processors use the intelligent network fabric to communicate flexibly with the host and with
each other to perform intermediate calculations and aggregations.

Intelligence in the network (predictable performance and scalability): The Netezza
custom network protocol is designed specifically for the data volumes and traffic patterns
associated with high volume data warehousing. The Netezza protocol ensures maximum
utilization of the network bandwidth without overloading it, allowing predictable
performance close to the line rate.
Traffic flows smoothly in three distinct directions:

  • From the host to the Snippet Processors (1 to 1000+) in broadcast mode
  • From Snippet Processors to the host (1000+ to 1), with aggregation in the S-Blades and at the system rack level
  • Between Snippet Processors (1000+ to 1000+), with data flowing freely on a massive scale for intermediate processing

The host assembles the intermediate results received from the Snippet Processors, compiles
the final result set and returns it to the user's application. Meanwhile, other queries are
streaming through the system at various stages of completion.
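The staged assembly of results (within each S-Blade first, then at the host) resembles a tree of partial aggregations. The sketch below assumes a grouped SUM and uses nested Python lists to stand in for blades and their Snippet Processors; the function names are hypothetical and no network protocol is modeled.

```python
from collections import Counter


def merge(partials):
    # Add up per-group partial sums from several sources.
    total = Counter()
    for partial in partials:
        total.update(partial)   # Counter.update adds values rather than replacing them
    return total


def assemble(blades):
    # First combine within each blade, then combine blade results at the host.
    blade_results = [merge(processors) for processors in blades]
    return dict(merge(blade_results))


blades = [
    [{"east": 10}, {"east": 5, "west": 1}],   # blade 1: two Snippet Processors
    [{"west": 4}],                            # blade 2: one Snippet Processor
]
print(assemble(blades))
```

Merging at the blade level first means the host receives one partial result per blade instead of one per Snippet Processor, which is the kind of traffic reduction the many-to-one direction relies on.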
