Spark: Index

Secondary Indexes

Secondary Indexes provide an alternate path to the data. So far we have learned that every table has one and only one Primary Index and we have learned that the Primary Index is much faster than the Full Table Scan. Secondary Indexes are not as fast as the Primary Index, but they can be pretty fast, and they can be much faster than a Full Table Scan.

There can be up to 32 Secondary Indexes on a table,. Every Secondary Index creates a Subtable on every AMP designed to point to the real Primary Index Row-ID.
There are two types of Secondary Index.

1. Unique Secondary Indexes ( USI ) - Two AMP Operation
2. Non-Unique Secondary Indexes ( NUSI ) _ All AMP Operation

An USI is always a Two-AMP operation so it is almost as fast as a Primary Index, but a NUSI is an All-AMP operation, but not a Full Table Scan.

Hrere the Primary Index is Last_Name and it is a Non-Unique

Primary Index (NUPI).

I also want you to pay close attention to the Row-IDs in front of each row.

Secondary Index Subtables are designed to point to the real row in the base table and they will do so by pointing to the exact Row-ID of the row they are looking for in the base table.

We have created the USI on the column Emp_No. Once the USI is created no duplicate Employee Number can exist in the table. If a row is added with a duplicate value for Emp_No then Teradata will reject the row and the user will receive an error message.

As soon as the USI is created with the SQL syntax the next move comes from Teradata creating a Subtable on every AMP.This is true for both the USI and the NUSI. Let‘s say for example the DBA created the maximum of 32 secondary indexes on a table. Then there would be 32 Subtables created, each taking up PERM Space. Subtable will be to point back to the real row in the base table via the Row-ID.

Inside the Secondary Index Subtable

     There are always two columns in the Secondary Index Subtable. The column value in which you created the secondary index on, which in this case was Emp_No, and the real Row-ID of the row in the Base Table.

    Without the secondary index Subtable there would only be only two ways for Teradata to find a particular row. It would be by using the Primary Index value in the query or by doing Full Table Scan.


The Secondary Index Subtable is really a baby table that contains the Secondary Index column, which acts as the Primary Index of the Subtable so Teradata can easily find Emp_No 2 in the Subtable.

It can find any Emp_No in the Subtable because Emp_No is the Primary Index of the Subtable. When a query is written with Emp_No in the WHERE clause the Teradata Parsing Engine (PE) recognizes it is an USI and looks up the Emp_No value in the Subtable with a 1-AMP operation.

It then asks, "Can you tell me the Row-ID of the row in the base table"? Once Teradata has the Row-ID it takes the value in the Row-Hash of the Row-ID and looks at the Hash Map and knows exactly which AMP the Base Table row is on.

How Teradata Builds the Secondary Index Subtable

    As soon as the DBA uses the SQL to create a secondary index Teradata immediately gets to work. Teradata must build the secondary index Subtable immediately before it can become an alternate path to the data.

    Each AMP Hashes the secondary index value for each row they own with the Hash Formula. The result is a 32-bit Row Hash which points to a bucket in the Hash Map, which tells the secondary index row which AMPs Subtable it will be on.

    All UNIQUE Secondary Indexes are hashed and the value plus the real Row-ID of the base table are sent to the proper AMP over the BYNET.

We created the USI on the column Emp_No. Every Emp_No value will now have to also reside inside the Subtable. The first rows value for Emp_No is a 2. The PE hashes the value of 2 and the result is a 32-bit row hash.

The PE then points to the bucket in the Hash Map that corresponds to the 32-bit row hash and the Hash Map says that AMP 1 is the destination AMP. So the Emp_No value of 2 goes to AMP 1‘s Subtable. It also brings with it the real Row-ID for its row from the Base Table, which is 1,1. Now the first secondary row is perfectly placed.

Building the Secondary Index Subtable Results

         Each Secondary Index Subtable row won‘t perform a one-timer, but instead perform a two-timer because the USI Value and Base Row-ID always make an USI query a 2-AMP operation.

         Now that the Secondary Index Subtable is built, the users can query the base table. If the USI is used in the WHERE clause the Parsing Engine knows it has an alternate routine to the data. It can use a 1-AMP operation to find the Subtable Row and then use another 1-AMP operation to find the base row.

        If there are 1,000,000 rows in the base table there will be 1,000,000 rows in the Subtable. The Base table will be much larger because it probably has many columns, but the Subtable only has two columns.

USI – Always a Two-AMP Operation

Secondary Indexes provide an alternate path to the data, and should be used on queries that run thousands of times. Teradata runs extremely well without Secondary Indexes, but since secondary indexes use up space and overhead, they should only be used on "KNOWN QUERIES" or queries that are run over and over again. Once you know the data warehouse, environment you can create Secondary Indexes to enhance its performance.

Every time the Parsing Engine sees the USI column in the WHERE clause it comes up with a plan that involves only two AMPs. Memorize this if you have to, but always know that an USI query is a two-AMP operation.

A Full Table Scan Example

A Full Table Scan will be performed when a Non-Indexed Column is used by itself in the WHERE clause. We will soon create a Non-Unique Secondary Index on this column, but first perform the Full Table Scan.

The Base Table

      We have two types of tables. Those are the Base Tables that hold the actual data that the users query against and the Secondary Index Subtables designed to point to the real Row-ID of the base table.

        The Row-IDs at the front of each row in the Base Table. The PE hashed the Primary Index Column (Last_Name) Value and that came up with a 32-bit Row Hash.

    It then counted over the appropriate number of buckets in the Hash Map that corresponded to the Row Hash, and inside the bucket was the Destination AMP for that row.

     Once the row and the Row Hash went to the AMP the actual AMP placed a 32-bit Uniqueness Value behind the Row Hash. The Row Hash plus the Uniqueness Value make up a rows Row-ID.

Creating a Non-Unique Secondary Index (NUSI)

Inside the NUSI Subtable resides two columns. They are the column values of the NUSI and the real Row-ID of the row in the base table. This is exactly the same two columns that were in the USI Subtable. Remember that the entire purpose of the NUSI or the USI Subtable is to point to the real row in the Base Table. This pointing is done by capturing the rows Row-ID.

The big difference between the USI and the NUSI Subtable is that the USI Subtable rows are Hashed and the NUSI subtable rows are AMP-Local.

NUSI Subtable is AMP Local

The NUSI Subtable is always AMP-Local. What does that term mean? Let me first again explain how USIs are Hashed and then how NUSIs are AMP-Local.

When an USI Subtable is created each USI value for each row in the Base Table is Hashed and sent to the AMP the Hash Formula and Hash Map dictate.

Most often the Base Row and the Subtable Row end up on different AMPs. The great news is that the Parsing Engines plan is always a Two-AMP operation. This can be done because a Unique Secondary Index is UNIQUE, which obviously means there can only be one row returned.

A NUSI is a Non-Unique Secondary Index, obvious again meaning that the value is Non-Unique and there could be thousands, millions or even billions of duplicates. So the Parsing Engine takes on a different strategy when building the NUSI Subtable.

Each row in the Subtable only tracks the Base rows on the same AMP. This is what is meant by AMPLocal. On the following page you can see that the AMP labeled "A Typical AMP" holds two base rows of the Employee_Table. The First_Name values, which was the column we created the NUSI Index on holds two values on this AMP, which are=Rakish‘ and =Vu‘. So in this typical AMPs Subtable there will be two rows tracking =Rakish‘ and =Vu‘. A NUSI Subtable is always created on each AMP, but in each AMPs Subtable are only values local to the base rows for that AMP.Now you know what the term AMP-Local means.

Spark

Index