Friday, March 19, 2021

Install Apache Spark on Windows

 

Prerequisites

This guide assumes that you are using Windows 10 and the user had admin permissions.

System requirements:

  • Windows 10 OS
  • At least 4 GB RAM
  • Free space of at least 20 GB

Installation Procedure

Step 1: Go to the below official download page of Apache Spark and choose the latest release. For the package type, choose ‘Pre-built for Apache Hadoop’.

The page will look like below.

Apache Spark installation Process

Step 2:  Once the download is completed unzip the file, to unzip the file using WinZip or WinRAR or 7-ZIP.

Step 3: Create a folder called Spark under your user Directory like below and copy paste the content from the unzipped file.

C:\Users\<USER>\Spark

It looks like below after copy-pasting into the Spark directory.

Step 4: Go to the conf folder and open log file called, log4j.properties. template. Change INFO to WARN (It can be ERROR to reduce the log). This and next steps are optional.

Remove. template so that Spark can read the file.

Before removing. template all files look like below.

Apache Spark installation Process

After removing. template extension, files will look like below

Apache Spark installation Process

Step 5: Now we need to configure path.

Go to Control Panel -> System and Security -> System -> Advanced Settings -> Environment Variables

Add below new user variable (or System variable) (To add new user variable click on New button under User variable for <USER>)

Apache Spark installation Process

Click OK.

Add %SPARK_HOME%\bin to the path variable.

Apache Spark installation Process

Click OK.

Step 6: Spark needs a piece of Hadoop to run. For Hadoop 2.7, you need to install winutils.exe.

You can find winutils.exe from below page

Download it.

Step 7: Create a folder called winutils in C drive and create a folder called bin inside. Then, move the downloaded winutils file to the bin folder.

C:\winutils\bin

Apache Spark installation Process

Add the user (or system) variable %HADOOP_HOME% like SPARK_HOME.

Apache Spark installation Process

Apache Spark installation Process

Click OK.

Step 8: To install Apache Spark, Java should be installed on your computer. If you don’t have java installed in your system. Please follow the below process

Java Installation Steps:

  • Go to the official Java site mentioned below  the page.

Accept Licence Agreement for Java SE Development Kit 8u201

  • Download jdk-8u201-windows-x64.exe file
  • Double Click on Downloaded .exe file, you will the window shown below.

Java Installation Steps

  • Click Next.
  • Then below window will be displayed.

Java Installation Steps

  • Click Next.
  • Below window will be displayed after some process.

Java Installation Steps

  • Click Close.

Test Java Installation:

Open Command Line and type java -version, then it should display installed version of Java

Java Installation Steps

You should also check JAVA_HOME and path of %JAVA_HOME%\bin included in user variables (or system variables)

1. In the end, the environment variables have 3 new paths (if you need to add Java path, otherwise SPARK_HOME and HADOOP_HOME).

Java Installation Steps

2. Create c:\tmp\hive directory. This step is not necessary for later versions of Spark. When you first start Spark, it creates the folder by itself. However, it is the best practice to create a folder.

C:\tmp\hive

Test Installation:

Open command line and type spark-shell, you get the result as below.

Test Installation in Apache Spark

We have completed spark installation on Windows system. Let’s create RDD and     Data frame

We create one RDD and Data frame then will end up.

1. We can create RDD in 3 ways, we will use one way to create RDD.

Define any list then parallelize it. It will create RDD. Below is code and copy paste it one by one on the command line.

val list = Array(1,2,3,4,5)
val rdd = sc.parallelize(list)

Above will create RDD.

2. Now we will create a Data frame from RDD. Follow the below steps to create Dataframe.

import spark.implicits._
val df = rdd.toDF("id")

Above code will create Dataframe with id as a column.

To display the data in Dataframe use below command.

Df.show()

It will display the below output.

Test Installation in Apache Spark

How to uninstall Spark from Windows 10 System: 

Please follow below steps to uninstall spark on Windows 10.

  1. Remove below System/User variables from the system.
  2. SPARK_HOME
  3. HADOOP_HOME

To remove System/User variables please follow below steps:

Go to Control Panel -> System and Security -> System -> Advanced Settings -> Environment Variables, then find SPARK_HOME and HADOOP_HOME then select them, and press DELETE button.

Find Path variable Edit -> Select %SPARK_HOME%\bin -> Press DELETE Button

Select % HADOOP_HOME%\bin -> Press DELETE Button -> OK Button

Open Command Prompt the type spark-shell then enter, now we get an error. Now we can confirm that Spark is successfully uninstalled from the System.

Thursday, March 28, 2019

Dynamically manage Hive Table in spark


Method to Drop Table and Files
object test {
    def main(args: Array[String]): Unit = {
     val databaseName = args(0)
     val tableName = args(1)

def method1 (spark: SparkSession ,databaseName: String, tableName: String): Unit={
    import spark.implicits._
    val  tblary= Array("Table1","Table2","Table3","Table4")
    for  (tab <- tblary){
tblcheck=spark.catlog.tableExists("databaseName","tableName")
    if(tblcheck){
    val loc = spark.Sql("desc formatted" +databaseName" "." +tableName).toDF.fillter('col_Name === "Location").collect()(0)(1).toString

//val loc = spark.Sql("desc formatted" +databaseName" "." +tableName).toDF.fillter('col_Name === "Location").select("data_type") -- can use this also

spark.Sql("drop table" +databaseName" "." +tableName)
process.apply("hadoop fs -rm -r -skipTrash" + Location).run().exitValue()
       }
    }
}


Method to create Table and Files
  def test2(spark: SparkSession ,databaseName: String, tableName: String): Unit={
  val createTable = "Create External Table IF NOT EXISTS" + databaseName + ".Table1  (" +
                            "col1," +
                            "col2," +
                            "col3," +
                            "col4"  +
                            "LOCATION   '"  + loc + "/Table1 '"
spark.sql(createTable)
}   

Tuesday, August 13, 2013

Teradata Parallel Transporter

Teradata Parallel Transporter is the preferred load/unload tool for the Teradata Database. Parallel Transporter is able to run all the bulk and continuous Teradata load/unload protocols in one product. In the past, a user had to run the protocols with separate tools with different script languages. The stand-alone load tools, FastLoad, MultiLoad, TPump, and FastExport are functionally stabilized and no new features are being added other than to keep them operational in supporting new Teradata Database releases.
 
All new features requested by customers are being added to Parallel Transporter. Currently, the stand-alone load tools are being supported indefinitely and no discontinuation notice has yet been issued. It is recommended that all new Teradata load applications be implemented with Parallel Transporter.
 
Parallel Transporter can be invoked through 4 interfaces:
  • Application Program Interface (API) – used by leading ETL vendors for tight, parallel, high-performance integration
  • Script – used when a customer doesn’t have an ETL tool
  • Command line (sometimes referred to as the Easy Loader interface) – used to load data from a flat file with a single command line 
  • Wizard – used to generate simple scripts. Use this tool as a way to learn the script language and not as a production load interface.
 Listed are the four main Parallel Transporter Operators:
  • Load Operator – bulk loading of empty tables (FastLoad protocol)
  • Update Operator -  bulk load/update/upsert/delete of tables (MultiLoad protocol)
  • Stream Operator continuous loading of tables (TPump protocol)
  • Export Operator bulk unloading of tables (FastExport protocol)
What is the architecture?

If you run TPT with the script interface, a TPT infrastructure component interprets the script and invokes the proper Operators to read and load the data.

If you use and ETL tool, the ETL tool will read and transform the data and pass the data in memory to the TPT API interface which will invoke the proper Operator to load the data.

 

Friday, July 26, 2013

Teradata Utility Error Handling - Multiload

The Teradata Multiload utility provides the capability to perform batch maintenance on tables (insert, update, delete and upsert). It loads data from external sources and provides the capability to restart jobs interrupted by errors, exceptions and failures.

A Multiload job executes in 5 phases:
1. Preliminary

  • Parses and validates all of the MultiLoad commands and Tera-data SQL statements in your MultiLoad job
  • Establishes sessions and process control with the Teradata Database
  • Submits special Teradata SQL requests to the Teradata Data-base
  • Creates and protects temporary work tables and error tables in the Teradata Database

2. DML Transaction


  • Submits the DML statements specifying the insert, update, and delete tasks to the Teradata Database

3. Acquisition

  • Imports data from the specified input data source
  • Evaluates each record according to specified application con-ditions
  • Loads the selected records into the worktables in the Tera-data Database
  • There is no acquisition phase activity for a MultiLoad delete task.

4. Application

  • Acquires locks on the specified target tables and views in the Teradata Database
  • For an import task, inserts the data from the temporary work tables into the target tables or views in the Teradata Data-base
  • For a delete task, deletes the specified rows from the target table in the Teradata Database
  • Updates the error tables associated with each MultiLoad task

5. Cleanup


  • Forces an automatic restart/rebuild if an AMP went offline and came back online during the application phase
  • Releases all locks on the target tables and views
  • Drops the temporary work tables and all empty error tables from the Teradata Database
  • Reports the transaction statistics associated with the import and delete tasks.
  • Multiload returns the returns codes described previously (0, 4, 8, 12) to the operating system or calling program.


Restarts

It is possible to restart a Multiload job in any of the phases after an error or failure has interrupted an executing Multiload job. There are differences in the restart procedures in some of the phases that will be described. It is also possible to rerun an interrupted Multiload job in some of the phases which will also be discussed.

A Multiload job creates a restart log table that must be present for a job to be restarted and must be dropped before a job can be rerun. It creates a work table for each input source. These tables must be pre-sent for a job to restart and must be dropped to rerun a job. Multiload also creates two error tables for each target table that must be present for a restart and that must be dropped to rerun a Multiload job.

A Multiload job can not be restarted or rerun if these tables are not present for an interruption that occurs during the APPLICATION PHASE.

Checkpoint frequency can be set for checkpoint to occur after a number of records have been read from an input source. The checkpoints are taken during the acquisition phase. A restart after an interruption in the acquisition phase will start reading the input source at the point following the last checkpoint executed before the interruption.

Interruptions that occur during the application phase will be restarted based on internal checkpoints created by the database.

The following is a simple Multiload script that will be use to illustrate the Multiload restart capabilities.




.logtable test.tranmllog;
.LOGON test/testuser,test;

.begin import 
 mload tables test.tran
 checkpoint 20000;

.layout tranlayout;
       .FIELD STORE * varchar(10);
       .field TRAN_NO * varchar  (10);
       .field TRAN_DATE * varchar  (10);
       .field TRAN_TIME * varchar  (10);
       .field TRAN_TYPE * varchar (1);
       .field TRAN_TERMINAL * varchar (3);
       .field CLERK * varchar (10);
       .field CUST_NO * varchar (10);
       .field NAME * varchar (30);
       .field TOTAL_SALE * varchar (14);
       .field SALES_TAX * varchar (14);
       .field DISCOUNT * varchar (14);
       .field DISC_PCT * varchar (10);
       .field DISC_TAKEN_FLAG * varchar (1);
       .field CASH_TENDERED * varchar (14);
       .field CHANGE_GIVEN * varchar (14);
       .field CHARGE_TENDERED* varchar (14);
       .field CHECK_NUMBER * varchar (20);
       .field CHECK_TENDERED * varchar (14);
       .field BANKCARD_TENDERED * varchar (14);
       .field PURCHASE_ORDER_NUM * varchar (20);

.dml label insdml;
insert into test.tran.*;

.import infile C:\DataIntegration\Data\TRAN2.DAT
        format vartext '|'
        layout tranlayout
        apply insdml;
.end mload;
.logoff;

This script inserts rows into one table, “TRAN”. It set a checkpoint every 20000 input records.

Restart after database reset

The following is an example of an automatic restart that occurs for an interruption and reset that occurred during the acquisition phase.

C:\DataIntegration\Scripts>mload -b < tran.ml
**** 22:11:09 UTY2414 BRIEF option is enabled.
     ========================================================================
     =          MultiLoad Utility    Release MLOD.13.00.00.000              =
     =          Platform WIN32                                              =
     ========================================================================
     =     Copyright 1990-2009, Teradata Corporation. ALL RIGHTS RESERVED.  =
     ========================================================================
**** 22:11:09 UTY2411 Processing start date: SUN AUG 09, 2009
     ========================================================================
     =          Logon/Connection                                            =
     ========================================================================
0001 .logtable test.tranmllog;
0002 .LOGON test/testuser,;
**** 22:11:11 UTY8400 Teradata Database Release: 12.00.01.24
**** 22:11:11 UTY8400 Teradata Database Version: 12.00.01.24
**** 22:11:11 UTY8400 Default character set: ASCII
**** 22:11:11 UTY8400 Current RDBMS has interval support
**** 22:11:11 UTY8400 Current RDBMS has UDT support
**** 22:11:11 UTY8400 Maximum supported buffer size: 1M
**** 22:11:11 UTY8400 Data Encryption supported by RDBMS server
**** 22:11:12 UTY6211 A successful connect was made to the RDBMS.
**** 22:11:12 UTY6217 Logtable 'test.tranmllog' has been created.

0003 .begin import
        mload tables test.tran
        checkpoint 20000;
     ========================================================================
     =          Processing MultiLoad Statements                             =
     ========================================================================

0004 .layout tranlayout;
0005        .FIELD STORE * varchar(10);
0006        .field TRAN_NO * varchar  (10);
0007        .field TRAN_DATE * varchar  (10);
0008        .field TRAN_TIME * varchar  (10);
0009        .field TRAN_TYPE * varchar (1);
0010        .field TRAN_TERMINAL * varchar (3);
0011        .field CLERK * varchar (10);
0012        .field CUST_NO * varchar (10);
0013        .field NAME * varchar (30);
0014        .field TOTAL_SALE * varchar (14);
0015        .field SALES_TAX * varchar (14);
0016        .field DISCOUNT * varchar (14);
0017        .field DISC_PCT * varchar (10);
0018        .field DISC_TAKEN_FLAG * varchar (1);
0019        .field CASH_TENDERED * varchar (14);
0020        .field CHANGE_GIVEN * varchar (14);
0021        .field CHARGE_TENDERED* varchar (14);
0022        .field CHECK_NUMBER * varchar (20);
0023        .field CHECK_TENDERED * varchar (14);
0024        .field BANKCARD_TENDERED * varchar (14);
0025        .field PURCHASE_ORDER_NUM * varchar (20);

0026 .dml label insdml;
0027 insert into test.tran.*;

0028 .import infile C:\DataIntegration\Data\TRAN2.DAT
             format vartext '|'
             layout tranlayout
             apply insdml;
0029 .end mload;
     ========================================================================
     =          MultiLoad Initial Phase                                     =
     ========================================================================
**** 22:11:12 UTY0829 Options in effect for this MultiLoad import task:
     .       Sessions:    One session per available amp.
     .       Checkpoint:  20000 records.
     .       Tenacity:    4 hour limit to successfully connect load sessions.
     .       Errlimit:    No limit in effect.
     .       AmpCheck:    In effect for apply phase transitions.
**** 22:11:13 UTY0815 MLOAD session(s) connected: 2.
     ========================================================================
     =          MultiLoad Acquisition Phase                                 =
     ========================================================================
**** 22:11:16 UTY1812 A checkpoint is being initiated because 20000 output recor
ds have been
     sent to the RDBMS.
**** 22:11:17 UTY0827 A checkpoint has been taken, recording that input record 2
0000 has been
     processed for IMPORT 1 of this MultiLoad Import task.
**** 22:11:17 UTY1812 A checkpoint is being initiated because 40000 output recor
ds have been
     sent to the RDBMS.
**** 22:11:43 UTY0817 MultiLoad submitting the following request:
     CHECKPOINT LOADING INTERVAL 2;
**** 22:11:43 UTY0806 RDBMS error, 2825: NO RECORD OF THE LAST REQUEST WAS FOUND
 AFTER DBC
     RESTART
**** 22:11:43 UTY1804 Re-starting this MultiLoad task because The RDBMS crashed.

     ========================================================================
     =          MultiLoad Initial Phase                                     =
     ========================================================================
**** 22:11:43 UTY0829 Options in effect for this MultiLoad import task:
     .       Sessions:    One session per available amp.
     .       Checkpoint:  20000 records.
     .       Tenacity:    4 hour limit to successfully connect load sessions.
     .       Errlimit:    No limit in effect.
     .       AmpCheck:    In effect for apply phase transitions.
**** 22:11:44 UTY0815 MLOAD session(s) connected: 2.
     ========================================================================
     =          MultiLoad Acquisition Phase                                 =
     ========================================================================
**** 22:11:46 UTY1806 This MultiLoad Import task is re-starting from checkpoint
taken at
     record 20000 of IMPORT 1.
**** 22:11:47 UTY1812 A checkpoint is being initiated because 40000 output recor
ds have been
     sent to the RDBMS.
**** 22:11:48 UTY0827 A checkpoint has been taken, recording that input record 4
0000 has been
     processed for IMPORT 1 of this MultiLoad Import task.
**** 22:11:49 UTY1812 A checkpoint is being initiated because 60000 output recor
ds have been
     sent to the RDBMS.
**** 22:11:50 UTY0827 A checkpoint has been taken, recording that input record 6
0000 has been
     processed for IMPORT 1 of this MultiLoad Import task.
**** 22:11:51 UTY0826 A checkpoint has been taken, recording that end of file ha
s been reached
     for IMPORT 1 of this MultiLoad Import task.
**** 22:11:51 UTY1803 Import processing statistics
     .                                       IMPORT  1     Total thus far
     .                                       =========     ==============
     Candidate records considered:........       67817.......       67817
     Apply conditions satisfied:..........       67817.......       67817
     Candidate records not applied:.......           0.......           0
     Candidate records rejected:..........           0.......           0
     ========================================================================
     =          MultiLoad Application Phase                                 =
     ========================================================================
**** 22:12:10 UTY0818 Statistics for table test.tran:
        Inserts:        67817
        Updates:            0
        Deletes:            0
     ========================================================================
     =          MultiLoad Task Cleanup                                      =
     ========================================================================
**** 22:12:10 UTY0821 Error table test.ET_tran is EMPTY, dropping table.
**** 22:12:11 UTY0821 Error table test.UV_tran is EMPTY, dropping table.
**** 22:12:11 UTY0825 Error table statistics for:

     Target table 1: test.tran
     Number of Rows  Error Table Name
     ==============  ========================================================
                  0  test.ET_tran
                  0  test.UV_tran

**** 22:12:12 UTY0822 MultiLoad processing complete for this MultiLoad import ta
sk.
     ========================================================================
     =          MultiLoad Task Complete                                     =
     ========================================================================

0030 .logoff;
     ========================================================================
     =          Logoff/Disconnect                                           =
     ========================================================================
**** 22:12:12 UTY6216 The restart log table has been dropped.
**** 22:12:12 UTY6212 A successful disconnect was made from the RDBMS.
**** 22:12:12 UTY2410 Total processor time used = '1.625 Seconds'
     .       Start : 22:11:09 - SUN AUG 09, 2009
     .       End   : 22:12:12 - SUN AUG 09, 2009
     .       Highest return code encountered = '0'.

An automatic restart may also occur if an interruption occurs during the application phase. The following output illustrates this type of re-start.

C:\DataIntegration\Scripts>mload -b < tran.ml
**** 22:22:57 UTY2414 BRIEF option is enabled.
     ========================================================================
     =          MultiLoad Utility    Release MLOD.13.00.00.000              =
     =          Platform WIN32                                              =
     ========================================================================
     =     Copyright 1990-2009, Teradata Corporation. ALL RIGHTS RESERVED.  =
     ========================================================================
**** 22:22:57 UTY2411 Processing start date: SUN AUG 09, 2009
     ========================================================================
     =          Logon/Connection                                            =
     ========================================================================
0001 .logtable test.tranmllog;
0002 .LOGON test/testuser,;
**** 22:23:00 UTY8400 Teradata Database Release: 12.00.01.24
**** 22:23:00 UTY8400 Teradata Database Version: 12.00.01.24
**** 22:23:00 UTY8400 Default character set: ASCII
**** 22:23:00 UTY8400 Current RDBMS has interval support
**** 22:23:00 UTY8400 Current RDBMS has UDT support
**** 22:23:00 UTY8400 Maximum supported buffer size: 1M
**** 22:23:00 UTY8400 Data Encryption supported by RDBMS server
**** 22:23:00 UTY6211 A successful connect was made to the RDBMS.
**** 22:23:00 UTY6217 Logtable 'test.tranmllog' has been created.

0003 .begin import
        mload tables test.tran
        checkpoint 20000;
     ========================================================================
     =          Processing MultiLoad Statements                             =
     ========================================================================

0004 .layout tranlayout;
0005        .FIELD STORE * varchar(10);
0006        .field TRAN_NO * varchar  (10);
0007        .field TRAN_DATE * varchar  (10);
0008        .field TRAN_TIME * varchar  (10);
0009        .field TRAN_TYPE * varchar (1);
0010        .field TRAN_TERMINAL * varchar (3);
0011        .field CLERK * varchar (10);
0012        .field CUST_NO * varchar (10);
0013        .field NAME * varchar (30);
0014        .field TOTAL_SALE * varchar (14);
0015        .field SALES_TAX * varchar (14);
0016        .field DISCOUNT * varchar (14);
0017        .field DISC_PCT * varchar (10);
0018        .field DISC_TAKEN_FLAG * varchar (1);
0019        .field CASH_TENDERED * varchar (14);
0020        .field CHANGE_GIVEN * varchar (14);
0021        .field CHARGE_TENDERED* varchar (14);
0022        .field CHECK_NUMBER * varchar (20);
0023        .field CHECK_TENDERED * varchar (14);
0024        .field BANKCARD_TENDERED * varchar (14);
0025        .field PURCHASE_ORDER_NUM * varchar (20);

0026 .dml label insdml;
0027 insert into test.tran.*;

0028 .import infile C:\DataIntegration\Data\TRAN2.DAT
             format vartext '|'
             layout tranlayout
             apply insdml;
0029 .end mload;
     ========================================================================
     =          MultiLoad Initial Phase                                     =
     ========================================================================
**** 22:23:00 UTY0829 Options in effect for this MultiLoad import task:
     .       Sessions:    One session per available amp.
     .       Checkpoint:  20000 records.
     .       Tenacity:    4 hour limit to successfully connect load sessions.
     .       Errlimit:    No limit in effect.
     .       AmpCheck:    In effect for apply phase transitions.
**** 22:23:01 UTY0815 MLOAD session(s) connected: 2.
     ========================================================================
     =          MultiLoad Acquisition Phase                                 =
     ========================================================================
**** 22:23:04 UTY1812 A checkpoint is being initiated because 20000 output recor
ds have been
     sent to the RDBMS.
**** 22:23:05 UTY0827 A checkpoint has been taken, recording that input record 2
0000 has been
     processed for IMPORT 1 of this MultiLoad Import task.
**** 22:23:06 UTY1812 A checkpoint is being initiated because 40000 output recor
ds have been
     sent to the RDBMS.
**** 22:23:07 UTY0827 A checkpoint has been taken, recording that input record 4
0000 has been
     processed for IMPORT 1 of this MultiLoad Import task.
**** 22:23:07 UTY1812 A checkpoint is being initiated because 60000 output recor
ds have been
     sent to the RDBMS.
**** 22:23:08 UTY0827 A checkpoint has been taken, recording that input record 6
0000 has been
     processed for IMPORT 1 of this MultiLoad Import task.
**** 22:23:10 UTY0826 A checkpoint has been taken, recording that end of file ha
s been reached
     for IMPORT 1 of this MultiLoad Import task.
**** 22:23:10 UTY1803 Import processing statistics
     .                                       IMPORT  1     Total thus far
     .                                       =========     ==============
     Candidate records considered:........       67817.......       67817
     Apply conditions satisfied:..........       67817.......       67817
     Candidate records not applied:.......           0.......           0
     Candidate records rejected:..........           0.......           0
     ========================================================================
     =          MultiLoad Application Phase                                 =
     ========================================================================
**** 22:24:19 UTY0817 MultiLoad submitting the following request:
     EXEC MLOAD test.tran;
**** 22:24:19 UTY0805 RDBMS failure, 2828: Request was rolled back during system
 recovery.
**** 22:24:19 UTY1804 Re-starting this MultiLoad task because The RDBMS crashed.

     ========================================================================
     =          MultiLoad Initial Phase                                     =
     ========================================================================
**** 22:24:19 UTY0829 Options in effect for this MultiLoad import task:
     .       Sessions:    One session per available amp.
     .       Checkpoint:  20000 records.
     .       Tenacity:    4 hour limit to successfully connect load sessions.
     .       Errlimit:    No limit in effect.
     .       AmpCheck:    In effect for apply phase transitions.
**** 22:24:20 UTY0815 MLOAD session(s) connected: 2.
     ========================================================================
     =          MultiLoad Application Phase                                 =
     ========================================================================
**** 22:24:27 UTY0818 Statistics for table test.tran:
        Inserts:        67817
        Updates:            0
        Deletes:            0
     ========================================================================
     =          MultiLoad Task Cleanup                                      =
     ========================================================================
**** 22:24:27 UTY0821 Error table test.ET_tran is EMPTY, dropping table.
**** 22:24:27 UTY0821 Error table test.UV_tran is EMPTY, dropping table.
**** 22:24:27 UTY0825 Error table statistics for:

     Target table 1: test.tran
     Number of Rows  Error Table Name
     ==============  ========================================================
                  0  test.ET_tran
                  0  test.UV_tran

**** 22:24:28 UTY0822 MultiLoad processing complete for this MultiLoad import ta
sk.
     ========================================================================
     =          MultiLoad Task Complete                                     =
     ========================================================================

0030 .logoff;
     ========================================================================
     =          Logoff/Disconnect                                           =
     ========================================================================
**** 22:24:29 UTY6216 The restart log table has been dropped.
**** 22:24:29 UTY6212 A successful disconnect was made from the RDBMS.
**** 22:24:29 UTY2410 Total processor time used = '1.70313 Seconds'
     .       Start : 22:22:57 - SUN AUG 09, 2009
     .       End   : 22:24:29 - SUN AUG 09, 2009
     .       Highest return code encountered = '0'.

Restarts after client failure

A Multiload job may be restarted after it is interrupted by a failure of the Multiload process or the client processing platform. This type of er-ror may be a system error or can be a programming error or data er-ror. The system can be restarted by correcting the problem and re-running the Multiload job. The following output is an example of a re-start for a failure of a Multiload process during the acquisition phase.

C:\DataIntegration\Scripts>mload -b < tran.ml
**** 22:35:51 UTY2414 BRIEF option is enabled.
     ========================================================================
     =          MultiLoad Utility    Release MLOD.13.00.00.000              =
     =          Platform WIN32                                              =
     ========================================================================
     =     Copyright 1990-2009, Teradata Corporation. ALL RIGHTS RESERVED.  =
     ========================================================================
**** 22:35:51 UTY2411 Processing start date: SUN AUG 09, 2009
     ========================================================================
     =          Logon/Connection                                            =
     ========================================================================
0001 .logtable test.tranmllog;
0002 .LOGON test/testuser,;
**** 22:35:53 UTY8400 Teradata Database Release: 12.00.01.24
**** 22:35:53 UTY8400 Teradata Database Version: 12.00.01.24
**** 22:35:53 UTY8400 Default character set: ASCII
**** 22:35:53 UTY8400 Current RDBMS has interval support
**** 22:35:53 UTY8400 Current RDBMS has UDT support
**** 22:35:53 UTY8400 Maximum supported buffer size: 1M
**** 22:35:53 UTY8400 Data Encryption supported by RDBMS server
**** 22:35:54 UTY6211 A successful connect was made to the RDBMS.
**** 22:35:54 UTY6217 Logtable 'test.tranmllog' has been created.

0003 .begin import
        mload tables test.tran
        checkpoint 20000;
     ========================================================================
     =          Processing MultiLoad Statements                             =
     ========================================================================

0004 .layout tranlayout;
0005        .FIELD STORE * varchar(10);
0006        .field TRAN_NO * varchar  (10);
0007        .field TRAN_DATE * varchar  (10);
0008        .field TRAN_TIME * varchar  (10);
0009        .field TRAN_TYPE * varchar (1);
0010        .field TRAN_TERMINAL * varchar (3);
0011        .field CLERK * varchar (10);
0012        .field CUST_NO * varchar (10);
0013        .field NAME * varchar (30);
0014        .field TOTAL_SALE * varchar (14);
0015        .field SALES_TAX * varchar (14);
0016        .field DISCOUNT * varchar (14);
0017        .field DISC_PCT * varchar (10);
0018        .field DISC_TAKEN_FLAG * varchar (1);
0019        .field CASH_TENDERED * varchar (14);
0020        .field CHANGE_GIVEN * varchar (14);
0021        .field CHARGE_TENDERED* varchar (14);
0022        .field CHECK_NUMBER * varchar (20);
0023        .field CHECK_TENDERED * varchar (14);
0024        .field BANKCARD_TENDERED * varchar (14);
0025        .field PURCHASE_ORDER_NUM * varchar (20);

0026 .dml label insdml;
0027 insert into test.tran.*;

0028 .import infile C:\DataIntegration\Data\TRAN2.DAT
             format vartext '|'
             layout tranlayout
             apply insdml;
0029 .end mload;
     ========================================================================
     =          MultiLoad Initial Phase                                     =
     ========================================================================
**** 22:35:54 UTY0829 Options in effect for this MultiLoad import task:
     .       Sessions:    One session per available amp.
     .       Checkpoint:  20000 records.
     .       Tenacity:    4 hour limit to successfully connect load sessions.
     .       Errlimit:    No limit in effect.
     .       AmpCheck:    In effect for apply phase transitions.
**** 22:35:54 UTY0815 MLOAD session(s) connected: 2.
     ========================================================================
     =          MultiLoad Acquisition Phase                                 =
     ========================================================================
**** 22:35:57 UTY1812 A checkpoint is being initiated because 20000 output reco
ds have been
     sent to the RDBMS.
**** 22:35:58 UTY0827 A checkpoint has been taken, recording that input record
0000 has been
     processed for IMPORT 1 of this MultiLoad Import task.
**** 22:35:59 UTY1812 A checkpoint is being initiated because 40000 output reco
ds have been
     sent to the RDBMS.
**** 22:36:00 UTY0827 A checkpoint has been taken, recording that input record
0000 has been
     processed for IMPORT 1 of this MultiLoad Import task.
**** 22:36:01 UTY1812 A checkpoint is being initiated because 60000 output reco
ds have been
     sent to the RDBMS.

The gateway sessions associated with the Multiload job will remain ac-tive for the gateway timeout time (20 minutes default). These sessions can be killed using the gtwglobal utility in the same manner that was described for the Fastload utility.

The interrupted Multiload job can be restarted by re-executing the job. The following output is from a re-execution of the example Multiload job.

C:\DataIntegration\Scripts>mload -b < tran.ml
**** 22:42:35 UTY2414 BRIEF option is enabled.
     ========================================================================
     =          MultiLoad Utility    Release MLOD.13.00.00.000              =
     =          Platform WIN32                                              =
     ========================================================================
     =     Copyright 1990-2009, Teradata Corporation. ALL RIGHTS RESERVED.  =
     ========================================================================
**** 22:42:35 UTY2411 Processing start date: SUN AUG 09, 2009
     ========================================================================
     =          Logon/Connection                                            =
     ========================================================================
0001 .logtable test.tranmllog;
0002 .LOGON test/testuser,;
**** 22:42:38 UTY8400 Teradata Database Release: 12.00.01.24
**** 22:42:38 UTY8400 Teradata Database Version: 12.00.01.24
**** 22:42:38 UTY8400 Default character set: ASCII
**** 22:42:38 UTY8400 Current RDBMS has interval support
**** 22:42:38 UTY8400 Current RDBMS has UDT support
**** 22:42:38 UTY8400 Maximum supported buffer size: 1M
**** 22:42:38 UTY8400 Data Encryption supported by RDBMS server
**** 22:42:38 UTY6211 A successful connect was made to the RDBMS.
**** 22:42:38 UTY6210 Logtable 'test.tranmllog' indicates that a restart is in p
rogress.

0003 .begin import
        mload tables test.tran
        checkpoint 20000;
     ========================================================================
     =          Processing MultiLoad Statements                             =
     ========================================================================

0004 .layout tranlayout;
0005        .FIELD STORE * varchar(10);
0006        .field TRAN_NO * varchar  (10);
0007        .field TRAN_DATE * varchar  (10);
0008        .field TRAN_TIME * varchar  (10);
0009        .field TRAN_TYPE * varchar (1);
0010        .field TRAN_TERMINAL * varchar (3);
0011        .field CLERK * varchar (10);
0012        .field CUST_NO * varchar (10);
0013        .field NAME * varchar (30);
0014        .field TOTAL_SALE * varchar (14);
0015        .field SALES_TAX * varchar (14);
0016        .field DISCOUNT * varchar (14);
0017        .field DISC_PCT * varchar (10);
0018        .field DISC_TAKEN_FLAG * varchar (1);
0019        .field CASH_TENDERED * varchar (14);
0020        .field CHANGE_GIVEN * varchar (14);
0021        .field CHARGE_TENDERED* varchar (14);
0022        .field CHECK_NUMBER * varchar (20);
0023        .field CHECK_TENDERED * varchar (14);
0024        .field BANKCARD_TENDERED * varchar (14);
0025        .field PURCHASE_ORDER_NUM * varchar (20);

0026 .dml label insdml;
0027 insert into test.tran.*;

0028 .import infile C:\DataIntegration\Data\TRAN2.DAT
             format vartext '|'
             layout tranlayout
             apply insdml;
0029 .end mload;
     ========================================================================
     =          MultiLoad Initial Phase                                     =
     ========================================================================
**** 22:42:38 UTY0829 Options in effect for this MultiLoad import task:
     .       Sessions:    One session per available amp.
     .       Checkpoint:  20000 records.
     .       Tenacity:    4 hour limit to successfully connect load sessions.
     .       Errlimit:    No limit in effect.
     .       AmpCheck:    In effect for apply phase transitions.
**** 22:42:39 UTY0815 MLOAD session(s) connected: 2.
     ========================================================================
     =          MultiLoad Acquisition Phase                                 =
     ========================================================================
**** 22:42:40 UTY1806 This MultiLoad Import task is re-starting from checkpoint
taken at
     record 40000 of IMPORT 1.
**** 22:42:40 UTY1812 A checkpoint is being initiated because 60000 output recor
ds have been
     sent to the RDBMS.
**** 22:42:41 UTY0827 A checkpoint has been taken, recording that input record 6
0000 has been
     processed for IMPORT 1 of this MultiLoad Import task.
**** 22:42:43 UTY0826 A checkpoint has been taken, recording that end of file ha
s been reached
     for IMPORT 1 of this MultiLoad Import task.
**** 22:42:43 UTY1803 Import processing statistics
     .                                       IMPORT  1     Total thus far
     .                                       =========     ==============
     Candidate records considered:........       67817.......       67817
     Apply conditions satisfied:..........       67817.......       67817
     Candidate records not applied:.......           0.......           0
     Candidate records rejected:..........           0.......           0
     ========================================================================
     =          MultiLoad Application Phase                                 =
     ========================================================================
**** 22:42:57 UTY0818 Statistics for table test.tran:
        Inserts:        67817
        Updates:            0
        Deletes:            0
     ========================================================================
     =          MultiLoad Task Cleanup                                      =
     ========================================================================
**** 22:42:57 UTY0821 Error table test.ET_tran is EMPTY, dropping table.
**** 22:42:57 UTY0821 Error table test.UV_tran is EMPTY, dropping table.
**** 22:42:57 UTY0825 Error table statistics for:

     Target table 1: test.tran
     Number of Rows  Error Table Name
     ==============  ========================================================
                  0  test.ET_tran
                  0  test.UV_tran

**** 22:42:58 UTY0822 MultiLoad processing complete for this MultiLoad import ta
sk.
     ========================================================================
     =          MultiLoad Task Complete                                     =
     ========================================================================

0030 .logoff;
     ========================================================================
     =          Logoff/Disconnect                                           =
     ========================================================================
**** 22:42:59 UTY6216 The restart log table has been dropped.
**** 22:42:59 UTY6212 A successful disconnect was made from the RDBMS.
**** 22:42:59 UTY2410 Total processor time used = '0.8125 Seconds'
     .       Start : 22:42:35 - SUN AUG 09, 2009
     .       End   : 22:42:59 - SUN AUG 09, 2009
     .       Highest return code encountered = '0'.

A Multiload job may also be restarted if interrupted during the application phase.

It is important to note that the restart log and error tables must be available. If they are not the Multiload job can not be restarted or rerun. In this case the table must be reloaded from backup or source because they are unusable.

The following output is an example of a Multiload job restarted after an interruption during the application phase.

C:\DataIntegration\Scripts>mload -b < tran.ml
**** 22:53:17 UTY2414 BRIEF option is enabled.
     ========================================================================
     =          MultiLoad Utility    Release MLOD.13.00.00.000              =
     =          Platform WIN32                                              =
     ========================================================================
     =     Copyright 1990-2009, Teradata Corporation. ALL RIGHTS RESERVED.  =
     ========================================================================
**** 22:53:17 UTY2411 Processing start date: SUN AUG 09, 2009
     ========================================================================
     =          Logon/Connection                                            =
     ========================================================================
0001 .logtable test.tranmllog;
0002 .LOGON test/testuser,;
**** 22:53:19 UTY8400 Teradata Database Release: 12.00.01.24
**** 22:53:19 UTY8400 Teradata Database Version: 12.00.01.24
**** 22:53:19 UTY8400 Default character set: ASCII
**** 22:53:19 UTY8400 Current RDBMS has interval support
**** 22:53:19 UTY8400 Current RDBMS has UDT support
**** 22:53:19 UTY8400 Maximum supported buffer size: 1M
**** 22:53:19 UTY8400 Data Encryption supported by RDBMS server
**** 22:53:19 UTY6211 A successful connect was made to the RDBMS.
**** 22:53:19 UTY6210 Logtable 'test.tranmllog' indicates that a restart is in p
rogress.

0003 .begin import
        mload tables test.tran
        checkpoint 20000;
     ========================================================================
     =          Processing MultiLoad Statements                             =
     ========================================================================

0004 .layout tranlayout;
0005        .FIELD STORE * varchar(10);
0006        .field TRAN_NO * varchar  (10);
0007        .field TRAN_DATE * varchar  (10);
0008        .field TRAN_TIME * varchar  (10);
0009        .field TRAN_TYPE * varchar (1);
0010        .field TRAN_TERMINAL * varchar (3);
0011        .field CLERK * varchar (10);
0012        .field CUST_NO * varchar (10);
0013        .field NAME * varchar (30);
0014        .field TOTAL_SALE * varchar (14);
0015        .field SALES_TAX * varchar (14);
0016        .field DISCOUNT * varchar (14);
0017        .field DISC_PCT * varchar (10);
0018        .field DISC_TAKEN_FLAG * varchar (1);
0019        .field CASH_TENDERED * varchar (14);
0020        .field CHANGE_GIVEN * varchar (14);
0021        .field CHARGE_TENDERED* varchar (14);
0022        .field CHECK_NUMBER * varchar (20);
0023        .field CHECK_TENDERED * varchar (14);
0024        .field BANKCARD_TENDERED * varchar (14);
0025        .field PURCHASE_ORDER_NUM * varchar (20);

0026 .dml label insdml;
0027 insert into test.tran.*;

0028 .import infile C:\DataIntegration\Data\TRAN2.DAT
             format vartext '|'
             layout tranlayout
             apply insdml;
0029 .end mload;
     ========================================================================
     =          MultiLoad Initial Phase                                     =
     ========================================================================
**** 22:53:19 UTY0829 Options in effect for this MultiLoad import task:
     .       Sessions:    One session per available amp.
     .       Checkpoint:  20000 records.
     .       Tenacity:    4 hour limit to successfully connect load sessions.
     .       Errlimit:    No limit in effect.
     .       AmpCheck:    In effect for apply phase transitions.
**** 22:53:20 UTY0815 MLOAD session(s) connected: 2.
     ========================================================================
     =          MultiLoad Application Phase                                 =
     ========================================================================
**** 22:53:25 UTY0818 Statistics for table test.tran:
        Inserts:        67817
        Updates:            0
        Deletes:            0
     ========================================================================
     =          MultiLoad Task Cleanup                                      =
     ========================================================================
**** 22:53:25 UTY0821 Error table test.ET_tran is EMPTY, dropping table.
**** 22:53:26 UTY0821 Error table test.UV_tran is EMPTY, dropping table.
**** 22:53:26 UTY0825 Error table statistics for:

     Target table 1: test.tran
     Number of Rows  Error Table Name
     ==============  ========================================================
                  0  test.ET_tran
                  0  test.UV_tran

**** 22:53:27 UTY0822 MultiLoad processing complete for this MultiLoad import ta
sk.
     ========================================================================
     =          MultiLoad Task Complete                                     =
     ========================================================================

0030 .logoff;
     ========================================================================
     =          Logoff/Disconnect                                           =
     ========================================================================
**** 22:53:27 UTY6216 The restart log table has been dropped.
**** 22:53:27 UTY6212 A successful disconnect was made from the RDBMS.
**** 22:53:27 UTY2410 Total processor time used = '0.671875 Seconds'
     .       Start : 22:53:17 - SUN AUG 09, 2009
     .       End   : 22:53:27 - SUN AUG 09, 2009
     .       Highest return code encountered = '0'.

Rerun

Multiload jobs that are interrupted before the application phase starts or after the application phase ends may be rerun. Jobs that are inter-rupted during the application phase must be restarted.

To rerun an entire Multiload job the following actions must be taken before the job is rerun:

  1. Use BTEQ to execute RELEASE MLOAD command for each target table in Multiload script
  2. Drop work table for each target table
  3. Drop acquisition error table for each target table
  4. Drop application error table for each target table
  5. Drop restart log table

These actions can be executed by a BTEQ SQL script that is run as part of a Multiload script or run by an external job scheduling system.

The following is an example of a SQL script that will allow this example to be rerun.

.logon test/testuser,test

database test;

release mload tran;
drop table uv_tran;
drop table et_tran;
drop table wt_tran;
drop table log_tran;

.logoff
.quit

This script can be executed as part of a Multiload script that executes this script with BTEQ and then executes the Multiload job as illustrated in the following Multiload script.

.system 'bteq < c:\DataIntegration\scripts\tranreset.sql';
.run file c:\DataIntegration\scripts\tranload.ml;

.logoff;

This script can always be run when the intent is to run the entire Multi-load job.