Scheduling in PDI

Once you're finished designing your PDI jobs and transformations, you can arrange to run them at certain time intervals through the DI Server, or through your own scheduling mechanism (such as cron on Linux, and the Task Scheduler or the at command on Windows). The methods of operation for scheduling and scripting are different; scheduling through the DI Server is done through the Spoon graphical interface, whereas scripting using your own scheduler or executor is done by calling the pan or kitchen commands. This section explains all of the details for scripting and scheduling PDI content.
You can schedule your jobs through:
1. Data Integration (DI) Server
2. Manual scripting through pan or kitchen commands

1. DI Server
This method uses the Spoon graphical interface and is only available when you are connected to an Enterprise repository.
After you design your job, the steps are as follows:
1. Open a job or transformation, then go to the Action menu and select Schedule.
2. Enter your settings in the Schedule a Transformation dialog box.



2. Manual scripting through pan or kitchen commands

Command-Line Scripting Through Pan and Kitchen


Pan is the PDI command line tool for executing transformations.
Kitchen is the PDI command line tool for executing jobs.
You can use PDI's command line tools to execute PDI content from outside of Spoon. Typically you call these tools from a script or a cron job, so that the job or transformation runs when some condition outside of Pentaho software is met.
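As a rough illustration of the cron approach, the sketch below prints a crontab entry that would run a local job file with Kitchen every night at 02:00. The PDI_HOME default, the job path, and the log path are placeholders, not values from a real installation. The switches it uses (file, level, logfile) are described in the Kitchen options further down.

```shell
#!/bin/sh
# Assumption: PDI is unpacked here; change this to match your installation.
PDI_HOME="${PDI_HOME:-/opt/pentaho/data-integration}"

# Print a crontab line that runs a local KJB file daily at 02:00.
#   $1 = path to the .kjb file (example value)
#   $2 = log file to write (example value)
cron_line_for_job() {
    printf '0 2 * * * %s/kitchen.sh -file=%s -level=Basic -logfile=%s\n' \
        "$PDI_HOME" "$1" "$2"
}

# Example: print the entry so it can be reviewed before adding it with `crontab -e`.
cron_line_for_job /home/etl/jobs/nightly_load.kjb /var/log/pdi/nightly_load.log
```

Printing the line rather than installing it directly keeps the script harmless to run and leaves the final decision about the schedule to whoever maintains the crontab.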

Pan


pan.sh -option=value arg1 arg2
pan.bat /option:value arg1 arg2
Switch         Purpose
rep            Enterprise or database repository name, if you are using one
user           Repository username
pass           Repository password
trans          The name of the transformation (as it appears in the repository) to launch
dir            The repository directory that contains the transformation, including the leading slash
file           If you are calling a local KTR file, this is the filename, including the path if it is not in the local directory
level          The logging level (Basic, Detailed, Debug, Rowlevel, Error, Nothing)
logfile        A local filename to write log output to
listdir        Lists the directories in the specified repository
listtrans      Lists the transformations in the specified repository directory
listrep        Lists the available repositories
exprep         Exports all repository objects to one XML file
norep          Prevents Pan from logging into a repository. If you have set the KETTLE_REPOSITORY, KETTLE_USER, and KETTLE_PASSWORD environment variables, this option stops Pan from logging into the specified repository so that you can execute a local KTR file instead.
safemode       Runs in safe mode, which enables extra checking
version        Shows the version, revision, and build date
param          Sets a named parameter in a name=value format. For example: -param:FOO=bar
listparam      Lists information about the defined named parameters in the specified transformation
maxloglines    The maximum number of log lines that are kept internally by PDI. Set to 0 to keep all rows (default)
maxlogtimeout  The maximum age (in minutes) of a log line while being kept internally by PDI. Set to 0 to keep all rows indefinitely (default)

Example invocations for the shell script and the batch file:
sh pan.sh -rep=initech_pdi_repo -user=pgibbons -pass=lumburghsux -trans=TPS_reports_2011
pan.bat /rep:initech_pdi_repo /user:pgibbons /pass:lumburghsux /trans:TPS_reports_2011
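
The commands above run a transformation stored in a repository. For a local KTR file, a wrapper along the following lines may be useful. It is a sketch only: the function name, file paths, and parameter names are invented for illustration, and it assumes norep can be passed as a bare flag. Setting PAN=echo prints the arguments instead of launching Pan, which is a cheap way to check the command line first.

```shell
#!/bin/sh
# Path to the Pan launcher; override with PAN=echo for a dry run.
PAN="${PAN:-./pan.sh}"

# Run a local KTR file without logging into a repository.
#   $1    = path to the .ktr file
#   $2    = log file
#   $3... = named parameters as NAME=value (values must not contain spaces)
run_local_transformation() {
    ktr_file="$1"
    log_file="$2"
    shift 2
    param_opts=""
    for pair in "$@"; do
        param_opts="$param_opts -param:$pair"
    done
    # $param_opts is deliberately unquoted so each -param:... becomes its own argument.
    "$PAN" -norep -file="$ktr_file" -level=Detailed -logfile="$log_file" $param_opts
}
```

For example, after setting PAN=echo, calling run_local_transformation /tmp/sales.ktr /tmp/sales.log FOO=bar prints the option list that Pan would receive.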

Kitchen Syntax


Kitchen runs jobs, either from a PDI repository (database or enterprise) or from a local file. The syntax for the batch file and the shell script is shown below. All Kitchen options are the same for both.
kitchen.sh -option=value arg1 arg2
kitchen.bat /option:value arg1 arg2
Switch         Purpose
rep            Enterprise or database repository name, if you are using one
user           Repository username
pass           Repository password
job            The name of the job (as it appears in the repository) to launch
dir            The repository directory that contains the job, including the leading slash
file           If you are calling a local KJB file, this is the filename, including the path if it is not in the local directory
level          The logging level (Basic, Detailed, Debug, Rowlevel, Error, Nothing)
logfile        A local filename to write log output to
listdir        Lists the directories in the specified repository
listjob        Lists the jobs in the specified repository directory
listrep        Lists the available repositories
export         Exports all linked resources of the specified job. The argument is the name of a ZIP file.
norep          Prevents Kitchen from logging into a repository. If you have set the KETTLE_REPOSITORY, KETTLE_USER, and KETTLE_PASSWORD environment variables, this option stops Kitchen from logging into the specified repository so that you can execute a local KJB file instead.
version        Shows the version, revision, and build date
param          Sets a named parameter in a name=value format. For example: -param:FOO=bar
listparam      Lists information about the defined named parameters in the specified job
maxloglines    The maximum number of log lines that are kept internally by PDI. Set to 0 to keep all rows (default)
maxlogtimeout  The maximum age (in minutes) of a log line while being kept internally by PDI. Set to 0 to keep all rows indefinitely (default)

Example invocations for the shell script and the batch file:
sh kitchen.sh -rep=initech_pdi_repo -user=pgibbons -pass=lumburghsux -job=TPS_reports_2011
kitchen.bat /rep:initech_pdi_repo /user:pgibbons /pass:lumburghsux /job:TPS_reports_2011
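
When Kitchen is called from a scheduler, it is usually worth checking whether the job succeeded. The sketch below wraps a repository job run and reports a non-zero exit status. It assumes Kitchen signals failure through its exit status, as command-line tools conventionally do. The function name and the PDI_REPO_PASS variable are hypothetical.

```shell
#!/bin/sh
# Path to the Kitchen launcher; adjust to your installation.
KITCHEN="${KITCHEN:-./kitchen.sh}"

# Run a repository job and report failure on stderr.
#   $1 = repository name, $2 = repository user,
#   $3 = repository directory, $4 = job name
# The password is read from the (hypothetical) PDI_REPO_PASS environment
# variable so that it does not have to be written into the script itself.
run_job_checked() {
    status=0
    "$KITCHEN" -rep="$1" -user="$2" -pass="$PDI_REPO_PASS" -dir="$3" -job="$4" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "Job $4 failed with exit status $status" >&2
    fi
    return "$status"
}
```

A cron entry or another script can then call run_job_checked and act on its return value, for instance by sending a notification when it is not 0.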
