What is an event handler?
It is a workflow that responds programmatically to events that occur on a package level, container level, task level or change of variable value
What is MSDTC?
The built-in transaction support in SSIS makes use of the Distributed Transaction Coordinator (MSDTC) service which must be running for the transaction to function. MSDTC also allows you to perform distributed transactions; e.g. updating a SQL Server database and an Oracle database in the same transaction.
What are transactions in ssis?
Transactions are used to maintain data integrity by ensuring that a database remains in a consistent state even if a package fails. They accomplish this by binding the database actions that tasks perform into units, which either succeed or fail together. Because all the database actions that are part of a transaction are either committed or rolled back together, you can ensure that data remains in a consistent state.
What are the transaction options supported by ssis ?
i. Not Supported The container does not start a transaction and will not join an existing
transaction even if it has already started.
ii. Supported The container does not start a transaction but will join an existing transaction
that has already started.
iii. Required The container starts a new transaction. If an existing transaction has already
been started by the parent container, the container will join it.
What are checkpoints?
Checkpoints enable SSIS to restart failed packages from the point of failure instead of rerunning the whole package. If a package is configured to use checkpoints, information about package execution is written to a checkpoint file. When the failed package is rerun, the checkpoint file is used to restart the package from the point of failure. If the package is run successfully, the checkpoint file is deleted and then re-created the next time the package is run.
Give 3 benefits of Checkpoints?
i. Avoid repeating the downloading and uploading of large files.
ii. Avoid repeating the loading of large amounts of data.
iv. Avoid repeating the aggregation of values.
How do you configure a check point?
i. Create a text file and save it to be used as the checkpoint file.
ii. Right Click anywhere in the control flow area within the package and chose the property from the pop up menu.
iii. Under the checkpoints section at the top of the property window, navigate and select the file created in the first step to be used as the checkpoint file.
iv. Click to the right of checkpoint usage to select either ifexists to use the checkpoint file if it exists or choose always which means the package cannot run without the checkpoint file.
v. Click the right input box of savecheckpoint and select true from the drop down menu.
vi. Right click each containers in the package and choose the property option and perform vii
vii. In the Properties list, click FailPackageOnFailure, and then, select True.
How do you configure a package or container to be executed as a transaction?
i. For a package level transaction, right click anywhere inside the control panel area within the package and click on properties to open up the package properties. For a container, right click the container and click on properties to open the container properties window.
ii. In the transaction section of the properties window, select the type of transaction by clicking the input box next to transactionoption and selecting the type from the drop down menu menu.
iii. Click the isolationLevel input box and select what type of locks or nolocks to be used on tables during the transaction.
How can you diagnose cause of error when a package failed and there are checkpoints declared on containers in the package?
—Use the logging to trace the error. Checkpoint provides the id of the task that fails so if logging is done on this tasks, the error can easily be traced by reading it from the audit database.
What are the Isolation Levels for transactions?
Serizeable
ReadCommitted
ReadUncommitted
RepeatableRead
Unspecified
Chaos
Snapshot
What are Transaction Isolation Levels?
The transaction isolation level determines the duration that locks are held.
READ UNCOMMITTED – reads do not acquire share locks and they don’t wait on locks. This is often referred to as a dirty read because you can read modified data that hasn’t been committed yet and it could get rolled back after you read it.
READ COMMITTED – reads acquire share locks and wait on any data modified by a transaction in process.
REPEATABLE READ – same as READ COMMITTED but in addition share locks are retained on rows read for the duration of the transaction. In other words any row that is read cannot be modified by another connection until the transaction commits or rolls back.
SERIALIZABLE – same as REPEATABLE READ but in addition no other connection can insert rows if the new rows would appear in a SELECT statement already issued. In other words if you issue a select statement in a transaction using the SERIALIZABLE isolation level you will get the same exact result set if you issue the select statement again within the same transaction.
SNAPSHOT- where you set ALLOW_SNAPSHOT_ISOLATION ON at the database level and any transaction that explicitly sets the transaction isolation level to snapshot will not acquire share locks and will not wait on any locks. Rather, you will get the committed version of all rows at the time the transaction begins.
Both of the above SQL Server 2005 enhancements are made possible by maintaining committed versions of rows in tempdb (referred to as the version store). When a read encounters a row that has been modified and not yet committed, it retrieves the appropriate latest committed row from the version store. The maintenance and traversing of the version store is performed by SQL Server automatically; there are no code changes required.
What is the difference between a transaction and a checkpoint?
In transaction, a failure results in rolling back all the previous successful tasks but when a checkpoint is used, the package does not roll back previously successful tasks. When the package is rerun, the package that has transaction has to run from the beginning but a package with a checkpoint will only restart from the location where the error occurred.
What is an ETL strategy?
Every SSIS package is unique and has a unique logic, however each SSIS package should adhere to the following.
1. Error Handling Mechanism
a. Row redirect via error output
b. on failure precedence constraint
c. Work Flows in the follow events such as on error or on failure
2. An Auditing Mechanism
a. via SSIS logging
b. Custom Logging
3. Communication Mechanism
a. Send mail task
b. Notify operator task
c. Send Mail via sql server agent
4. Maintenance Mechanism
after loading large data to a destination such as data mart or data warehouse, maintenance can be achieved by using the default maintenance task or custom TSQL code wrapped inside of Execute TSQL task
5. Transactional Type
a. Transactions have to be carefully chosen based on the amount of data the package is manipulating
b. Checkpoint is also a good option when the execution of a package and its components is lengthy
6. Deployment Method
When deploying a package, a destination should be identified weather it is file system of nature or sql server destination. Security / permission / accessibility should be discussed when choosing deployment destination.
7. Configuration Method
a. Choose an xml configuration file
b. Choosing a sql server configuration- In this section, choosing the dynamic properties of a package that might vary from an environment to another must be done correctly prior to deployment. For example, to testing or production environment.
8. Security Mechanism
A package could should either a built in security such as encryption with the user name and password or use the security of the deployment location
9. Execution Mechanism
A package could be executed in several ways using
a. dtexec
b. dtexecui
c. scheduling the package
d. scheduling a cmd command
e. Scheduling a TSQL (SP_cmdshell)
10. SSIS Optimization best practices and techniques
a. Optimization of the control flow
b. Optimization of the data flow (Optimizing the E, T, L)
11. Mechanism or Logic that will safe guard the destination and the integrity of its data.
12. Each package should have Annotation (All annotations shd be green in color and not more than 10 pixels font). Put important messages or notes in red.
13. The frequency of execution
What is the effect of transactions on redirect rows?
Redirect rows conflict with transactions because the data is being redirect to another output but the transaction tries to roll back which conflicts the tasks of the package.
What are the two natures of transformation?
1. Blocking:
a. Blocking, Fast Fr
F-Fuzzy look up
A -aggregate
S-Sort
T-Term extract
F-Fuzzy grow
R-Row sampling
b. Non blocking or streaming
c. Semi Blocking
derived column
merge
merge join
pivot
non pivot
2. Communication
a. Synchronous
b. Asynchronous
lineage id is the location a record occupies in a buffer when data is being moved from one place to another
What is the difference between control flow(Runtime engine) and data flow?
i. Control flow is an engine that manages the flow or execution of tasks and containers within a package. All the package executables are found in the control flow.
ii. Event handles responds to events that occur at the control flow level and not to a data flow level.
iii. SSIS logging also logs event occurring at the control flow level
iii. Transactions are configured at the control flow level
iv. Check points are configured at the control flow level.
v. Task and containers can be executed in parallel or in a sequence governed by precedence constraints.
v. Variables declared at control flow level are global and can be accessed by any task in the control flow.
vi. In the data flow, there are 3 main components, source adaptors, transformations and destination adaptors
vii. The green arrows connecting the data flow components are data pipeline
What types of precedence constraints are there in SSIS?
1. On failure
2. On success
3. On completion
What type of package configuration do we have?
xml
sql server
What are the different ways to execute an ssis package
i. DTEXEC
ii.dtexecui
iii. f5
iv. Sp_cmdshell
v. sql server agent or command.