Oracle Database Architect: March 2008

Monday, March 31, 2008

Partition Maintenance (By Tom Kyte)

Starting in Oracle9i, another option was added to partition maintenance: the ability to maintain the global indexes during the partition operation using the UPDATE GLOBAL INDEXES clause. This means that as you drop a partition, split a partition, perform whatever operation necessary on a partition ¿ Oracle will perform the necessary modifications to the global index to keep it up to date. Since most partition operations will cause this global index invalidation to occur, this feature can be a boon to systems that need to provide continual access to the data. You¿ll find that you sacrifice the raw speed of the partition operation, but with the associated window of unavailability immediately afterward as you rebuild indexes, for a slower overall response time from the partition operation but coupled with 100 percent data availability. In short, if you have a data warehouse that cannot have downtime, but must support these common data warehouse techniques of sliding data in and out, then this feature is for you¿but you must understand the implications.

Revisiting our previous example, if our partition operations had used the UPDATE GLOBAL INDEXES clause when relevant (in this example, it would not be needed on the ADD PARTITION statement since the newly added partition would not have any rows in it)

ops$tkyte@ORA10G> alter table partitioned
2 exchange partition fy_2004
3 with table fy_2004
4 including indexes
5 without validation
6 UPDATE GLOBAL INDEXES
7 /
Table altered.

ops$tkyte@ORA10G> alter table partitioned
2 drop partition fy_2004
3 UPDATE GLOBAL INDEXES
4 /
Table altered.

ops$tkyte@ORA10G> alter table partitioned
2 add partition fy_2006
3 values less than ( to_date('01-jan-2007','dd-mon-yyyy') )
4 /
Table altered.

ops$tkyte@ORA10G> alter table partitioned
2 exchange partition fy_2006
3 with table fy_2006
4 including indexes
5 without validation
6 UPDATE GLOBAL INDEXES
7 /
Table altered.

we would have discovered the indexes to be perfectly valid and usable both during and after the operation:

ops$tkyte@ORA10G> select index_name, status from user_indexes;

INDEX_NAME STATUS
------------------------------ --------
FY_2006_IDX VALID
FY_2004_IDX VALID
PARTITIONED_IDX_GLOBAL VALID
PARTITIONED_IDX_LOCAL N/A

6 rows selected.

ops$tkyte@ORA10G> set autotrace on explain
ops$tkyte@ORA10G> select count(*)
2 from partitioned
3 where timestamp between sysdate-50 and sysdate;

COUNT(*)
----------
6750

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=9 Card=1 Bytes=9)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 INDEX (RANGE SCAN) OF 'PARTITIONED_IDX_GLOBAL' (INDEX) (Cost=9¿

But there is a tradeoff: we are performing the logical equivalent of INSERT and DELETE operations on the global index structures. When we drop a partition, we have to delete all of the global index entries that might be pointing to that that partition. When we did the exchange of a table with a partition, we had to delete all of the global index entries pointing to the original data and then insert all of the new ones that we just slid in there. So the amount of work performed by the ALTER commands was significantly increased.

In fact, using runstats and a slightly modified version of the preceding example, we can measure the amount of ¿extra¿ work performed to maintain the global indexes during the partition operation. We¿ll slide out FY_2004 and slide in FY_2006 as before, adding in the requisite index rebuild. This will be the sliding window implementation that causes the data to become unavailable due to the need to rebuild the global indexes. We¿ll also then slide out FY_2005 and slide in FY_2007 using the UPDATE GLOBAL INDEXES clause, to emulate the sliding window implementation that provides for full data availability, even during the partition operations. In that manner, we can measure the same operations using the different techniques and compare them. Our expected outcome will be that the first approach will consume fewer database resources and therefore perform ¿faster¿ but incur a measurable period of ¿downtime.¿ The second approach, while consuming more resources and perhaps taking longer overall, will not incur downtime. As far as the end users are concerned, their ability to work never ceased. They might have been processing a bit slower (since we were competing with them for resources), but they were still processing, and they never stopped.

So, if we take the earlier example, but also create an empty FY_2005 table like FY_2004, and a full FY_2007 table like FY_2006, we can measure the differences between the index rebuild approaches, starting first with the ¿less available approach¿:

exec runStats_pkg.rs_start;

alter table partitioned exchange partition fy_2004
with table fy_2004 including indexes without validation;

alter table partitioned drop partition fy_2004;

alter table partitioned add partition fy_2006
values less than ( to_date('01-jan-2007','dd-mon-yyyy') );

alter table partitioned exchange partition fy_2006
with table fy_2006 including indexes without validation;

alter index partitioned_idx_global rebuild;

exec runStats_pkg.rs_middle;
Here is the highly available UPDATE GLOBAL INDEXES approach:
alter table partitioned exchange partition fy_2005
with table fy_2005 including indexes without validation
update global indexes;

alter table partitioned drop partition fy_2005
update global indexes;

alter table partitioned add partition fy_2007
values less than ( to_date('01-jan-2008','dd-mon-yyyy') );

alter table partitioned exchange partition fy_2007
with table fy_2007 including indexes without validation
update global indexes;

exec runStats_pkg.rs_stop;

We might observe the following:
ops$tkyte@ORA10G> exec runStats_pkg.rs_stop;
Run1 ran in 81 hsecs
Run2 ran in 94 hsecs
run 1 ran in 86.17% of the time

Name Run1 Run2 Diff
...
STAT...CPU used when call star 39 59 20
...
STAT...redo entries 938 3,340 2,402
STAT...db block gets 1,348 5,441 4,093
STAT...session logical reads 2,178 6,455 4,277
...
LATCH.cache buffers chains 5,675 27,695 22,020
...
STAT...table scan rows gotten 97,711 131,427 33,716
STAT...undo change vector size 35,100 3,404,056 3,368,956
STAT...redo size 2,694,172 6,197,988 3,503,816

The index rebuild approach did run faster, both as observed by the elapsed time and the CPU time. This fact has caused many a DBA to pause and say, ¿Hey, I don¿t want to use UPDATE GLOBAL INDEXES¿it¿s slower.¿ That is too simplistic of a view, however. What you need to remember is that while the operations overall took longer, processing on your system was not necessarily interrupted. Sure, you as the DBA might be looking at your screen for a longer period of time, but the really important work that takes place on your system was still taking place. What you need to do is see if this tradeoff makes sense for you. If you have an eight-hour maintenance window overnight in which to load new data, then by all means, use the rebuild approach if that makes sense. However, if you have a mandate to be available continuously, then the ability to maintain the global indexes will be crucial.

Looking at the redo generated by each approach, we can see that the UPDATE GLOBAL INDEXES generated considerably more¿over 230 percent more¿and we would expect that to only go up as we add more and more global indexes to the table. The redo generated by the UPDATE GLOBAL INDEXES is unavoidable and cannot be turned off via NOLOGGING, since the maintenance of the global indexes is not a complete rebuild of their structure but more of an incremental ¿maintenance.¿ Additionally, since we are maintaining the live index structure, we must generate undo for that¿in the event the partition operation fails, we must be prepared to put the index back the way it was. And remember, undo is protected by redo itself, so some of the redo you see generated is from the index updates and some is from the rollback. Add another global index or two and you would reasonably expect these numbers to increase.

So, UPDATE GLOBAL INDEXES is an option that allows you to trade off availability for resource consumption. If you have the need to provide continuous availability, it will be the option for you. But you will have to understand the ramifications and size other components of your system appropriately. Specifically, many data warehouses have been crafted over time to use bulk, direct path operations, bypassing undo generation and, when permitted, redo generation as well. Using UPDATE GLOBAL INDEXES cannot bypass either of those two elements. You¿ll need to examine the rules you used to size these components before using this feature, so you can assure yourself it can work on your system.

Friday, March 28, 2008

Locking in Oracle – Concepts and Analysis ( From Tom Kyte)

Here is a small example showing how this happens. We will use three V$ tables in order to see how this works; V$TRANSACTION, V$SESSION, and V$LOCK. V$TRANSACTION contains an entry for every active transaction. V$LOCK contains an entry for all locks held as well as locks being waited on. V$SESSION shows us the sessions logged in. We'll start by starting a transaction in one session and looking at the state of the system at that point:

tkyte@TKYTE816> update dept set deptno = deptno+10;
4 rows updated.

tkyte@TKYTE816> select username,
2 v$lock.sid,
3 trunc(id1/power(2,16)) rbs,
4 bitand(id1,to_number('ffff','xxxx'))+0 slot,
5 id2 seq,
6 lmode,
7 request
8 from v$lock, v$session
9 where v$lock.type = 'TX'
10 and v$lock.sid = v$session.sid
11 and v$session.username = USER
12 /

USERNAME SID RBS SLOT SEQ LMODE REQUEST
-------- ---------- ---------- ---------- ---------- ---------- ----------
TKYTE 8 2 46 160 6 0

tkyte@TKYTE816> select XIDUSN, XIDSLOT, XIDSQN
2 from v$transaction
3 /

XIDUSN XIDSLOT XIDSQN
---------- ---------- ----------
2 46 160

The interesting things to note here are:

ƒá The LMODE is 6 in the V$LOCK table and the request is 0. If you refer to the
definition of the V$LOCK table in the Oracle Server Reference, you will find that LMODE=6 is an exclusive lock. A value of 0 in the request means we are not making a request ¡V we have the lock.
ƒá There is only one row in this table. This V$LOCK table is more of a queuing table than a lock table. Many people expect there would be four rows in V$LOCK since we have four rows locked. What you must remember however is that Oracle does not store a master list of every row locked anywhere. To find out if a row is locked, we must go to that row.
ƒá I took the ID1 and ID2 columns, and performed a bit of bit manipulation on them.
Oracle needed to save three 16bit numbers, but only had two columns in order to do it. So, the first column ID1 holds two of these numbers. By dividing by 2^16 with
trunc(id1/power(2,16)) rbs and by masking out the high bits with
bitand(id1,to_number('ffff','xxxx'))+0 slot, I am able to get the two numbers that are hiding in that one number back out.
ƒá The RBS, SLOT, and SEQ values match the V$TRANSACTION information. This is my transaction ID.

Now I'll start another session using the same user name, update some rows in EMP, and then try to update DEPT:

tkyte@TKYTE816> update emp set ename = upper(ename);
14 rows updated.

tkyte@TKYTE816> update dept set deptno = deptno-10;

I am now blocked in this session. If we run the V$ queries again, we see:

tkyte@TKYTE816> select username,
2 v$lock.sid,
3 trunc(id1/power(2,16)) rbs,
4 bitand(id1,to_number('ffff','xxxx'))+0 slot,
5 id2 seq,
6 lmode,
7 request
8 from v$lock, v$session
9 where v$lock.type = 'TX'
10 and v$lock.sid = v$session.sid
11 and v$session.username = USER
12 /

USERNAME SID RBS SLOT SEQ LMODE REQUEST
-------- ---------- ---------- ---------- ---------- ---------- ----------
TKYTE 8 2 46 160 6 0
TKYTE 9 2 46 160 0 6
TKYTE 9 3 82 163 6 0

tkyte@TKYTE816> select XIDUSN, XIDSLOT, XIDSQN
2 from v$transaction
3 /

XIDUSN XIDSLOT XIDSQN
---------- ---------- ----------
3 82 163
2 46 160

What we see here is that a new transaction has begun ¡V (3,82,163) is the transaction ID.
It has two rows in V$LOCK this time. One row represents the locks that it owns (where LMODE=6). It also has a row in there that shows a REQUEST with a value of 6. This is a request for an exclusive lock. The interesting thing to note here is that the RBS/SLOT/SEQ values of this request row are the transaction ID of the holder of the lock. We can easily see that the transaction with SID=8 is blocking the transaction with SID=9. Now, if we commit in SID = 8 the above changes:

tkyte@TKYTE816> select username,
2 v$lock.sid,
3 trunc(id1/power(2,16)) rbs,
4 bitand(id1,to_number('ffff','xxxx'))+0 slot,
5 id2 seq,
6 lmode,
7 request, block
8 from v$lock, v$session
9 where v$lock.type = 'TX'
10 and v$lock.sid = v$session.sid
11 and v$session.username = USER
12 /

USERNAME SID RBS SLOT SEQ LMODE REQUEST
-------- ---------- ---------- ---------- ---------- ---------- ----------
TKYTE 9 3 82 163 6 0

tkyte@TKYTE816> select XIDUSN, XIDSLOT, XIDSQN
2 from v$transaction
3 /

XIDUSN XIDSLOT XIDSQN
---------- ---------- ----------
3 82 163

that request row has gone ¡V it disappeared the instant the other session gave up its
lock. That request row was the queuing mechanism. The database is able to wake up the blocked sessions the instant the transaction is completed. Note that the above gives us a very easy way to see blockers and waiters:

tkyte@TKYTE816> select
(select username from v$session where sid=a.sid) blocker,
2 a.sid,
3 ' is blocking ',
4 (select username from v$session where sid=b.sid) blockee,
5 b.sid
6 from v$lock a, v$lock b
7 where a.block = 1
8 and b.request > 0
9 and a.id1 = b.id1
10 and a.id2 = b.id2
11 /

BLOCKER SID 'ISBLOCKING' BLOCKEE SID
-------- ---------- ------------- -------- ----------
TKYTE 8 is blocking TKYTE 9

simply by doing a self-join of V$LOCK with itself (I ran this query before committing the session with SID=8).

2) exclusive lock -- I updated a row. no one else can update it until I commit. I have an X lock on that row and only one person at a time can have an X lock. an X lock provides serialization to a resource.

A shared lock -- when I update a table, I take a shared lock on the DEFINITION of the table. Everyone else can do that as well (more then one session can get a shared lock on the table definition). So, more than one person at a time can update the table. If you wanted to ALTER the table, you would need an X lock on the defintion. You cannot get an X lock when there are shared locks so you wait until there are no shared locks.

Thursday, March 27, 2008

Optimistic Locking Using ORA_ROWSCN

Optimistic Locking Using ORA_ROWSCN

Starting with Oracle 10g Release 1, you have the option to use the built-in ORA_ROWSCN function. It works very much like the version column technique described previously, but it can be performed automatically by Oracle¿you need no extra column in the table and no extra update/maintenance code to update this value.

ORA_ROWSCN is based on the internal Oracle system clock, the SCN. Every time you commit in Oracle, the SCN advances (other things can advance it as well, but it only advances; it never goes back). The concept is identical to the previous methods in that you retrieve ORA_ROWSCN upon data retrieval, and you verify it has not changed when you go to update. The only reason I give it more than passing mention is that unless you created the table to support the maintenance of ORA_ROWSCN at the row level, it is maintained at the block level. That is, by default many rows on a single block will share the same ORA_ROWSCN value. If you update a row on a block with 50 other rows, then they will all have their ORA_ROWSCN advanced as well. This would almost certainly lead to many false positives, whereby you believe a row was modified that in fact was not. Therefore, you need to be aware of this fact and understand how to change the behavior.

To see the behavior and then change it, we¿ll use the small DEPT table again:

ops$tkyte@ORA10G> create table dept

  2  (deptno, dname, loc, data,

  3   constraint dept_pk primary key(deptno)

  4  )

  5  as

  6  select deptno, dname, loc, rpad('*',3500,'*')

  7    from scott.dept;

Table created.

Now we can inspect what block each row is on (it is safe to assume in this case they are in the same file, so a common block number indicates they are on the same block). I was using an 8KB block size with a row width of about 3,550 bytes, so I am expecting there to be two rows per block for this example:

ops$tkyte@ORA10G> select deptno, dname,

  2         dbms_rowid.rowid_block_number(rowid) blockno,

  3             ora_rowscn

  4    from dept;

    DEPTNO DNAME             BLOCKNO ORA_ROWSCN

---------- -------------- ---------- ----------

        10 ACCOUNTING          20972   34676029

        20 RESEARCH            20972   34676029

        30 SALES               20973   34676029

        40 OPERATIONS          20973   34676029

And sure enough, that is what we observe in this case. So, let¿s update the row where DEPTNO = 10 on block 20972:

ops$tkyte@ORA10G> update dept

  2     set dname = lower(dname)

  3   where deptno = 10;

1 row updated.

ops$tkyte@ORA10G> commit;

Commit complete.

What we¿ll observe next shows the consequences of ORA_ROWSCN being tracked at the block level. We modified and committed the changes to a single row, but the ORA_ROWSCN values of both of the rows on block 20972 have been advanced:

ops$tkyte@ORA10G> select deptno, dname,

  2         dbms_rowid.rowid_block_number(rowid) blockno,

  3             ora_rowscn

  4    from dept;

    DEPTNO DNAME             BLOCKNO ORA_ROWSCN

---------- -------------- ---------- ----------

        10 accounting          20972   34676046

        20 RESEARCH            20972   34676046

        30 SALES               20973   34676029

        40 OPERATIONS          20973   34676029

It would appear to anyone else that had read the DEPTNO=20 row that it had been modified, even though it was not. The rows on block 20973 are 'safe'¿we didn¿t modify them, so they did not advance. However, if we were to update either of them, both would advance. So the question becomes how to modify this default behavior. Well, unfortunately, we have to re-create the segment with ROWDEPENDENCIES enabled.

Row dependency tracking was added to the database with Oracle9i in support of advanced replication, to allow for better parallel propagation of changes. Prior to Oracle 10g, its only use was in a replication environment, but starting in Oracle 10g we can use it to implement an effective optimistic locking technique with ORA_ROWSCN. It will add 6 bytes of overhead to each row (so it is not a space saver compared to the do-it-yourself version column) and that is, in fact, why it requires a table re-create and not just a simple ALTER TABLE: the physical block structure must be changed to accommodate this feature.

Let¿s rebuild our table to enable ROWDEPENDENCIES. We could use the online rebuild capabilities in DBMS_REDEFINITION (another supplied package) to do this, but for something so small, we¿ll just start over:

ops$tkyte@ORA10G> drop table dept;

Table dropped.

ops$tkyte@ORA10G> create table dept

  2  (deptno, dname, loc, data,

  3   constraint dept_pk primary key(deptno)

  4  )

  5  ROWDEPENDENCIES

  6  as

  7  select deptno, dname, loc, rpad('*',3500,'*')

  8    from scott.dept;

Table created.

ops$tkyte@ORA10G> select deptno, dname,

  2         dbms_rowid.rowid_block_number(rowid) blockno,

  3             ora_rowscn

  4    from dept;

    DEPTNO DNAME             BLOCKNO ORA_ROWSCN

---------- -------------- ---------- ----------

        10 ACCOUNTING          21020   34676364

        20 RESEARCH              21020   34676364

        30 SALES                   21021   34676364

        40 OPERATIONS          21021   34676364

We¿re back where we were before: four rows on two blocks, all having the same initial ORA_ROWSCN value. Now when we update DEPTNO=10

ops$tkyte@ORA10G> update dept

  2     set dname = lower(dname)

  3   where deptno = 10;

1 row updated.

ops$tkyte@ORA10G> commit;

Commit complete.

we should observe the following upon querying the DEPT table:

ops$tkyte@ORA10G> select deptno, dname,

  2         dbms_rowid.rowid_block_number(rowid) blockno,

  3             ora_rowscn

  4    from dept;

    DEPTNO DNAME             BLOCKNO ORA_ROWSCN

---------- -------------- ---------- ----------

        10 accounting          21020   34676381

        20 RESEARCH          21020   34676364

        30 SALES                 21021   34676364

        40 OPERATIONS       21021   34676364

The only modified ORA_ROWSCN at this point belongs to DEPTNO = 10, exactly what we wanted. We can now rely on ORA_ROWSCN to detect row-level changes for us.

Welcome to OraDBArchitect Blob!!

Hi,

I am very happy to welcome you all to this BLOB OraDBArchiect....!!

Best Wishes
RK

Oracle Database Architect