Monday, April 6, 2009
Learn to use the new feature that collects database performance statistics and metrics for analysis and tuning, shows the exact time spent in the database, and even saves session information
When you have a database performance problem, what is the first thing you do to address it? One common approach is to see if a pattern exists: Answering questions such as "Is the same problem recurrent?", "Does it occur during a specific time period?", and "Is there a link between two problems?" will almost always lead to a better diagnosis.
As a DBA you probably have invested in a third-party or homegrown tool to collect elaborate statistics during database operation and derive performance metrics from them. In a crisis, you access those metrics for comparisons to the present. Replaying these past events can shed light on current problems, so continuously capturing relevant statistics becomes important for performance analysis.
For some time, Oracle's solution in this area has been its built-in tool, Statspack. While it can prove invaluable in certain cases, it often lacks the robustness required by performance troubleshooting exercises. Oracle Database 10g offers a significant improvement: the Automatic Workload Repository (AWR). The AWR installs along with the database and captures not only statistics, but the derived metrics as well.
A Quick Test Drive
AWR capability is best explained quickly by the report it produces from collected statistics and metrics, by running the script awrrpt.sql in the $ORACLE_HOME/rdbms/admin directory. This script, in its look and feel, resembles Statspack; it shows all the AWR snapshots available and asks for two specific ones as interval boundaries. It produces two types of output: text format, similar to that of the Statspack report but from the AWR repository, and the default HTML format, complete with hyperlinks to sections and subsections, providing quite a user-friendly report. Run the script and take a look at the report now to get an idea about capabilities of the AWR.
Now let's explore how AWR is designed and structured. Basically, AWR is an Oracle built-in tool that collects performance related statistics and derives performance metrics from them to track a potential problem. Unlike Statspack, snapshots are collected automatically every hour by a new background process called MMON and its slave processes. To save space, the collected data is automatically purged after 7 days. Both the snapshot frequency and retention time can be modified by the user. To see the present settings, you could use:
select snap_interval, retention
+00000 01:00:00.0 +00007 00:00:00.0
This SQL shows that the snapshots are taken every hour and the collections are retained 7 seven days. To change the settings--say, for snapshot intervals of 20 minutes and a retention period of two days--you would issue the following. The parameters are specified in minutes.
interval => 20,
retention => 2*24*60
AWR uses several tables to store the collected statistics, all stored under the SYS schema in the new special tablespace named SYSAUX, and named in the format WRM$_* and WRH$_*. The former type stores metadata information such as the database being examined and the snapshots taken, and the latter type holds the actual collected statistics. (As you might have guessed, H stands for "historical" and M stands for "metadata.") There are several views with the prefix DBA_HIST_ built upon these tables, which can be used to write your own performance diagnosis tool. The names of the views directly relate to the table; for example, the view DBA_HIST_SYSMETRIC_SUMMARY is built upon the table WRH$_SYSMETRIC_SUMMARY.
The AWR history tables capture a lot more information than Statspack, including tablespace usage, filesystem usage, even operating system statistics. A complete list of these tables can be seen from the data dictionary through:
select view_name from user_views where view_name like 'DBA\_HIST\_%' escape '\';
The view DBA_HIST_METRIC_NAME defines the important metrics the AWR collects, the groups to which they belong, and the unit in which they are collected. For example, here is one record (in vertical format):
DBID : 4133493568
GROUP_ID : 2
GROUP_NAME : System Metrics Long Duration
METRIC_ID : 2075
METRIC_NAME : CPU Usage Per Sec
METRIC_UNIT : CentiSeconds Per Second
It shows that a metric "CPU Usage Per Sec" is measured in units of "CentiSeconds Per Second" and belongs to a metric group "System Metrics Long Duration." This record can be joined with other tables such as DBA_HIST_SYSMETRIC_SUMMARY to get the activity, as in:
select begin_time, intsize, num_interval, minval, maxval, average, standard_deviation sd
from dba_hist_sysmetric_summary where metric_id = 2075;
BEGIN INTSIZE NUM_INTERVAL MINVAL MAXVAL AVERAGE SD
----- ---------- ------------ ------- ------- -------- ----------
11:39 179916 30 0 33 3 9.81553548
11:09 180023 30 21 35 28 5.91543912
... and so on ...
Here we see how the CPU time was consumed in centi-seconds. The standard deviation adds to our analysis by helping ascertain whether the average figure reflects the actual workload. In the first records, the average is 3 centi-seconds in CPU per second elapsed, but the standard deviation is 9.81, meaning the average of 3 is not reflective of the workload. In the second example, the value 28, with a standard deviation of 5.9, is more representative. This type of information trends help understanding the effects of several environmental parameters on performance metrics.
Using the Statistics
So far we have seen what AWR collects; now let's see what it does with the data.
Most performance problems do not exist in isolation, but rather leave tell-tale signs that will lead to the eventual root cause of the problem. Let's use a typical tuning exercise: You notice that the system is slow and decide to look into the waits. Your examination reveals that the "buffer busy wait" is very high. What could be the problem? There are several possibilities: there could be a monotonically increasing index, a table so packed that a single block is asked to be loaded to memory very quickly, or some other factors. In any case, first you want identify the segment in question. If it's an index segment, you could decide to rebuild it; change it to a reverse key index; or convert it to a hash-partitioned index introduced in Oracle Database 10g. If it's a table, you could consider changing storage parameters to make it less dense or move it over to a tablespace with automatic segment space management.
Your plan of attack is generally methodical and usually based your knowledge of various events and your experience in dealing with them. Now imagine if the same thing were done by an engine - an engine that captures metrics and deduces possible plans based on pre-determined logic. Wouldn't your job be easier?
That engine, now available in Oracle Database 10g, is known as Automatic Database Diagnostic Monitor (ADDM). To arrive at a decision, ADDM uses the data collected by AWR. In the above discussion, ADDM can see that the buffer busy waits are occurring, pull the appropriate data to see the segments on which it occurs, evaluate its nature and composition, and finally offer solutions to the DBA. After each snapshot collection by AWR, the ADDM is invoked to examine the metrics and generate recommendations. So, in effect you have a full-time robotic DBA analyzing the data and generating recommendations proactively, freeing you to attend to more strategic issues.
To see the ADDM recommendations and the AWR repository data, use the new Enterprise Manager 10g console on the page named DB Home. To see the AWR reports, you can navigate to them from Administration, then Workload Repository, and then Snapshots. We'll examine ADDM in greater detail in a future installment.
You can also specify alerts to be generated based on certain conditions. These alerts, known as Server Generated Alerts, are pushed to an Advanced Queue, from where they can be consumed by any client listening to it. One such client is Enterprise Manager 10g, where the alerts are displayed prominently.
When you have a performance problem, what comes to mind first to reduce the response time? Obviously, you want to eliminate (or reduce) the root cause of the factor that adds to the time. How do you know where the time was spent--not waiting, but actually doing the work?
Oracle Database 10g introduces time models for identifying the time spent in various places. The overall system time spent is recorded in the view V$SYS_TIME_MODEL. Here is the query and its output.
DB time 58211645
DB CPU 54500000
background cpu time 254490000
sequence load elapsed time 0
parse time elapsed 1867816
hard parse elapsed time 1758922
sql execute elapsed time 57632352
connection management call elapsed time 288819
failed parse elapsed time 50794
hard parse (sharing criteria) elapsed time 220345
hard parse (bind mismatch) elapsed time 5040
PL/SQL execution elapsed time 197792
inbound PL/SQL rpc elapsed time 0
PL/SQL compilation elapsed time 593992
Java execution elapsed time 0
bind/define call elapsed time 0
Note the statistic named DB Time, which represents the time spent in the database since the instance startup. Run the sample workload and select the statistic value from the view again. The difference should represent the time spent in the database for that workload. After another round of tuning, perform the same analysis and that difference will show the change in DB Time after the tuning, which can be compared to first change to examine the effect of the tuning exercise on the database time.
In addition to the database time, the V$SYS_TIME_MODEL view shows a whole lot of other statistics, such as time spent in different types of parsing and even PL/SQL compilation.
This view shows the overall system times as well; however, you may be interested in a more granular view: the session level times. The timing stats are captured at the session level as well, as shown in the view V$SESS_TIME_MODEL, where all the stats of the current connected sessions, both active and inactive, are visible. The additional column SID specifies the SID of the sessions for which the stats are shown.
In previous releases, this type of analysis was impossible to get and the user was forced to guess or derive from a variety of sources. In Oracle Database 10g, getting this information is a snap.
Active Session History
The view V$SESSION in Oracle Database 10g has been improved; the most valuable improvement of them all is the inclusion of wait events and their duration, eliminating the need to see the view V$SESSION_WAIT. However, since this view merely reflects the values in real time, some of the important information is lost when it is viewed later. For instance, if you select from this view to check if any session is waiting for any non-idle event, and if so, the event in question, you may not find anything because the wait must have been over by the time you select it.
Enter the new feature Active Session History (ASH), which, like AWR, stores the session performance statistics in a buffer for analysis later. However, unlike AWR, the storage is not persistent in a table but in memory, and is shown in the view V$ACTIVE_SESSION_HISTORY. The data is polled every second and only the active sessions are polled. As time progresses, the old entries are removed to accommodate new ones in a circular buffer and shown in the view. To find out how many sessions waited for some event, you would use
select session_id||','||session_serial# SID, n.name, wait_time, time_waited
from v$active_session_history a, v$event_name n
where n.event# = a.event#
This command tells you the name of the event and how much time was spent in waiting. If you want to drill down to a specific wait event, additional columns of ASH help you with that as well. For instance, if one of the events the sessions waited on is buffer busy wait, proper diagnosis must identify the segments on which the wait event occurred. You get that from the ASH view column CURRENT_OBJ#, which can then be joined with DBA_OBJECTS to get the segments in question.
ASH also records parallel query server sessions, useful to diagnose the parallel query wait events. If the record is for a parallel query slave process, the SID of the coordinator server session is identified by QC_SESSION_ID column. The column SQL_ID records the ID of the SQL statement that produced the wait event, which can be joined with the V$SQL view to get the offending SQL statement. To facilitate the identification of the clients in a shared user environment like a web application, the CLIENT_ID column is also shown, which can be set by DBMS_SESSION.SET_IDENTIFIER.
Since ASH information is so valuable, wouldn't it be nice if it were stored in a persistent manner similar to AWR? Fortunately, it is; the information is flushed to the disk by the MMON slave to the AWR table, visible through the view DBA_HIST_ACTIVE_SESS_HISTORY.
Snapshots are collected automatically by default, but you can also collect them on demand. All AWR functionality has been implemented in the package DBMS_WORKLOAD_REPOSITORY. To take a snapshot, simply issue:
It immediately takes a snapshot, recorded in the table WRM$_SNAPSHOT. The metrics collected are for the TYPICAL level. If you want to collect more detailed statistics, you can set the parameter FLUSH_LEVEL to ALL in the above procedure. The stats are deleted automatically but can also be deleted manually by calling the procedure drop_snapshot_range().
A typical performance tuning exercise starts with a capturing a baseline set of metrics, making changes, and then taking another baseline set. These two sets can be compared to examine the effect of the changes made. In AWR, the same kind of analogy can be implemented for existing snapshots taken. Suppose a particularly resource intensive process named apply_interest ran between 1:00 and 3:00PM, corresponding to snapshot IDs 56 through 59. We could define a baseline named apply_interest_1 for these snapshots:
exec dbms_workload_repository.create_baseline (56,59,'apply_interest_1')
This action marks the snapshots 56 through 59 as part of a baseline named above. Checking for existing baselines:
select * from dba_hist_baseline;
DBID BASELINE_ID BASELINE_NAME START_SNAP_ID END_SNAP_ID
---------- ----------- -------------------- ------------- -----------
4133493568 1 apply_interest_1 56 59
After a few tuning steps, we can create another baseline--called, say apply_interest_2--and compare the metrics for only those snapshots related to these two baselines. Isolating snapshots to only a few sets like this helps in studying the effects of tuning on performance metrics. You can drop the baselines after the analysis using the procedure drop_baseline(); the snapshots will be preserved. Also, when the purge routine kicks in to delete the old snapshots, the ones related to baselines are not purged, allowing for further analysis.
This installment was intended to be merely an introduction to the very rudimentary aspects of the AWR. For a more complete coverage, see Oracle Database 10g documentation. Furthermore, an excellent treatise on AWR and ADDM can be found in the technical whitepaper The Self-Managing Database: Automatic Performance Diagnosis. In Week 18, you will learn more about ADDM and using it to solve real-life problems.
Happy Learning !
It has been a quiet day so far: no major problems in the database, no fires to put out. You are almost relaxed; it's a perfect day for catching up on important to-do tasks such as tuning RMAN tuning parameters or multiple block sizes.
Suddenly, a developer appears at your cubicle. His SQL query is taking a long time to run. Could you please, he asks, tune the query ASAP so it will "behave"?
Perhaps you relaxed too soon. Your original agenda was to spend some time making strategic decisions that will make your database better, faster, and more secure—such as ensuring the database is recoverable, enhancing underlying technology, or researching security updates. Instead, you will spend another day focusing on tactical activities such as SQL tuning, leaving little or no time for a strategic agenda.
As a strategic DBA, you want to free yourself from mundane chores and focus more on thought-provoking areas. Wouldn't it be nice to have an assistant DBA do them for you?
With Oracle Database 10g, you have one in the form of Automatic Database Diagnostic Monitor (ADDM), a sort of robotic DBA that tirelessly trolls through database performance statistics to identify bottlenecks, analyze SQL statements, and consequently offer various types of suggestions to improve performance, often in conjunction with other "advisors" such as the SQL Tuning Advisor. In this installment, you will get an overview of how this process works.
Automatic Database Diagnostic Monitor (ADDM)
In Week 6, you learned about Automatic Workload Repository (AWR), which collects detailed performance-related metrics from the database at regular intervals, known as snapshots. After each snapshot is taken, ADDM is invoked to thoroughly analyze the data and metrics deriving from the difference between snapshots, and then recommend necessary actions. As I mentioned earlier, after finding a problem, ADDM might in turn call other advisors (such as the SQL Tuning Advisor) to offer recommendations for improvement.
Instead of explaining this feature in words, let me show you exactly how it works. Suppose you are trying to diagnose an unexplained performance problem. In the example in our introduction, you knew which SQL statements needed to be tuned, or at least that SQL statements were the problem. In real life, however, you probably will not have such helpful information.
To perform a diagnosis in 10g, you would choose the snapshots in the relevant interval for further drill-down analysis. In Enterprise Manager 10g, from the Database home page, you would choose "Advisor Central" and then click on the "ADDM" link, which brings up a page similar to Figure 1.
In this page, you can create tasks to be analyzed by ADDM. You know that the performance problems occurred around 11PM, so choose the snapshots that fall in that range, indicated by "Period Start" and "Period End" values. You can also click on the camera icons to indicate start and stop snapshot intervals, as shown in red ellipses here. After choosing the interval, press the "OK" button, which brings up a page similar to that shown in Figure 2.
Here ADDM identifies two critical and related performance problems in the interval: some SQL statements are consuming significant CPU time, leading to a significant database slowdown. Based on the findings, ADDM recommends SQL tuning for these statements as highlighted in the figure.
If you click on the individual findings, ADDM displays more details. For example, clicking on the problem finding brings up a page similar to the one shown in Figure 3.
Here you can see the specific SQL statement causing this problem. ADDM recommends that you subject this SQL statement to a thorough analysis by SQL Tuning Advisor, as mentioned in the "Action" section. You can immediately run the task by clicking on the button next to it, which will invoke the SQL Tuning Advisor.
In Figure 2, you may have noticed a button named "View Report." In addition to providing the recommendation in individual web pages, ADDM can also create plain-text reports for a quicker one-stop analysis. Listing 1 shows the comprehensive recommendations made in our example plain-text report. Note how the report presents relevant details such as the SQL statement in question, its hash value, and so on. The SQL ID can be used for independent analysis in the SQL Tuning Advisor page in Enterprise Manager or via the command line. ADDM is invoked after every AWR snapshot is collected, so the recommendations based on the adjacent snapshots are available for viewing. Hence you will not have to create an ADDM task as shown above if the scope of analysis is just two adjacent snapshots. If you want to analyze between two snapshots that are not adjacent, you will need to create the ADDM task. Keep in mind that this is not all ADDM can do; there are many more analyses and recommendations available for memory management, segment management, redo/undo, and more, as you saw in previous installment. Because it would be impossible to describe the full spectrum of ADDM functionalities in this single brief article, we'll focus only on SQL Tuning Advisor here. Now let's see how it works.
Access Analysis with SQL Tuning Advisor
In a typical runtime optimization, the optimizer generates a set of possible access paths and chooses the least "expensive" among them based on object statistics. At that moment, however, it does not have the time to address whether the statement can be tuned, the statistics are stale, an index can be created, and so on. In contrast, the SQL Tuning Advisor can perform this "expert system" type of thinking. Essentially, the optimizer can answer the question: "Based on what's available, what's the best way to get results?", whereas SQL Tuning Advisor can answer the question, "Based on what the user wants, what else needs to be done to enhance performance?"
As you might expect, this "thinking" consumes resources such as CPU; hence the SQL Tuning Advisor works on SQL statements during a Tuning Mode, which can be run during off-peak times. This mode is indicated by setting the SCOPE and TIME parameters in the function while creating the tuning task. It's a good practice to run Tuning Mode during a low-activity period in the database so that regular users are relatively unaffected, leaving analysis for later.
The concept is best explained through an example. Take the case of the query that the developer brought to your attention, shown below.
select account_no from accounts where old_account_no = 11
This statement is not difficult to tune but for the sake of easier illustration, assume it is. There are two ways to fire up the advisor: using Enterprise Manager or plain command line.
First, let's see how to use it in command line. We invoke the advisor by calling the supplied package dbms_sqltune.
l_sql := 'select account_no from accounts where old_account_no = 11';
l_task_id := dbms_sqltune.create_tuning_task (
sql_text => l_sql,
user_name => 'ARUP',
scope => 'COMPREHENSIVE',
time_limit => 120,
task_name => 'FOLIO_COUNT'
This package creates and executes a tuning task named FOLIO_COUNT. Next, you will need to see the results of the execution of the task (that is, see the recommendations).
set serveroutput on size 999999
set long 999999
select dbms_sqltune.report_tuning_task ('FOLIO_COUNT') from dual;
The output is shown is Listing 2. Look at these recommendations carefully; the advisor says you can improve performance by creating an index on the column OLD_ACCOUNT_NO. Even better, the advisor calculated the cost of the query if the index were created, making the potential savings more definable and concrete.
Of course, considering the simplicity of this example, you would have reached the conclusion via manual examination as well. However, imagine how useful this tool would be for more complex queries where a manual examination may not be possible or is impractical.
Intermediate-Level Tuning: Query Restructuring
Suppose the query is a little bit more complex:
select account_no from accounts a
where account_name = 'HARRY'
and sub_account_name not in
( select account_name from accounts
where account_no = a.old_account_no and status is not null);
The advisor recommends the following:
1- Restructure SQL finding (see plan 1 in explain plans section)
The optimizer could not unnest the subquery at line ID 1 of the execution
Consider replacing "NOT IN" with "NOT EXISTS" or ensure that columns used
on both sides of the "NOT IN" operator are declared "NOT NULL" by adding
either "NOT NULL" constraints or "IS NOT NULL" predicates.
A "FILTER" operation can be very expensive because it evaluates the
subquery for each row in the parent query. The subquery, when unnested can
drastically improve the execution time because the "FILTER" operation is
converted into a join. Be aware that "NOT IN" and "NOT EXISTS" might
produce different results for "NULL" values.
This time the advisor did not recommend any structural changes such as indexes, but rather intelligently guessed the right way to tune a query by replacing NOT IN with NOT EXISTS. ecause the two constructs are similar but not identical, the advisor gives the rationale for the change and leaves the decision to the DBA or application developer to decide whether this recommendation is valid for the environment.
Advanced Tuning: SQL Profiles
As you may know, the optimizer decides on a query execution plan by examining the statistics present on the objects referenced in the query and then calculating the least-cost method. If a query involves more than one table, which is typical, the optimizer calculates the least-cost option by examining the statistics of all the referenced objects—but it does not know the relationship among them.
For example, assume that an account with status DELINQUENT will have less than $1,000 as balance. A query that joins the tables ACCOUNTS and BALANCES will report fewer rows if the predicate has a clause filtering for DELINQUENT only. The optimizer does not know this complex relationship—but the advisor does; it "assembles" this relationship from the data and stores it in the form of a SQL Profile. With access to the SQL Profile, the optimizer not only knows the data distribution of tables, but also the data correlations among them. This additional information allows the optimizer to generate a superior execution plan, thereby resulting in a well-tuned query.
SQL Profiles obviate the need for tuning SQL statements by manually adding query hints to the code. Consequently, the SQL Tuning Advisor makes it possible to tune packaged applications without modifying code—a tremendous benefit.
The main point here is that unlike objects statistics, a SQL Profile is mapped to a query, not an object or objects. Another query involving the same two tables—ACCOUNTS and BALANCES—may have a different profile. Using this metadata information on the query, Oracle can improve performance.
If a profile can be created, it is done during the SQL Tuning Advisor session, where the advisor generates the profile and recommends that you "Accept" it. Unless a profile is accepted, it's not tied to a statement. You can accept the profile at any time by issuing a statement such as the following:
task_name => 'FOLIO_COUNT',
name => 'FOLIO_COUNT_PROFILE'
description => 'Folio Count Profile',
category => 'FOLIO_COUNT');
This command ties the profile named FOLIO_COUNT_PROFILE generated earlier by the advisor to the statement associated with the tuning task named FOLIO_COUNT described in the earlier example. (Note that although only the advisor, not the DBA, can create a SQL Profile, only you can decide when to use it.)
You can see created SQL Profiles in the dictionary view DBA_SQL_PROFILES. The column SQL_TEXT shows the SQL statement the profile was assigned to; the column STATUS indicates if the profile is enabled. (Even if it is already tied to a statement, the profile must be enabled in order to affect the execution plan.)
Using ADDM and SQL Tuning Advisor
In addition to the three cases described above, SQL Tuning Advisor also identifies any objects with missing statistics referenced in the query. Thus, the advisor performs four distinct types of tasks:
- Checks if objects have valid, usable statistics for proper optimization
- Attempts to rewrite queries for better performance and suggests rewriting
- Checks the access path to see if performance could be improved by adding additional structures such as indexes and materialized views
- Creates SQL profiles and attaches them to specific queries.
- Reactive Tuning: Your application suddenly starts to perform poorly. Using ADDM, you can drill the problem down to a SQL statement or a set of statements, as shown in the section on ADDM. Following the recommendation of ADDM, you can launch SQL Tuning Advisor and correct the problem.
- Proactive Tuning: The application behaves acceptably well; however, you want to make sure that all necessary maintenance tasks are performed and know if queries can be tuned even more. You would fire up SQL Tuning Advisor in the standalone mode to identify possible tuning alternatives.
- Development Tuning: While code is being tested in development there are many opportunities to tune the query, as opposed to wthe QA or production phases. You can use the command-line version of the advisor to tune individual SQL statements before they are finalized in development.
The previous example was deliberately formulated to illustrate how to use SQL Tuning Advisor in command-line mode, which is very useful for scripting these tasks proactively. In most cases, however, you will need to perform tuning in response to problems reported by an end user. Enterprise Manager 10g comes in handy in those cases. A few weeks ago (Week 13) you were introduced to the revamped Enterprise Manager interface. Here's how you would use it to diagnose and tune SQL: From the Database home page, click on the link "Advisor Central" at the bottom of the screen, which launches the page containing all the advisors. Next, click on "SQL Tuning Advisor" at the top of the screen as shown in Figure 4.
You have just launched the SQL Tuning Advisor. Choose "Top SQL" from the next page as shown in Figure 5.
This action launches a page similar to the one shown in Figure 6, where a graph containing the various wait classes are traced along a time dimension.
A gray rectangular area within a red ellipse puts the focus on the graph. Reposition the rectangle by mouse-dragging it to a location where the CPU wait is high (as shown in the figure). The lower part of the page will display the relevant SQL statements in that interval, as shown in Figure 7.
As you can see, the SQL statement shown at the top (enclosed by the red ellipse) has the highest activity with maximum CPU consumption. Click on the statement ID to see details of the statement, which will bring up a screen as shown in Figure 8.
In the figure, you can see the exact SQL statement that caused the CPU consumption in that time period. You can click on the button "Run SQL Tuning Advisor" (marked in the figure) to run the advisor. This brings up a screen similar to the one shown in Figure 9.
In the advisor scheduler, you can determine the type of task and how much analysis should be done. For example, in the above figure, I have chosen "comprehensive" analysis and that the advisor is to be run immediately. After the advisor finishes you can see its recommendation, as shown in Figure 10.
This process I just described is similar to what you have seen in the command-line version; however, the flow is more reflective of a real-life scenario in which you have reacted to a problem, drilled down to its cause, and accepted recommendations about how to fix it.
ADDM is a powerful tool that has the "brains" to analyze performance metrics and offer recommendations based on best practices and accepted methodologies professed by seasoned Oracle professionals, all automatically. This functionality can tell the DBA not only what happened and why, but most important, what to do next.
For more information about ADDM and SQL Tuning Advisor, see the Technical Whitepapers Oracle Database 10g: The Self-Managing Database and The Self-Managing Database: Guided Application & SQL Tuning, Chapter 10 of the Oracle Database 2 Day DBA manual, and Chapters 6 and 13 of the Oracle Database Performance Tuning Guide.
Happy Learning !
Whether you're a new or veteran DBA, you've almost certainly seen an error similar to this one at least once:
ORA-04031: unable to allocate 2216 bytes of shared memory ("shared pool"... ...
or this one:
ORA-04031: unable to allocate XXXX bytes of shared memory
("large pool","unknown object","session heap","frame")
or perhaps this one:
ORA-04031: unable to allocate bytes of shared memory ("shared pool",
"unknown object","joxlod: init h", "JOX: ioc_allocate_pal")
The cause of the first error is obvious: the memory allocated to the shared pool is insufficient for answering the user request. (In some cases the cause may not be the size of the pool itself, but rather the fragmentation that results from excessive parsing due to non-usage of bind variables—a favorite topic of mine; but let's stay focused on the issue at hand right now.) The other errors derive from inadequate space in the large pool and Java pool respectively.
You need to resolve these error conditions without any application-related changes. What are your options? The question is how to divide available memory among all the pools required by the Oracle instance.
How Do You Split the Pie?
The System Global Area (SGA) of an Oracle instance, as you know, comprises several memory areas, including the buffer cache, shared pool, Java pool, large pool, and redo log buffers. These pools occupy fixed amounts of memory in the operating system's memory space; their sizes are specified by the DBA in the initialization parameter file.
The four pools—db block buffer cache, shared pool, Java pool, and large pool—occupy almost all the space inside the SGA. (Relative to the other areas, the redo log buffer does not occupy much space and is inconsequential to our discussion here.) You, as the DBA, must ensure that their respective memory allocations are sufficient.
Suppose you decide that the values of these pools should be 2GB, 1GB, 1GB, and 1GB respectively. You would set the following initialization parameters to mandate the sizes of the pools for the database instance.
db_cache_size = 2g
shared_pool_size = 1g
large_pool_size = 1g
java_pool_size = 1g
Now, take a close look at these parameters. Honestly, are these values accurate?
I'm sure you have your doubts. In real life, no one can specify these pools to an exact science—they depend too heavily on the processing inside the database and the nature of processing changes from time to time.
Here's an example scenario. Say you have a typical, "mostly" OLTP database and have dedicated less memory for the buffer cache than you would have for a purely OLTP one (few of which exist anymore). One day, your users turn loose some very large full table scans for end-of-the-day reporting. Oracle9i Database gives you the ability to change the allocation online, but because the total physical memory available is limited, you decide to pull something away from the large pool and the Java pool:
alter system set db_cache_size = 3g scope=memory;
alter system set large_pool_size = 512m scope=memory;
alter system set java_pool_size = 512m scope=memory;
This solution works fine for a while, but then the nightly RMAN jobs—which use the large pool—begin and the pool immediately falls short. Again, you come to the rescue by supplementing the large pool with some memory from the db cache.
The RMAN jobs complete, but then a batch program that uses Java extensively fires up, and consequently, you start to see Java pool-related errors. So, you reallocate the pools (again) to accommodate the demands on the Java pool and db cache:
alter system set db_cache_size = 2G scope=memory;
alter system set large_pool_size = 512M scope=memory;
alter system set java_pool_size = 1.5G scope=memory;
The next morning, the OLTP jobs come back online and the cycle repeats all over again!
One alternative to this vicious cycle is to set the maximum requirements of each pool permanently. By doing that, however, you may allocate a total SGA more than the available memory—thereby increasing the risk of swapping and paging when the allocation is less than adequate for each pool. The manual reallocation method, although impractical, looks pretty good right now.
Another alternative is to set the values to acceptable minimums. However, when demand goes up and memory is not available, performance will suffer.
Note that in all these examples the total memory allocated to SGA remained the same, while the allocation among the pools changed based on immediate requirements. Wouldn't it be nice if the RDBMS were to automatically sense the demand from users and redistribute memory allocations accordingly?
The Automatic Shared Memory Management feature in Oracle Database 10g does exactly that. You can decide the total size of the SGA and then set a parameter named SGA_TARGET that decides the total size of the SGA. The individual pools within the SGA will be dynamically configured based on the workload. A non-zero value of the parameter SGA_TARGET is all that is needed to enable the automatic memory allocation.
Setting up Automatic Shared Memory Management
Let's see how this works. First, determine the total size of the SGA. You can estimate this value by determining how much memory is allocated right now.
SQL> select sum(value)/1024/1024 from v$sga;
The current total size of the SGA right now is approximately 500MB, which will become the value of SGA_TARGET. Next, issue the statement:
alter system set sga_target = 500M scope=both;
This approach obviates the need to set individual values for the pools; thus, you'll need to make their values zero in the parameter file or remove them completely.
shared_pool_size = 0
large_pool_size = 0
java_pool_size = 0
db_cache_size = 0
Recycle the database to make the values take effect.
This manual process can also be implemented via Enterprise Manager 10g. From the database home page, choose the "Administration" tab and then "Memory Parameters." For manually configured memory parameters, the button marked "Enable" will be displayed, along with the values of all manually configured pools. Click the "Enable" button to turn Automatic Shared Memory Management on. Enterprise Manager does the rest.
After the automatic memory allocations are configured, you can check their sizes with the following:
SQL> select current_size from v$buffer_pool;
SQL> select pool, sum(bytes)/1024/1024 Mbytes from v$sgastat group by pool;
java pool 4
large pool 4
shared pool 148
As you can see, all the pools were automatically configured from the total target size of 500MB. (See Figure 1.) The buffer cache size is 340MB, Java pool is 4MB, large pool is 4MB, and shared pool is 148MB. Together they total (340+4+4+148=) 496MB, approximately the same size as the target SGA of 500MB.
Now suppose the host memory available to Oracle is reduced from 500MB to 300MB, meaning we have to reduce the size of the total SGA. We can reflect that change by reducing the target SGA size.
alter system set sga_target = 300M scope=both;Checking the pools now, we can see that:
SQL> select current_size from v$buffer_pool;The total size occupied is 240+4+4+44 = 296MB, close to the target of 300MB. Notice how the pools were automatically reallocated when the SGA_TARGET was changed, as shown in Figure 2.
SQL> select pool, sum(bytes)/1024/1024 Mbytes from v$sgastat group by pool;
java pool 4
large pool 4
shared pool 44
The size of the pools is dynamic. Based on the workload, the pools will expand to accommodate the increase in demand or shrink to accommodate the expansion in another pool. This expansion or contraction occurs automatically without the DBA's intervention, unlike the example in the opening of this article. Returning to that scenario for a moment, assume that after the initial allocation the RMAN job starts, indicating the need for a larger large pool; the large pool will expand from 4MB to 40MB to accommodate the demand. This additional 36MB will be carved out of the db buffers and the db block buffers will shrink, as shown in Figure 3.
The changed sizes of the pools are based on the workload on the system, so the pools needn't be sized for the worst-case scenario—they will automatically adjust to the growth in demand. Furthermore, the total size of the SGA is always within the maximum value specified by SGA_TARGET, so there is no risk of blowing the memory requirement out of proportion (which will lead to paging and swapping). You can dynamically increase the SGA_TARGET to the absolute maximum specified by adjusting the parameter SGA_MAX_SIZE.
Which Pools are Not Affected?
Some pools in SGA are not subject to dynamic resizing, and must be specified explicitly. Notable among them are the buffer pools for nonstandard block sizes and the non-default ones for KEEP or RECYCLE. If your database has a block size of 8K, and you want to configure 2K, 4K, 16K, and 32K block-size pools, you must set them manually. Their sizes will remain constant; they will not shrink or expand based on load. You should consider this factor when using multiple-size buffer, KEEP, and RECYCLE pools. In addition, log buffer is not subject to the memory adjustment—the value set in the parameter log_buffer is constant, regardless of the workload. ( In 10g, a new type of pool can also be defined in the SGA: Streams pool, set with parameter streams_pool_size. This pool is also not subject to automatic memory tuning.)
This gives rise to an interesting question. What if you need a non-default block size pool yet want to manage the other pools automatically?
If you specify any of these non-auto-tunable parameters (such as db_2k_cache_size), their total size is subtracted from the SGA_TARGET value to calculate the automatically tuned parameter values so that the total size of the SGA remains constant . For instance, imagine that the values look like this:
sga_target = 500Mand the rest of the pool parameters are unset. The 2KB buffer pool of 50MB leaves 450MB for the auto-tuned pools such as the default block size buffer pool (db_cache_size), shared pool, Java pool, and large pool. When the non-tunable parameter such as the 2KB block size pool is dynamically adjusted in such a way that the tunable portion's size is affected, the tunable portion is readjusted. For example, raising the value of db_2k_cache_size to 100MB from 50MB leaves only 400MB for the tunable parameters. So the tunable pools such as shared, large, Java, and default buffer pools shrink automatically to reduce their total size to 400MB from 450MB, as shown in Figure 4.
db_2k_cache_size = 50M
But what if you have sufficient memory available or the risks described above may not be that pronounced? If so, you can turn off automatic resizing by not specifying the parameter SGA_TARGET in the parameter file, by setting it to zero in the file, or by changing it to zero dynamically with ALTER SYSTEM. When SGA_TARGET is set to zero, the current values of the pools are automatically set to their parameter.
Using Enterprise Manager
You can also use Enterprise Manager 10g to manipulate these parameters. From the database home page, click the hyperlink "Memory Parameters," which will show you a screen similar to the one in Figure 5.
Note the items circled in red: The database is running in Automatic Shared Memory Management mode and the total size is 564MB, the same value specified in the parameter SGA_TARGET. You can modify it here and click on the Apply button to accept the values; the tunable parameters will automatically adjust.
Pool Size (MB) Buffer 404 Java 4 Large 4 Shared 148
Looking at the above you might conclude that the Java and large pools are a bit inadequate at 4MB; this value will definitely need to be increased at runtime. Therefore, you may want to make sure the pools at least start with higher values—say, 8MB and 16MB respectively. You can do that by explicitly specifying the value of these pools in the parameter file or dynamically using ALTER SYSTEM as shown below.
alter system set large_pool_size = 16M; Checking the pools now, you can see:
alter system set java_pool_size = 8M;
SQL> select pool, sum(bytes)/1024/1024 Mbytes from v$sgastat group by pool; The reallocation of the pools is shown below:
java pool 8
large pool 16
shared pool 148
SQL> select current_size from v$buffer_pool;
Pool Size (MB) Buffer 388 Java 8 Large 16 Shared 148
Note how the Java and large pools have been reconfigured to 8MB and 16MB respectively, and that to keep the total SGA under 600MB, the buffer pool has reduced to 388MB from 404MB. Of course, these pools are still governed by Automatic Shared Memory Management—their sizes will shrink or expand based on demand. The values you have specified explicitly put a lower limit on the pool size; they will never sink below this limit.
Happy Learning !