Friday, October 31, 2008

Proactively Monitoring an Apps 11i Web Tier Server

When asked how many are proactively monitoring their access_log, not many people will put their hands up.

There is a wealth of information in the access_log that can be mined to give information about user behaviour, possible user or system problems, as well as having the potential to give valuable performance information.


Here are just three examples where you can quickly answer important questions about your system with very little effort:


1. How long does it take 9iAS to serve requests?

The main performance information is the time taken to serve a request. This is not included in the access_log by default (I have raised enhancement request 5349693 to include this data), but it can easily be added by placing the following line in the httpd.conf file via an AutoConfig customization (see Metalink Note 270519.1):

LogFormat "%h %l %u %t \"%r\" %>s %b %T"

This line can be added anywhere in the file, but would sensibly be placed after the existing LogFormat entries. This directive adds an extra column into the access_log to show the time in seconds it took from receiving a request to sending back a response. Valid log formats are described in the Apache documentation.
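
Once the extra column is being logged, a quick summary of it can serve as a rough baseline. The following is a minimal sketch, assuming the LogFormat shown above so that the time taken is the last field on each line. The function name summarise_times is hypothetical and not part of any Oracle or Apache tooling.

```shell
#!/bin/sh
# Summarise the time-taken column of access_log lines read from stdin.
# Assumption: the LogFormat above is in use, so %T is the last field ($NF).
summarise_times() {
    awk '
        $NF ~ /^[0-9]+$/ {      # skip lines whose last field is not a number
            t = $NF + 0
            n++
            total += t
            if (t > max) max = t
        }
        END {
            if (n > 0)
                printf "requests=%d avg=%.2f max=%d\n", n, total / n, max
            else
                print "requests=0"
        }
    '
}
```

It could be run as: cat access_log* | summarise_times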


2. Am I getting any server errors?

The access_log includes the HTTP status code of each request. The codes are listed in full in RFC 2616, but the ones we would be immediately concerned about are any in the 400 or 500 range.

For example:

  • Status 500 (internal server error) may typically be seen for a JServ request and often means the JVM has some kind of problem or has died.

    For example: This entry may indicate that the JServ JVM is not responding to any requests:

    192.168.1.10 - - [21/Jun/2006:13:25:30 +0100] "POST /oa_servlet/actions/processApplicantSearch HTTP/1.1" 500 0


  • Status 403 (forbidden) could typically be seen for oprocmgr requests and often means there is a misconfiguration that needs to be resolved.

    For example: This entry in access_log may indicate a problem with system configuration (oprocmgr.conf):

    myserver - - [21/Jun/2006:13:25:30 +0100] "GET /oprocmgr-service?cmd=Register&index=0&modName=JServ&grpName=OACoreGroup&port=16000 HTTP/1.1" 403 226
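
To get a feel for how often each error occurs, the status codes can be tallied. This is a minimal sketch, assuming the status code is field 9 as in the log lines shown in this post. The function name count_errors is hypothetical.

```shell
#!/bin/sh
# Count responses per status code in the 400-599 range,
# reading access_log lines from stdin.
# Assumption: field 9 holds the HTTP status code.
count_errors() {
    awk '
        $9 >= 400 && $9 <= 599 { count[$9]++ }
        END { for (s in count) print s, count[s] }
    ' | sort -n
}
```

It could be run as: cat access_log* | count_errors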

3. Are users having problems accessing pages?

A status code of 200 means the request was successful; however, a dash ( - ) in the "Bytes sent" column normally means that the request hit the Apache timeout. You would also see a time taken to serve the request above 300 seconds, as the default Apache timeout is 5 minutes.

If you see this situation occurring regularly then your users are either navigating away from the browser page before it has rendered, or are likely to be getting a "white screen of death" in their browser window where it will appear to hang.


In this situation you need to identify why the requests are not being processed in good time, which is a large subject in itself.
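
Requests that match this pattern can be picked out directly. This is a minimal sketch, assuming the status code is field 9, the bytes sent is field 10, and the time taken (if the extra column has been added) is the last field. The function name find_timeouts is hypothetical.

```shell
#!/bin/sh
# Print timestamp, URL and last field for requests that returned
# status 200 but show "-" for bytes sent. Reads access_log lines from stdin.
# Assumption: field 9 = status, field 10 = bytes sent, last field = time taken.
find_timeouts() {
    awk '$9 == 200 && $10 == "-" {
        # substr drops the leading "[" from the timestamp field
        print substr($4, 2), $7, $NF
    }'
}
```

It could be run as: cat access_log* | find_timeouts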


Identifying these issues

Hopefully these examples will inspire you to want to analyse your access_log, but I hear you ask, "Where will I get the time?"

Luckily the access_log is a simple text file, so if you do not have commercial monitoring software you can use some "quick and dirty" scripts to report just the exceptions you are interested in seeing. For example, I would often use the following to scan access_logs for problems:



## Start of script
##
## Check for HTTP statuses in 400 or 500 range for JServ
## or PLSQL requests only
##
awk ' $9>=400 && $9<=599 { print $0 }' access_log* | grep -e "servlet" -e "\/pls\/" | grep -v "\.gif"
##
## Check for requests taking more than 30 seconds to be returned
##
awk ' $11>30 {print $0} ' access_log*
##
## This one is not an exception report, you need to manually check
## Look for when the JVMs are restarting
##
grep "GET /oprocmgr-service?cmd=Register" access_log*
##
## End of script
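
Because the JVM restart check has to be read by eye, it may help to condense it to one count per day. This is a minimal sketch, assuming the timestamp is field 4 in the form [dd/Mon/yyyy:hh:mm:ss. The function name count_jvm_registrations is hypothetical, and the output is sorted as text rather than chronologically.

```shell
#!/bin/sh
# Count oprocmgr "Register" requests per day, reading access_log
# lines from stdin. Each one suggests a JVM was starting up.
# Assumption: field 4 looks like "[21/Jun/2006:13:25:30".
count_jvm_registrations() {
    grep -F 'GET /oprocmgr-service?cmd=Register' |
        awk '
            { day = substr($4, 2, 11); n[day]++ }   # dd/Mon/yyyy is 11 characters
            END { for (d in n) print d, n[d] }
        ' | sort
}
```

It could be run as: cat access_log* | count_jvm_registrations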


Conclusion

Proactive monitoring of the access_log will help you to :-

  • Baseline your system performance
  • Identify user usage patterns
  • Highlight possible system or user problems
  • Identify areas with possible performance issues
  • Verify user reported problems
HAPPY LEARNING!
