EBS concurrent jobs(custom), which does simple selects for formatting, is taking significantly longer time (~ 8-9 mins)in Exadata Linux environment compared to AIX (< 2 mins). The spool output file was created under /applcsf/out and was about 568MB in size.
Here’s a sample query:
select
CASE_OWNER_DIV||’|’||
CASE_OWNER_LOGIN||’|’||
CASE_OWNER_NAME||’|’||
CASE_STATUS||’|’||
CASE_STATUS_REASON||’|’||
to_char(CASE_STATUS_DATE,’dd-mm-yyyy’)||’|’||
….
…..
……
to_char(WOA_ISSUE_DATE,’dd-mm-yyyy’)||’|’||
to_char(CASE_OWN_LAST_ACTION,’dd-mm-yyyy’)||’|’||
to_char(CLIENT_LAST_ACTION,’dd-mm-yyyy’)
from xxexp_marple
Trail & error:
We implemented the tried & tested approach like –
- Running GSS on underlying custom table xxexp_marple
- Compared the AIX & Linux plans( plan was same and it was doing just a full table scan with just about 300k records) & their corresponding wait events. We observed most time was spent on SQL* Net event.
- Ran GSS with Gather Auto options hourly instead of GSS on a weekly basis( basically making it same as AIX schedule)
- Ran SQL Tuning Advisor & use the recommended hints by modifying the underlying query:
/*+ OPT_PARAM(‘_optimizer_adaptive_plans’,’false’) */
/*+ OPT_PARAM(‘_px_adaptive_dist_method’,’off’) */
Breakthrough came when I manually changed the underlying select query to point the spool output to local file system (/tmp & /gluster) from current applcsf filesystem (which was in NFS). In both cases, the concurrent jobs completed in less than 2 mins, exactly what it should take ideally.
The above results finally helped me narrow down the issue and that it has to do with NFS filesystem.
Next was to determine what was the exact issue.
After researching several articles related to NFS, I came across a note (Doc ID 2480994.1) which states that having more than 8k-10k files in NFS filesystem subdirectories can cause performance issue. Next I went back to the applcsf/log & applcsf/log & found it had 3 months old data(fnd_concurrent_requests purge retention was 3 months) amounting to 96k & 8k respective file count. After moving the older data and reducing the file count to below 10k, I expected the concurrent job to now run faster. However, I was proved wrong.
Moment of Truth:
At last, I figured out it was the NFS mount options for applcsf that was the culprit. It was not set as per Oracle recommended values in Doc ID 384248.1
It was set as below:
<nfs server>:/applcsf /ebs_applcsf nfs rw,sync,hard,intr 0 0
I modified the /etc/fstab files on both apps nodes as:
<nfs server>:/applcsf /applcsf nfs rw,intr,bg,hard,timeo=600,wsize=32768,rsize=32768,nfsvers=3,tcp,nolock 0 0
Then as root, executed on both apps nodes:
umount /applcsf
mount <nfs server>:/applcsf /applcsf
After the above change, the EBS concurrent jobs took less than 2 mins to complete.