Friday, August 28, 2015

Essbase Crashes

Welcome to laid-back Friday. I want to talk a little bit about Essbase crashes today. Since I started working with Essbase in 1998, I've experienced many Essbase agent and application crashes, along with a fair share of agent hangs. I've spent countless days troubleshooting the problems, searching logs, and trying to figure out which user was the last to perform an action in an application, in the hope that I could call them and find out what they were doing just before the problem occurred. The burden of proof with support, whether it be Arbor, Hyperion or Oracle, has always been on the customer. I'd have to figure out exactly what to do to replicate the problem; only then could I feel confident that they'd call the issue a bug and address it. Better yet, if I could reproduce the problem in Sample.Basic, then I wouldn't have to send them my outline or data.

Flash forward a decade and I still see seemingly random crashes and hangs. I still search through the logs, but mostly there's nothing I can do beyond maybe playing with some thread settings or increasing a cache here or there. I feel a bit powerless. I'm sure others of you are in a similar boat. I don't have much advice to share other than to keep on doing your due diligence: check the logs, talk to users when applicable, and open tickets with Oracle. I will say that the crashes and hangs are few and far between compared to the Essbase 6 and prior days.

Troubleshooting Guide

This brings me to a recently released Oracle document that helps guide you when you do experience a crash. It's available HERE on the Oracle Support website, which does require a login.

One section in the document is about the RDA (Oracle Remote Diagnostics Agent) script, which will zip up a bunch of your server information for easy sending to Oracle Support.

Process Monitor

In most of my Essbase server environments I've had a script that runs regularly throughout the day and checks whether the Essbase server is responsive. If it isn't, the script sends me an email (or a page, in the old days). I'd recommend having some type of process monitor running, as it's better to find out about an issue and start working on it before the users start calling.
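
For anyone who wants a starting point, here is a minimal sketch of that kind of monitor in Python. It is not the script I actually run, and it rests on a few assumptions: the function names are made up for illustration, the check is nothing more than a TCP connection attempt to the Essbase agent port (commonly 1423, but confirm the AGENTPORT setting in your own essbase.cfg), and mail goes out through whatever SMTP relay you pass in. Keep in mind that a hung agent may still accept connections, so a connect test like this will catch a dead agent more reliably than a hung one; a deeper check would actually log in, for example through MaxL, which isn't shown here.

```python
import smtplib
import socket
from email.message import EmailMessage


def agent_is_listening(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def build_alert(host: str, port: int, sender: str, recipient: str) -> EmailMessage:
    """Build the alert email; kept separate so it can be inspected or tested."""
    msg = EmailMessage()
    msg["Subject"] = f"Essbase agent not responding on {host}:{port}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"A TCP connection to {host}:{port} failed. "
        "Check the Essbase agent and application logs."
    )
    return msg


def check_and_alert(host, port, sender, recipient,
                    smtp_host="localhost", send=None) -> bool:
    """Check the agent once; send an alert if the check fails.

    Returns True when the agent accepted the connection, False otherwise.
    `send` is an optional callable taking the message, useful for testing
    or for swapping in another delivery method; by default the message is
    sent through the SMTP relay at `smtp_host` (an assumed relay).
    """
    if agent_is_listening(host, port):
        return True
    msg = build_alert(host, port, sender, recipient)
    if send is None:
        with smtplib.SMTP(smtp_host) as smtp:
            smtp.send_message(msg)
    else:
        send(msg)
    return False
```

To run it regularly throughout the day, you could call check_and_alert with your own server name, port and mail addresses from a small wrapper script and schedule that with cron or Windows Task Scheduler.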
