We recently upgraded our Java EE applications to a new platform and began seeing stuck threads and slow startup times. The platform was upgraded from OC4J to WebLogic 12c, and the underlying LDAP service was changed to Oracle Access Manager. The server logs pointed to one likely cause of the stuck threads: LDAP requests.
Fortunately, stuck threads in LDAP requests are a known issue in Oracle WebLogic Server 10.3.2 and later, covered in Oracle Support doc 1436044.1. The LDAP provider fails to authenticate some users, and the server logs show stuck threads in LDAP requests:
<10.9.2014 11.43.46 EEST> <[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "607" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 607304 ms
", which is more than the configured time (StuckThreadMaxTime) of "600" seconds in "server-failure-trigger". Stack trace:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:503)
netscape.ldap.LDAPMessageQueue.waitForMessage(Unknown Source)
netscape.ldap.LDAPMessageQueue.waitFirstMessage(Unknown Source)
netscape.ldap.LDAPConnection.sendRequest(Unknown Source)
netscape.ldap.LDAPConnection.search(Unknown Source)
weblogic.security.providers.authentication.LDAPAtnDelegate.getDNForUser(LDAPAtnDelegate.java:3771)
weblogic.security.providers.authentication.LDAPAtnDelegate.userExists(LDAPAtnDelegate.java:2384)
weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl.login(LDAPAtnLoginModuleImpl.java:199)
com.bea.common.security.internal.service.LoginModuleWrapper$1.run(LoginModuleWrapper.java:110)
java.security.AccessController.doPrivileged(Native Method)
com.bea.common.security.internal.service.LoginModuleWrapper.login(LoginModuleWrapper.java:106)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
javax.security.auth.login.LoginContext.invoke(LoginContext.java:762)
javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
javax.security.auth.login.LoginContext$4.run(LoginContext.java:690)
javax.security.auth.login.LoginContext$4.run(LoginContext.java:688)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687)
javax.security.auth.login.LoginContext.login(LoginContext.java:595)
...
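To check how many stuck threads are actually waiting on LDAP, the log can be scanned automatically. The following is a minimal sketch in Python, assuming the log records look like the excerpt above. The function name `find_stuck_threads` and the returned dictionary keys are made up for illustration and are not part of WebLogic.

```python
import re

# Matches the header of a stuck-thread record as it appears in the excerpt above.
STUCK_RE = re.compile(
    r"\[STUCK\] ExecuteThread: '(?P<thread>\d+)' for queue: '(?P<queue>[^']+)' "
    r"has been busy for \"(?P<busy>\d+)\" seconds"
)


def find_stuck_threads(log_text):
    """Return one dict per [STUCK] record found in log_text.

    Each dict holds the thread id, queue name, busy time in seconds, and
    whether the stack trace that follows contains a netscape.ldap frame.
    """
    entries = []
    current = None
    for line in log_text.splitlines():
        match = STUCK_RE.search(line)
        if match:
            current = {
                "thread": match.group("thread"),
                "queue": match.group("queue"),
                "busy_seconds": int(match.group("busy")),
                "in_ldap": False,
            }
            entries.append(current)
        elif line.startswith("<"):
            # Assumption: every new log record starts with "<", so the
            # previous stack trace has ended.
            current = None
        elif current is not None and line.strip().startswith("netscape.ldap."):
            current["in_ldap"] = True
    return entries
```

Calling `find_stuck_threads` on the contents of a server log and counting the entries where `in_ldap` is true gives a rough idea of how much of the problem is LDAP related.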
The stuck threads occur because authentication requests hang whenever the LDAP server is slow. By default, connections and searches to the LDAP server do not time out, so if the LDAP server is slow, an authentication request may wait a very long time before it completes or is retried. This shows up as many threads stuck in LDAP searches.
The solution is to set a timeout on LDAP requests, for example as follows (described in Oracle Support doc 1436044.1):
- Log in to the WLS Administration Console.
- Navigate to Security Realms -> myrealm -> Providers -> "your_ldap_authenticator".
- Set the following values:
  - Connect Timeout: 30 (seconds)
  - Results Time Limit: 5000 (milliseconds)
  - Keep Alive Enabled: unchecked
- Save and apply changes. Restart the required servers if prompted.
NOTE: The optimal values differ from environment to environment, but the values given here are a reasonable starting point and help in most cases like this one. Our original values for the LDAP authenticator settings were: Connect Timeout: 0; Results Time Limit: 0 (0 means no time limit); Keep Alive Enabled unchecked.
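The same change can probably be scripted with WLST instead of clicking through the console. The sketch below is untested against our environment and makes several assumptions: the admin URL, user name, and password are placeholders, the provider name is the same placeholder used in the steps above, and the script is run with `wlst.sh` rather than a plain Python interpreter. The helper names `provider_path` and `apply_ldap_timeouts` are made up for illustration.

```python
# WLST (Jython) sketch. connect, edit, startEdit, cd, cmo, save, activate,
# disconnect, domainName and the lowercase `false` are provided by WLST.

ADMIN_URL = "t3://localhost:7001"     # placeholder admin server URL
ADMIN_USER = "weblogic"               # placeholder user name
ADMIN_PASSWORD = "changeme"           # placeholder password
REALM = "myrealm"
PROVIDER = "your_ldap_authenticator"  # placeholder provider name

CONNECT_TIMEOUT_SECONDS = 30          # Connect Timeout
RESULTS_TIME_LIMIT_MS = 5000          # Results Time Limit


def provider_path(domain, realm, provider):
    """Build the edit-tree path of an authentication provider MBean."""
    return "/SecurityConfiguration/%s/Realms/%s/AuthenticationProviders/%s" % (
        domain, realm, provider)


def apply_ldap_timeouts():
    """Set the LDAP timeouts on the authenticator and activate the change."""
    connect(ADMIN_USER, ADMIN_PASSWORD, ADMIN_URL)
    edit()
    startEdit()
    cd(provider_path(domainName, REALM, PROVIDER))
    cmo.setConnectTimeout(CONNECT_TIMEOUT_SECONDS)
    cmo.setResultsTimeLimit(RESULTS_TIME_LIMIT_MS)
    cmo.setKeepAliveEnabled(false)
    save()
    activate()
    disconnect()


# Assumption: WLST runs scripts with __name__ set to "main", so this guard
# only fires under wlst.sh and not under a regular Python interpreter.
if __name__ == "main":
    apply_ldap_timeouts()
```

As with the console steps, the affected servers may still need a restart before the new values take effect.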
This is still only a partial solution, as you should also investigate why the LDAP server is slow. For now it solves our problem, but it has some side effects on user authentication.