Today I noticed something very strange: on one of the servers, API response time suddenly increased from 100 milliseconds to 30 seconds.
On deeper analysis, it looked like the operating system was blocking pseudo-random number generation.
The issue would go away on VM restart, but within a few minutes the response time would degrade again.
After some more analysis and reading, I confirmed that the OS blocking random number generation was the cause of this issue.
Java cryptography classes require random numbers, and this has been discussed in detail here.
Snippet from above link:
Java applications can and should use java.security.SecureRandom class to produce cryptographically strong random values by using a cryptographically strong pseudo-random number generator (CSPRNG). The standard JDK implementations of java.util.Random class are not considered cryptographically strong.
Unix-like operating systems have /dev/random, a special file which serves pseudo random numbers accessing environmental noise collected from device drivers and other sources. However, it blocks if there is less entropy available than requested; /dev/urandom typically never blocks, even if the pseudorandom number generator seed was not fully initialized with entropy since boot. There still is a 3rd special file, /dev/arandom which blocks after boot until the seed has been securely initialized with enough entropy, and then never blocks again.
By default, the JVM seeds the SecureRandom class using /dev/random, therefore your Java code can block unexpectedly. The option -Djava.security.egd=file:/dev/./urandom in the command line invocation used to start the Java process tells the JVM to use /dev/urandom instead.
The extra /./ seems to make the JVM to use the SHA1PRNG algorithm which uses SHA-1 as the foundation of the PRNG (Pseudo Random Number Generator). It is stronger than the NativePRNG algorithm used when /dev/urandom is specified.
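To see which generator a given JVM actually picks, and whether creating and using it is slow, a small probe like the one below can help. This is only a sketch: the class name SecureRandomProbe and the helper randomHex are made up for illustration, and the timing is a rough indication rather than a proper benchmark.

```java
import java.security.SecureRandom;

public class SecureRandomProbe {

    /** Returns numBytes of random data from rng, encoded as lowercase hex. */
    static String randomHex(SecureRandom rng, int numBytes) {
        byte[] buf = new byte[numBytes];
        rng.nextBytes(buf); // the first call also triggers seeding if needed
        StringBuilder sb = new StringBuilder(numBytes * 2);
        for (byte b : buf) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();

        // Default constructor: the JVM chooses the algorithm and seed source.
        SecureRandom rng = new SecureRandom();
        String token = randomHex(rng, 16);

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("algorithm  = " + rng.getAlgorithm());
        System.out.println("token      = " + token);
        System.out.println("elapsed ms = " + elapsedMs);
    }
}
```

Running it once with and once without the -Djava.security.egd option on the affected machine should show whether the algorithm or the elapsed time changes. An elapsed time of several seconds would be consistent with the blocking described above.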
So the solution was to pass -Djava.security.egd=file:/dev/./urandom as a system property when starting the JVM.
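The launch command would look something like `java -Djava.security.egd=file:/dev/./urandom -jar app.jar`, where app.jar is just a placeholder for your own application. To confirm the property actually reached the process, a startup check along these lines could be logged. The class EgdCheck and the method describeEgd are hypothetical names used only for this sketch.

```java
public class EgdCheck {

    /** Builds a human-readable description of the java.security.egd setting. */
    static String describeEgd(String egd) {
        if (egd == null || egd.isEmpty()) {
            return "java.security.egd is not set; the JVM default seed source applies";
        }
        return "java.security.egd = " + egd;
    }

    public static void main(String[] args) {
        // Reads the system property passed with -D on the command line.
        String egd = System.getProperty("java.security.egd");
        System.out.println(describeEgd(egd));
    }
}
```

This only reports what was passed on the command line. It does not prove which entropy source the JVM ends up reading, so it is best treated as a sanity check for deployment scripts.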