From: Tom Lane <tgl@xxxxxxxxxxxxx>
To: Mark Dilger <markdilger@xxxxxxxxx>
Cc: deepak <deepak.pn@xxxxxxxxx>; Alban Hertroys <haramrae@xxxxxxxxx>; "pgsql-general@xxxxxxxxxxxxxx" <pgsql-general@xxxxxxxxxxxxxx>
Sent: Wednesday, May 23, 2012 11:17 AM
Subject: Re: [GENERAL] FATAL: lock file "postmaster.pid" already exists
Mark Dilger <markdilger@xxxxxxxxx> writes:
> Prior to posting to the mailing list, we made some
> changes in postmaster.c to identify where time was
> being spent. Based on the elog(NOTICE,...) lines
> we put in the file, we determined the time was spent
> inside RemovePgTempFiles.
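For readers wondering what "elog(NOTICE,...) lines to identify where time was being spent" looks like in practice, here is a minimal sketch of timing a single startup step. It assumes a POSIX system with clock_gettime and CLOCK_MONOTONIC. fprintf to stderr stands in for elog(NOTICE, ...), and the helper names elapsed_secs, timed_step and startup_step are hypothetical, not taken from postmaster.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
static double
elapsed_secs(const struct timespec *start, const struct timespec *end)
{
    return (double) (end->tv_sec - start->tv_sec) +
           (double) (end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Hypothetical startup step under investigation (placeholder body). */
static void
startup_step(void)
{
    /* ... work whose duration we want to measure ... */
}

/*
 * Run one step, report how long it took, and return the elapsed time.
 * fprintf to stderr is a stand-in for elog(NOTICE, ...) in the backend.
 */
static double
timed_step(const char *label, void (*step)(void))
{
    struct timespec t0, t1;
    double secs;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    step();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    secs = elapsed_secs(&t0, &t1);
    fprintf(stderr, "NOTICE:  %s took %.3f s\n", label, secs);
    return secs;
}
```

Wrapping each suspect call this way, e.g. timed_step("startup_step", startup_step), narrows down which step dominates startup time.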
> I then altered RemovePgTempFiles to take a starttime
> parameter and, while recursing, to check if more than
> 5 seconds has passed since it started. I did not want
> to add the complexity of setting an alarm and catching
> the signal, so I just made the code check the wall-clock
> time at each step of the recursion. When more than
> 5 seconds has passed, it does not recurse further.
> After making this change, we have not been able to
> reproduce the slowness.
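A minimal sketch of the time-bounded cleanup described above, in plain C. It is not PostgreSQL's actual RemovePgTempFiles code: remove_temp_files_bounded, budget_exhausted and TEMP_CLEANUP_BUDGET_SECS are hypothetical names, and it assumes a POSIX system and temp files whose names start with a given prefix such as "pgsql_tmp". Like the change described in the post, it checks the wall clock at each step instead of using an alarm and a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical budget, mirroring the 5-second cutoff from the post. */
#define TEMP_CLEANUP_BUDGET_SECS 5

/* Nonzero once more than the budget has passed since starttime. */
static int
budget_exhausted(time_t starttime)
{
    return difftime(time(NULL), starttime) > TEMP_CLEANUP_BUDGET_SECS;
}

/*
 * Hypothetical time-bounded temp-file cleanup: walk dirpath, unlink
 * regular files whose names start with prefix, and recurse into
 * subdirectories.  Once the budget is exhausted it stops recursing and
 * stops scanning.  Returns the number of files removed.
 */
static int
remove_temp_files_bounded(const char *dirpath, const char *prefix,
                          time_t starttime)
{
    DIR        *dir;
    struct dirent *de;
    struct stat st;
    char        path[4096];
    int         removed = 0;

    if (budget_exhausted(starttime))
        return 0;               /* out of time: do not recurse further */

    dir = opendir(dirpath);
    if (dir == NULL)
        return 0;

    while ((de = readdir(dir)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
        if (lstat(path, &st) != 0)
            continue;

        if (S_ISDIR(st.st_mode))
            removed += remove_temp_files_bounded(path, prefix, starttime);
        else if (S_ISREG(st.st_mode) &&
                 strncmp(de->d_name, prefix, strlen(prefix)) == 0 &&
                 unlink(path) == 0)
            removed++;

        /* Check the wall clock after every entry, not just per directory. */
        if (budget_exhausted(starttime))
            break;
    }
    closedir(dir);
    return removed;
}
```

Usage would look like remove_temp_files_bounded("base/pgsql_tmp", "pgsql_tmp", time(NULL)). The trade-off is that anything not reached before the cutoff is simply left on disk, so the change hides the slowness rather than explaining it. Note also that time() follows the wall clock, so a clock adjustment during startup can shorten or lengthen the budget; CLOCK_MONOTONIC would avoid that.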
OK, so we're back to the original question: how could this possibly be
taking that long? Have you got thousands of tablespaces (and if so why)?
Does your system have a habit of crashing at times when there are
thousands of temp files? Maybe you're using IP over avian carriers to
access your SAN? It just doesn't make any sense given the information
It just doesn't make any sense given the information

			regards, tom lane