Segmentation fault (after ShareLock acquired, same table 4 times)
Yesterday evening we ran into some trouble on one of our databases running PostgreSQL 9.0.4 (AMD Opteron setup, ECC memory, SAN storage).
This database has performed very well for months without any trouble, with fsync on, full page writes on, and streaming to our replica slaves.
Then yesterday PostgreSQL reported a segmentation fault, "server process (PID 19531) was terminated by signal 11: Segmentation fault", shut down all processes, and successfully reinitialized.
This happened immediately after: process 19531 acquired ShareLock on transaction 3193803679 after 2277.230 ms",,,,,"SQL statement ""UPDATE ONLY xxxx.parent_table
The same error occurred 1 minute later, on the same table, with the same "update only" statement.
Again all processes were shut down and reinitialized successfully.
At this point we decided not to switch over, because we considered PostgreSQL's decision to restart itself safe (trusting in conservative and safe decisions by the developers and a good implementation of shutdown and recovery), and because we had already been running again for several minutes while discussing our options.
Roughly 3 hours later we hit the same error on the same table: again a segmentation fault, again twice within minutes.
This time we stopped the server manually, upgraded to 9.0.7, and restarted, still believing that our setup and the shutdown plus recovery protect us from data corruption.
We renamed the parent table and its child, created identical new ones and moved the relevant data into the new tables.
We have now been running for 8 hours without any problems.
I am sharing this because I am looking for similar experiences, and for bad or good outcomes of similar errors, especially since we ran into the same error on the same table 4 times, which is at least 3 times too many, if not 4.
Thanks for your input!