Re: Repeated execution of identical subqueries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Craig Ringer wrote:
Tom Lane wrote:
No, not at the moment.  In principle the planner could look for such
duplicates, but it'd be wasted cycles so much of the time that I'd be
loath to do it.
Good point - there are better places to spend time, and I imagine it'd be an expensive thing to check too.

The one very simple case that gets me every time is when a user-defined function is called.

test=> explain analyze select chm_mf(isosmiles) from version where chm_mf(isosmiles) like '%C20%' or chm_mf(isosmiles) like '%C21%';
QUERY PLAN ---------------------------------------------------------------------------------------------------------
Seq Scan on version  (cost=0.00..18.57 rows=29 width=43) (actual time=48.798..1180.538 rows=50 loops=1)
  Filter: ((chm_mf(isosmiles) ~~ '%C20%'::text) OR (chm_mf(isosmiles) ~~ '%C21%'::text))
Total runtime: 1180.683 ms
(3 rows)

This table only has 375 rows TOTAL, yet it takes over a second to answer this query: "Find me molecules with either 20 or 21 carbon atoms in the molecular formula".  This is a somewhat contrived example, but we have others that really do get us, and we go to great lengths to avoid them.  It requires us to avoid "natural" queries like the one above, and instead create convoluted application logic using temporary tables to hold the results of a function call, which we can then use in a query that uses the values more than once.  Something like this:

create temporary table foo(mf text);
explain analyze insert into foo (select chm_mf(isosmiles) from version);
QUERY PLAN ---------------------------------------------------------------------------------------------------------
Seq Scan on version  (cost=0.00..15.69 rows=375 width=43) (actual time=1.829..566.233 rows=375 loops=1)
Total runtime: 568.470 ms

Now we can use this table in a complex query where we need the MF twice, and discard it, and it's STILL faster than the original "natural" SQL.  Notice that it takes just half the time as the first query, which tells me chm_mf() was being called twice in the first example.  This function is defined as:

 CREATE OR REPLACE FUNCTION chm_mf(text) RETURNS text
 AS '/usr/local/pgsql/lib/libchm.so', 'chm_mf'
 LANGUAGE 'C' STRICT IMMUTABLE;

I can understand how in the general case, it is very hard to identify repeated subqueries.  But it seems like an IMMUTABLE function shouldn't be called twice on the same column -- isn't that the whole point of IMMUTABLE?

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux