Very slow row count on large Postgres tables

[expired user #8358] posted 10 years ago in General
When using the "Data" tab to view table data (even when limited to 1,000 rows), HeidiSQL sends a "SELECT COUNT(*) FROM tbl" in order to show the row count at the top of the view.

Unfortunately, this operation gets slower as the Postgres table grows. On a production system with 76.6M rows in a 10GiB table, it took ~30s for the result to come back and for HeidiSQL to become responsive again.

I'd advocate for using other queries to get an approximation of that count - see either of the two recommendations here:
http://stackoverflow.com/questions/2596670/how-do-you-find-the-row-count-for-all-your-tables-in-postgres

(The first query worked better in my instance, though it reported 75.6M.)
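For reference, the recommendations there boil down to reading PostgreSQL's own statistics instead of counting rows. A minimal sketch of two such approaches (the table name "tbl" is a placeholder):

-- Estimate from the system catalog (kept up to date by VACUUM / ANALYZE / autovacuum)
SELECT reltuples AS estimated_rows FROM pg_class WHERE relname = 'tbl';

-- Estimate from the statistics collector
SELECT n_live_tup AS estimated_rows FROM pg_stat_user_tables WHERE relname = 'tbl';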

Also, I'm not sure how table sizes are obtained, but Heidi cannot show the size of my 10GiB table. This page shows queries that can be used to retrieve that size quickly in Postgres:
https://wiki.postgresql.org/wiki/Disk_Usage
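For example, along the lines of the queries on that wiki page, something like this should return the sizes quickly (a sketch; "myschema"."mytable" is a placeholder, and pg_indexes_size() assumes PG 9.0+):

SELECT pg_size_pretty(pg_total_relation_size('myschema.mytable')) AS total_size,
       pg_size_pretty(pg_relation_size('myschema.mytable')) AS table_size,
       pg_size_pretty(pg_indexes_size('myschema.mytable')) AS index_size;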
[expired user #8358] posted 10 years ago
Clarification for accuracy: my table is actually 5.6GB in size, with a primary key adding 4.6GB to that. 76.6 million rows, as reported.
ansgar posted 10 years ago
The only query from that page which works on my PG 9.3 installation is this one:
SELECT reltuples
FROM pg_class AS c
LEFT JOIN pg_namespace AS n ON (n.oid = c.relnamespace)
WHERE
c.relkind='r' 
AND n.nspname='myschema'
AND c.relname='mytable'


A note from the poster suggests that this only works on PG 8.3+.

The following one returns 0 rows for all tables:
SELECT relname, n_tup_ins - n_tup_del as rowcount FROM pg_stat_all_tables;


And the third one mentioned there ends up doing a SELECT COUNT(*) - which is exactly what you want to avoid.

I need a query that works across all PG versions - is there one? Or at least one with only a single version conditional?
[expired user #8358] posted 10 years ago
Hmm. Most of those queries worked for me on 9.3.

The Postgres wiki indicates this one will work back to at least version 8.4 (though they consider versions older than 9.0 "unsupported").

SELECT reltuples FROM pg_class WHERE relname = 'tbl_name';


References:
https://wiki.postgresql.org/wiki/Count_estimate
http://www.postgresql.org/docs/9.4/interactive/catalog-pg-class.html
ansgar posted 10 years ago
That query doesn't incorporate the schema, so you get multiple rows if you have two tables with the same name in different schemas.

But I think left joining pg_namespace ON (n.oid = c.relnamespace) should do it.
Code modification/commit d975065 from ansgarbecker, 10 years ago, revision 9.1.0.4920
Optimize query for getting total row count from PostgreSQL table. See http://www.heidisql.com/forum.php?t=17959
ansgar posted 10 years ago
Done in r4920 - please test.
[expired user #8358] posted 10 years ago
Just tested that version - the data tab comes up instantly now!

Unfortunately, I think your string-to-number conversion doesn't take exponents into account - in my case, it shows "7 rows" while the query returned the number "7.66585e+07" (76,658,500).
ansgar posted 10 years ago
Hm, there is no conversion done, as the returned data type of the "reltuples" column is a float, not some text/varchar. But it seems that PostgreSQL returns a float with an exponent on your server. Not so here:
SELECT 100.0*1000*1000

returns "100000000.0" and HeidiSQL detects it as float.
[expired user #8358] posted 10 years ago
I actually don't know how PG behaves in these various scenarios.

Your sample query returned the full float, as expected (no exponent).

If I modify the query to be
SELECT reltuples::bigint FROM pg_class ...

I do get the full number, so that may be a sufficient fix.
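Putting the schema join and the cast together, the full row-count query might look roughly like this (a sketch; "myschema" and "mytable" are placeholders):

SELECT c.reltuples::bigint AS estimated_rows
FROM pg_class AS c
LEFT JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'myschema'
  AND c.relname = 'mytable';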
[expired user #8358] posted 10 years ago
(The tone of my last post was meant to sound more inquisitive:

"I actually don't know how PG behaves in these various scenarios... Your sample query returned the full float, as expected (no exponent), so I'm not sure why the pg_class query doesn't."
)

I appreciate your ability to fix things quickly in the program!
ansgar posted 10 years ago
No problem. Thanks for the hint. I'll check that this evening.
Code modification/commit 0ecfca3 from ansgarbecker, 10 years ago, revision 9.1.0.4921
Cast reltuples column to bigint so it does not return a float with an exponent. See http://www.heidisql.com/forum.php?t=17959
ansgar posted 10 years ago
Done in r4921
[expired user #8358] posted 10 years ago
Confirmed working on my table: correctly shows '42,872,124 rows total, limited to 1,000'

Unrelated, and I can start a new thread if you like: the table size estimate in the tree view does not show up for the large table. (It's just blank.) Not sure why that is.
ansgar posted 10 years ago
Watch out for the following automatic query in the SQL log:
SELECT *, pg_table_size(QUOTE_IDENT(t.TABLE_SCHEMA) || '.' || QUOTE_IDENT(t.TABLE_NAME)) AS data_length, pg_relation_size(QUOTE_IDENT(t.TABLE_SCHEMA) || '.' || QUOTE_IDENT(t.TABLE_NAME)) AS index_length, c.reltuples, obj_description(c.oid) AS comment
FROM "information_schema"."tables" AS t
LEFT JOIN "pg_namespace" n ON t.table_schema = n.nspname
LEFT JOIN "pg_class" c ON n.oid = c.relnamespace AND c.relname=t.table_name WHERE t."table_schema"='yourschema';

The size displayed in the tree is the sum of the columns data_length and index_length.

I guess it's just 0 for your large table? Or perhaps NULL? We probably need to cast to bigint here, as int may be too small to hold the byte size?
[expired user #8358] posted 10 years ago
The query does in fact return big integers, showing the table size in bytes. reltuples is formatted with an exponent again unless I specifically cast it to ::bigint. (No such cast is necessary for the pg_table_size() and pg_relation_size() functions.)

From my DB:

data_length - 9673080832
index_length - 9670524928
reltuples - 7.66585e+07
reltuples::bigint - 76658512
Code modification/commit b529650 from ansgarbecker, 10 years ago, revision 9.1.0.4923
PostgreSQL: Cast result of pg_table_size() and pg_relation_size() to bigint, so size bars are displayed correctly. See http://www.heidisql.com/forum.php?t=17959
ansgar posted 10 years ago
Fixed in r4923
[expired user #8358] posted 10 years ago
Sorry, I should have been clearer: I meant the query is already returning numbers that are bigints, since they're well over 2B bytes.

So HeidiSQL still shows blanks for the large tables for me. Any chance the GUI code overflows (because it's expecting 32-bit ints only)?

Going back to at least Postgres 8.4, there are also functions pg_total_relation_size() (which sums the two numbers you're getting individually) and pg_size_pretty() (which may take care of the bigint-to-readable-string formatting).

Here's a sample:

SELECT t.table_catalog, t.table_schema, t.table_name, t.table_type,
pg_table_size(QUOTE_IDENT(t.TABLE_SCHEMA) || '.' || QUOTE_IDENT(t.TABLE_NAME)) AS data_length,
pg_relation_size(QUOTE_IDENT(t.TABLE_SCHEMA) || '.' || QUOTE_IDENT(t.TABLE_NAME)) AS index_length,
pg_total_relation_size(QUOTE_IDENT(t.TABLE_SCHEMA) || '.' || QUOTE_IDENT(t.TABLE_NAME)) AS total_length,
pg_size_pretty(pg_total_relation_size(QUOTE_IDENT(t.TABLE_SCHEMA) || '.' || QUOTE_IDENT(t.TABLE_NAME))) AS total_length_pretty,
c.reltuples::bigint, obj_description(c.oid) AS comment
FROM "information_schema"."tables" AS t
LEFT JOIN "pg_namespace" n ON t.table_schema = n.nspname
LEFT JOIN "pg_class" c ON (n.oid = c.relnamespace AND c.relname=t.table_name)
WHERE t."table_schema" = 'yourschema'
ORDER BY t.table_name



ansgar posted 10 years ago
Yes, you're right - data_length + index_length were handled using the StrToIntDef() function, which I have now changed to StrToInt64Def(), so 64-bit integers are handled correctly. Done in r4924.
[expired user #8358] posted 10 years ago
I see the field correctly populated in the tree view now: shows "8.7 GiB". Thanks!
