

With the heyday of bigdata and people running lots of Postgres databases, sometimes one needs to join or search data from multiple absolutely regular and independent PostgreSQL databases (i.e. no built-in clustering extensions or such are in use) to present it as one logical entity. Think sales reporting aggregations over logical clusters, or matching click-stream info with sales orders based on customer IDs. So how do you solve such ad-hoc tasks? One could of course solve it on the application level with some simple scripting, but let's say we only know SQL. Luckily PostgreSQL (plus the ecosystem) provides some options out of the box, and there are also some 3rd-party tools for cases when you can't use the Postgres options (for example no superuser rights, or extensions cannot be installed). So let's look at the following 4 options:

- Dblink extension
- The Postgres foreign-data wrapper (postgres_fdw)
- Presto distributed SQL engine
- UnityJDBC virtual driver + SQuirreL SQL client

Dblink extension

Around since forever, this method might easily be the simplest way to join independent Postgres databases. Basically you just need to create the extension (requires "contrib"), declare a named connection and then use the dblink function to specify a query, including a list of output columns and their datatypes. The query will be sent over to the specified connection, and the pulled-in dataset will be handled as a normal subselect – thus from thereon one could use all the functionality that Postgres has to offer! Full documentation on the extension here. A few things to keep in mind (a short usage sketch closes out this section):

- Remote data is pulled onto the server without any extra information (statistics, indexes), so if the data amounts are bigger and there are many operations on higher nodes, most probably things will be non-optimal performance-wise.
- Increasing work_mem/temp_buffers might alleviate the IO penalty when working with bigger amounts of data pulled in via dblink.
- One can also declare the connection directly in the dblink function itself, but then your SQL might get a bit unwieldy for a larger number of involved databases.

Pros: easiest setup possible, flexibility in connecting to any number of Postgres DBs
Cons: SQL could get ugly for multiple joins, possible performance issues for bigger datasets, basic transaction support
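
To make the above a bit more tangible, here is a minimal usage sketch. The connection string, the connection name conn1 and the remote table orders(customer_id, amount) are made-up placeholders for illustration, not anything the extension prescribes.

-- Requires the "contrib" package to be installed on the server.
CREATE EXTENSION IF NOT EXISTS dblink;

-- Declare a named connection (all connection parameters here are placeholders).
SELECT dblink_connect('conn1', 'host=10.0.0.1 dbname=salesdb user=reporting password=secret');

-- The remote result set behaves like a normal subselect; the output column
-- list and datatypes have to be spelled out explicitly.
SELECT customer_id, sum(amount) AS remote_total
FROM dblink('conn1', 'SELECT customer_id, amount FROM orders')
     AS remote(customer_id int, amount numeric)
GROUP BY customer_id;

-- Clean up the named connection when done.
SELECT dblink_disconnect('conn1');

Declaring the connection once by name keeps the individual queries readable when more than a couple of databases are involved.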

The Postgres foreign-data wrapper

On board since 9.3, the Postgres foreign-data wrapper (postgres_fdw extension, available in "contrib") is an improvement over dblink and is well suited for more permanent data crunching, and one could even build complex sharding/scaling architectures on top of it with the introduction of "foreign table inheritance" in 9.6. Basically what you get here is a permanent "symlink / synonym" to a table/view on another database, with the benefit that the local Postgres database (where the user is connected) already has the column details on the table – most importantly size and data distribution statistics – so that it can figure out better execution plans. True, in older Postgres versions the plans were not always too optimal, but the recent 9.6 version got a lot of attention in that area. NB! The FDW also supports writing/changing data and transactions! Full documentation here. Setup goes roughly as follows (a SQL sketch closes out this section):

- Create the extension (requires "contrib").
- Create a foreign server pointing at the remote database.
- Create a user mapping (so that different users can be allowed to perform different operations on the remote tables).
- Define foreign tables by specifying the columns yourself, or by importing whole tables/schemas automatically (9.5+).

Pros: schema introspection, performance, allows data modifications, full transaction support
Cons: quite some steps needed for setup + user management
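
The steps above translate into roughly the following SQL. The server name salesdb_remote, the user mapping details and the table/column names are placeholders for illustration only.

-- 1. Create the extension (requires "contrib").
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- 2. Create a foreign server describing how to reach the remote database.
CREATE SERVER salesdb_remote
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host '10.0.0.1', dbname 'salesdb');

-- 3. Map a local role to a role on the remote side.
CREATE USER MAPPING FOR reporting
    SERVER salesdb_remote
    OPTIONS (user 'reporting', password 'secret');

-- 4a. Define a foreign table by listing the columns yourself ...
CREATE FOREIGN TABLE sales_orders (
    id     int,
    total  numeric
) SERVER salesdb_remote OPTIONS (schema_name 'public', table_name 'sales_orders');

-- 4b. ... or, on 9.5+, import remote table definitions automatically.
IMPORT FOREIGN SCHEMA public LIMIT TO (customers) FROM SERVER salesdb_remote INTO public;

-- Afterwards the foreign tables can be queried (and even modified) like local ones.
SELECT count(*) FROM sales_orders;

Once the foreign tables exist, the planner has column and statistics information available locally, which is exactly what gives the FDW its edge over plain dblink.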
Presto

Presto is an open source distributed SQL query engine, meant to connect the most different "bigdata" datasources via SQL, thus not really Postgres-centric but DB-agnostic. Created by Facebook to juggle terabytes of data for analytical workloads, one can expect it to handle your data amounts efficiently. For bigger amounts of data it does assume nodes with a lot of RAM though! Basically it is a Java-based query parser/coordinator/worker framework, so not the most light-weight approach, but definitely worth a try even for smaller amounts of data. Setup might look scary at first, but it will actually only take minutes to get going, as the docs are great:

- Create a couple of simple config files as shown in the deployment manual.
- Start the server with "bin/launcher start".
- Launch the query client with "./presto --server localhost:8080 --catalog hive --schema default".

From the client prompt, joining tables that live in two different Postgres catalogs is then just ordinary SQL (a hedged example closes out this section).

Pros: lots of datasources, good SQL support, good documentation, monitoring dashboard
Cons: setup (server + client), full SQL re-implementation, thus you'll lose Postgres analytic functions etc.
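
A sketch of such a cross-catalog count. The catalog names pg1 and pg2 (each backed by a Postgres connector) and the table public.t1 with column c1 are assumptions standing in for whatever is defined in your own catalog configuration.

-- pg1, pg2, public.t1 and c1 are placeholder names; run this from the
-- "presto:default>" prompt of the CLI started above.
SELECT count(*)
FROM pg1.public.t1 x
INNER JOIN pg2.public.t1 y ON x.c1 = y.c1;
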
UnityJDBC virtual driver + SQuirreL SQL client

Another "generic" solution for connecting various databases, including Postgres, via standard SQL can be achieved by using the Unity "virtual datasource" plugin for the popular SQL client called SQuirreL.