GridSQL Home | Architecture | FAQ | Forums
GridSQL is a clustered database solution for data warehousing and data marts. It leverages open source databases running on PC-based commodity hardware (nodes), and intelligently coordinates work among them. All of the nodes are interconnected, typically via Gigabit Ethernet. Each GridSQL node has an assigned multi-threaded NodeAgent that executes and manages a pool of connections to the underlying database, and executes the commands sent from the designated coordinator node.

GridSQL loads data into the distributed data domain using an intelligent partitioning engine. Simply put, a hash value is used to determine the destination of a particular row. The column used in computing the hash value can also be chosen by the Database Administrator (DBA) as part of the overall schema strategy. In addition, other tables may be designated as replicated lookup tables, which results in an exact copy of the table being distributed to all nodes in the cluster. These are typically lookup tables or dimension tables.
GridSQL’s cost-based optimizer uses this partitioning and replication information to determine optimal strategies for processing queries. By choosing optimal query plans, it also minimizes network traffic and maximizes parallelism.
GridSQL is a separate layer and a separate process that runs on top of JDBC compliant databases, targeted for EnterpriseDB Postgres Plus and PostgreSQL. It has its own Parser, Optimizer, Planner, and Metadata Manager. GridSQL uses either EnterpriseDB Advanced Server or PostgreSQL 8.3 as underlying databases, but may also be configured to use other databases.
The primary components of GridSQL’s distributed, agent-based architecture are described in more detail below.
The GridSQL Coordinator is the primary processing engine of GridSQL, performing several key functions:
GridSQL metadata is created by passing DDL to the Coordinator, which then adds appropriate schema information to the metadata database and creates the table structures on the target nodes. Data can be loaded into the node databases via GridSQL or via external applications and loaders.

Client applications interact with the Coordinator using the standard Postgres API, typically via available drivers such as JDBC and ODBC. Each incoming request is taken from the Request Queue, assigned a thread and passed to the Parser. The Parser, Optimizer and Planner collectively manage an object called ExecutableRequest (not shown in the diagrams) that contains the original SQL request as well as the GridSQL implementation of the request that has been optimized based on the partitioning and replication profile defined in the metadata. The Coordinator queues optimized request plans, schedules their execution and returns result set payloads to client applications.
Although NodeAgents are deployed on remote nodes in typical GridSQL configurations, it is noteworthy that they can also be deployed directly within the Coordinator node. This configuration provides a convenient option for users who are getting started with GridSQL and are not ready to take on the effort of configuring a fully-distributed infrastructure.
The MultinodeExecutor is GridSQL’s runtime workhorse, coordinating the execution of complex, multi-part requests across all nodes in the cluster. Among other responsibilities, the MultinodeExecutor:
1. Receives execution objects from the Coordinator
2. Assigns a CoordinatorAgent to aid with message handling
3. Connects to NodeAgents and coordinates their work
4. Manages all housekeeping activities when request executions complete and when they are terminated

Based on the contents of the execution object, the MultinodeExecutor creates a series of node messages that tell the NodeAgents what work to do and how to do it. In the case of a SELECT statement, for example, the MultinodeExecutor receives an ExecutionPlan containing a series of ExecutionStep objects, creates the appropriate NodeMessages, and iterates through the execution of those messages with the associated NodeAgents until execution is completed.
As its name implies, the MultinodeExecutor often coordinates the activities of several NodeAgents during the lifecycle of an ExecutionPlan. The MultinodeExecutor, therefore, also participates in preparing individual result sets for consolidated result set usage, which are dynamically merges via the CombineResultSet object (not shown in diagrams) and ultimately returned to client applications.
Depending on connectivity options and the transactional state of the client application, the same MultinodeExecutor may be reused during a client session.
The NodeAgent is responsible for managing all node-related processing, including handling multiple concurrent requests, and works in concert with the MultinodeExecutor’s CoordinatorAgent (see above). All messages passed to requests being processed within NodeThreads must first pass through the NodeAgent’s inbound message queue. In the case of a new request, the NodeAgent assembles a corresponding NodeThread session context by taking a NodeThread (which includes a paired NodeProducerThread) from the NodeThread pool, and a JDBC connection from the connection pool. Simple requests are executed immediately while minimizing messaging, with NodeThread or NodeProducerThread passing their outbound messages through the NodeAgent for delivery to the MultinodeExecutor’s inbound queue.

Large requests are processed on the Coordinator using an ExecutionPlan. As described above, the MultinodeExecutor’s CoordinatorAgent passes a sequence of ExecutionSteps to the NodeAgent, which in turn passes back corresponding acknowledgements and results. When a request has been fulfilled, the NodeAgent releases its resources back to its pools and notifies the MultinodeExecutor when this has been completed. Note that allocated NodeThreads may persist, depending on the client state.
The NodeThread object is the nuclear work unit executing on each node in the cluster, and consists of three essential elements: NodeThread itself, NodeProducerThread and Connection. NodeThread and NodeProducerThread use queues to receive data access instructions and return data payloads. Importantly, all messages destined for NodeProducerThread’s queue are passed through NodeThread, allowing NodeThread to assign database queries to NodeProducerThread while asynchronously processing other instructions (database inserts, cancellation requests, etc.) directly. Importantly, since NodeThread and NodeProducerThread share the same Connection object, these actions can happen within the same corresponding backend session context.

Using the ProducerSender object (not shown in diagrams), NodeProducerThread collects and transmits query results to the NodeAgent. When all results have been processed, NodeThread notifies the NodeAgent that it has completed its work, and it is directed to execute its reset method to perform all necessary clean-up activities.
Download Postgres Plus, which includes GridSQL and the Postgres Plus database server, or download GridSQL and the source code separately.