Using GIS Cloud With External Databases

Author: Jonathan Stanger, Phd
Spatial Partners, Australia – GIS Cloud partner

Using GIS Cloud out-of-the-box provides most users with a cloud mapping application that fits all their business needs. While all platforms have limitations to their functionality, GIS Cloud is somewhat unique due to its willingness to offer integration opportunities via REST API, JavaScript library, and external databases.

The cloud-hosted database that is included with every Map Editor license is very convenient so that your data is kept secure, but access to the actual database server is not available. The limitations imposed by a lack of that access can be circumvented by configuring GIS Cloud to retrieve data directly from a server that you host and manage, such as AWS RDS for PostGIS.

Some reasons why you would go through all this trouble are:

you don’t want to migrate or mirror an existing spatial database of data to the GIS Cloud database
your organization uses an existing spatial database for other applications such as QGIS or Carto (perhaps consider the QGIS plugin or the GIS Cloud Map Portal)
you need to have direct control over where the data is hosted, such as for compliance reasons
you will use direct database access to facilitate integration or reporting
your spatial data has a data model that is too complex to work with GIS Cloud
you want to use Triggers/Views/Materialized Views to interact with your data

If you are considering working with an external database, this article can provide you with a few hints on making the process easier.

Location

Your first consideration is where your external database will be hosted. Every time two devices need to communicate there is a short delay known as latency. The further away each device is, the more likely you will experience slower performance.

Global GIS Cloud servers are hosted on AWS in the “us-east-1” availability region. Therefore, if you can host your database there, it will have the best chance for low latency because the two servers are “close” to each other.

This diagram shows the impact that path distance can have on a user.

If your existing database is stored on-premise, then you may wish to consider hosting a mirror on AWS to shorten the path that GIS Cloud needs to take to retrieve the data while maintaining your “primary” database on-premise. Rather than this, it may be better to mirror your data directly to GIS Cloud’s database for optimal performance.

If you need to host your data elsewhere for compliance reasons, check with GIS Cloud as there are some locally deployed instances in other countries.

We were able to improve the MDC access speed by a factor of 10 by hosting the data in the same AWS availability zone as GIS Cloud servers.

Table Schema

When GIS Cloud accesses tables via an external connection, it will try to treat the tables the same as tables in the GIS Cloud database. Because there is no way to know the details about how your table is designed in GIS Cloud, it is better to copy the table structure that GIS Cloud uses so that you do not experience any unusual behavior.

The first consideration is recognizing that geometry columns should be created as one of the primary types (point, linestring, polygon, or the multi versions of these). Similar to most GIS applications, GIS Cloud layers are limited to one single geometry type per layer.

Following from this would be setting the name of your geometry column. If your geometry column is using, say the PostGIS default of “geom”, the column will show up in your layer definition as if it were an attribute. If you use “__geometry” as the name, this will be hidden automatically by GIS Cloud as it recognizes it as the geometry.

Some additional columns to include in your table schema for GIS Cloud are “__created”, “__modified” and “__owner” which should all be of a large integer type. Similarly, if your Primary Key column is named “id” or “__id” this will hide it from the user which can be desirable.

Next is choosing data types. The GIS Cloud REST API and forms are quite permissive with respect to what kind of data they accept. What I mean by this is that if you pass a number to GIS Cloud and the destination is a string, it will coerce the value to match, minimizing errors. However, when configuring your table schema you may be tempted to be very strict with your column types. This can cause unexpected conflicts as the GIS Cloud API has done its best to work with your data, but the database has rejected the value entered into a Mobile Data Collection form.

A classic example of this might be a datetime which can take many different formats but GIS Cloud will happily work with dates as strings eliminating the need to debug mystery errors.

Related to data types is the question of adding a “NOT NULL” constraint to your columns. This is often sensible but can result in unexpected failures if your GIS Cloud form attached to the layer doesn’t align with your database constraints. To minimize unexpected issues, it is better to impose required fields at the Form level rather than at the Database level.

Finally, the most critical table requirement if you intend to collect new features into your external table. GIS Cloud is not able to easily determine the Primary Key requirements from your external table and therefore adopts a hands-off approach. Therefore, your Primary Key column should be defined as auto-incrementing/generating as appropriate to your SQL dialect.

Triggers/Views/Materialized Views, oh my!

GIS Cloud supports the use of Views to access spatial tables which can be a great way to “flatten” a more complex data model using a set of SELECT JOIN ON statements. This means you might have one spatial table that has a Foreign Key to join to another SQL table, that itself might have a Foreign Key to join to another table; with a view you can create one “table” that has columns from all three tables to display in GIS Cloud.

PostgreSQL, which is the basis of PostGIS, has a somewhat unique feature of Materialized Views. These are just like a View but the result of the SQL query is cached within the database and must be manually triggered to update. The advantage of this approach is that you can define spatial and non-spatial indexes on the View results which can speed up searching/retrieval.

It also offers a way to separate a “working” copy of your GIS data and a “published” copy. However, GIS Cloud doesn’t support these tables yet, so to get around this you need to create a View that selects from the Materialized View.

Lastly, a classic feature of databases is Triggers. If you have your own database then you have no restriction in applying triggers. But do be careful to consider potential performance impacts as GIS Cloud still needs to read and write to your database and will not be aware of these triggers. Anything that delays responding to a SELECT or UPDATE statement may negatively impact application performance.

Author Bio

Dr. Jon Stanger

Head Solution Architect – Spatial Partners

Dr. Jon Stanger started his career in Nanotechnology, holding a Masters in Physics and a PhD in Materials Engineering. However, the academic industry did not quite suit him, and for the last 6 years he has been working in the GIS industry with Spatial Partners. By nature, he is very multidisciplinary which makes him a bit of a swiss army knife for data driven challenges, mainly working with clients in the telecommunications, utility and construction industries. In his role as Head Solution Architect, he has carried this passion for listening, learning and applying knowledge into the culture of Spatial Partners which has allowed the creation of unique, cost effective, functional and at time unexpected solutions to be delivered by the company.

Our partners from Australia – Spatial Partners specialize in spatial technology and data automation. They define and design solutions to complex data challenges, and build their own solutions on top of the GIS Cloud platform.

Their Head Solution Architect, Dr. Jonathan Stanger showcased one of these solutions in a presentation held on May 12th at the Safe Software FME World Fair 2021 in a session “Where Your Data Dreams Come True”. He talked about “Magic Tricks for taming wild external systems using SSH Tunnels and high-volume HTTP calls.”

FME is the gold standard for GIS-based ETL, integration, and automation but it encounters a significant issue when working with cloud-based GIS APIs such as what is available for GIS Cloud. With FME out-of-the-box and GIS Clouds API, you may face upwards of 5 minutes to update only 1000 features.

Watch this presentation and learn how to bring the same operation down to 30 seconds, making automation possible, via FME and GIS Cloud integration!