10 - 2
[1] 8
Calculating the area of Queensland
This material is modified for EVB 203 Geospatial Information Science course at the Queensland University of Technology from two main sources:
There are also many references to the excellent Geocomputation in R for further geospatial reading.
This short tutorial will start with a general introduction to R and geospatial analysis. We will get familiar with R and RStudio, functions, and packages before using different models of the Earth to calculate the area of Queensland.
The R programming language is a language used for calculations, statistics, visualisations of geospatial and other data.
RStudio is an open source Integrated Development Environment (IDE), which means it provides many features on top of to make it easier to write and run code.
R’s main strong points are:
The learning curve will be steeper than point-and-click tools, but as far as programming languages go,r is more user-friendly than others.
Managing and organizing your code, data, figures, etc. for any project is important. In RStudio, we can do that by creating a new **project**:
~
is your home directory). For example, a folder called “Week1” in “EVB203”: ~/EVB203/Week1.Projects make your work withr more straight forward, as they allow you to segregate your different projects in separate folders. You can create a .Rproj file in a new directory or an existing directory that already has code and data. Everything then happens by default in this directory. The .Rproj file stores information about your project options, and allows you to go straight back to your work.
The console (usually at the bottom left in RStudio) is where most of the action happens. In the console, we can user interactively. We write a command and then execute it by pressing Enter.
In its most basic use, R can be a calculator. Try executing the following commands:
10 - 2
[1] 8
3 * 4
[1] 12
2 + 10 / 5
[1] 4
11^6
[1] 1771561
Those symbols are called “binary operators”: we can use them to multiply, divide, add, subtract and exponentiate. Once we execute the command (the “input”), we can see the result in the console (the “output”).
What if we want to keep reusing the same value? We can store data by creating objects, and assigning values to them with the assignment operator <-
:
<- 42
num1 <- num1 / 9
num2 num2
[1] 4.666667
We can also store text data:
<- "Hello World!"
sentence sentence
[1] "Hello World!"
You should now see your objects listed in you environment pane (top right).
As you can see, you can store different kinds of data as objects. If you want to store text data (a “string of characters”), you have to use quotes around them.
You can use the shortcut Alt+- to type the assignment operator quicker.
An R function is a little program that does a particular job. It usually looks like this:
<functionname>(<argument(s)>)
Arguments tell the function what to do. Some functions don’t need arguments, others need one or several, but they always need the parentheses after their name.
For example, try running the following command:
round(num2)
[1] 5
## [1] 5
The round()
function rounds a number to the closest integer. The only argument we give it is num2
, the number we want to round.
If you scroll back to the top of your console, you will now be able to spot functions in the text.
What if we want to learn more about a function?
There are two main ways to find help about a specific function in RStudio:
?functionname
Let’s look through the documentation for the round()
function:
?round
As you can see, different functions might share the same documentation page.
There is quite a lot of information in a function’s documentation, but the most important bits are:
See how the round()
function has a second argument available? Try this now:
round(num2, digits = 2)
[1] 4.67
We can change the default behaviour of the function by telling it how many digits we want after the decimal point, using the argument digits
. And if we use the arguments in order, we don’t need to name them:
round(num2, 2)
[1] 4.67
To keep it tidy, we are creating 3 folders in our project directory:
For that, we use the function dir.create()
:
dir.create("scripts")
dir.create("data")
dir.create("plots")
You can recall your recent commands with the up arrow, which is especially useful to correct typos or slightly modify a long command.
Scripts are simple text files that contain R code. They are useful for:
Let’s create a new R script with a command:
file.create("scripts/process.R")
All the file paths are relative to our current working directory, i.e. the project directory. To use an absolute file path, we can start with
/
.
To edit the new script, use the file.edit()
function. Try using the Tab key to auto-complete your function name and your file path!
file.edit("scripts/process.R")
This opens our fourth panel in RStudio: the source panel.
Packages add functionalities tor and RStudio. There are more than 19,000 available.
You can see the list of installed packages in your “Packages” tab, or by using the library()
function without any argument.
We are going to install and load a new package called “praise”. We can do that outside of the console, the area outside of the console is know as the Graphical User Interface (GUI). * click the “Install” button in the “Packages” tab (bottom right pane), and search for “praise”.
Notice how it runs an install.packages()
command in the console? You can use that too.
If I try running the command praise()
, I get an error. That’s because, even though the package is installed, I need to load it every time I start a new R session. The library()
function can do that.
library(praise) # load the package
praise() # use a function from the package
## [1] "You are bee's knees!"
Even though you might need the motivation provided by this function, other packages are more useful for your work.
Let’s go ahead and install the packages we will need for the rest of this tutorial:
install.packages(c('ozmaps', 'dplyr', 'sf', 'tmap'))
Let’s make our first map in R! Go ahead and load the {ozmaps} package.
library(ozmaps)
This package includes several Australian data sets but we will focus on Queensland. Let’s start with the ozmap()
function.
ozmap()
We have our first map 😎.
We’re going to focus on Queensland though - how do we extract Queensland? Let’s use ozmap_data()
to extract the state data.
ozmap_data("states")
Simple feature collection with 9 features and 1 field
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 105.5507 ymin: -43.63203 xmax: 167.9969 ymax: -9.229287
Geodetic CRS: GDA94
# A tibble: 9 × 2
NAME geometry
* <chr> <MULTIPOLYGON [°]>
1 New South Wales (((150.7016 -35.12286, 150.6611 -35.11782, 150.6…
2 Victoria (((146.6196 -38.70196, 146.6721 -38.70259, 146.6…
3 Queensland (((148.8473 -20.3457, 148.8722 -20.37575, 148.85…
4 South Australia (((137.3481 -34.48242, 137.3749 -34.46885, 137.3…
5 Western Australia (((126.3868 -14.01168, 126.3625 -13.98264, 126.3…
6 Tasmania (((147.8397 -40.29844, 147.8902 -40.30258, 147.8…
7 Northern Territory (((136.3669 -13.84237, 136.3339 -13.83922, 136.3…
8 Australian Capital Territory (((149.2317 -35.222, 149.2346 -35.24047, 149.271…
9 Other Territories (((167.9333 -29.05421, 167.9188 -29.0344, 167.93…
We can see that instead of returning a plot like ozmap()
in the Plots pane, it returns a data table in the Console. There are 2 columns: NAME and geometry. The data type is listed under the column name. We can see the geometry column is a “MULTIPOLYGON”. This column contains the points to draw the polygons we saw in the plot.
We can save this data table as an object called “states” using the assignment operator <-
.
<- ozmap_data("states")
state # Look at the data table state
Simple feature collection with 9 features and 1 field
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 105.5507 ymin: -43.63203 xmax: 167.9969 ymax: -9.229287
Geodetic CRS: GDA94
# A tibble: 9 × 2
NAME geometry
* <chr> <MULTIPOLYGON [°]>
1 New South Wales (((150.7016 -35.12286, 150.6611 -35.11782, 150.6…
2 Victoria (((146.6196 -38.70196, 146.6721 -38.70259, 146.6…
3 Queensland (((148.8473 -20.3457, 148.8722 -20.37575, 148.85…
4 South Australia (((137.3481 -34.48242, 137.3749 -34.46885, 137.3…
5 Western Australia (((126.3868 -14.01168, 126.3625 -13.98264, 126.3…
6 Tasmania (((147.8397 -40.29844, 147.8902 -40.30258, 147.8…
7 Northern Territory (((136.3669 -13.84237, 136.3339 -13.83922, 136.3…
8 Australian Capital Territory (((149.2317 -35.222, 149.2346 -35.24047, 149.271…
9 Other Territories (((167.9333 -29.05421, 167.9188 -29.0344, 167.93…
To visualize this data table like when we use ozmap()
, all we need to do is plot the object with the plot()
function.
windows(width = 12, height = 10)
plot(state)
This gives us a similar plot to ozmap()
but with the states colored.
To work with only Queensland, we can save the Queensland data as its own object using the filter()
function from another package, {dplyr}. This is a well-known data manipulation package. filter()
allows us to filter the data frame by row. Let’s look at the documentation of filter.
Put your cursor in the filter() function and press the F1 key as a shortcut to the help documentation.
The first argument in the filter()
function we need to define is the .data
argument. We are going to filter on the state
data frame object we just created. The ...
is an “expression that returns a logical value”. We need to define what we are filtering in which column. In this case, we want to keep the observations where the NAME
column is Queensland
.
We set logical expressions by using the double equal sign ==
. We can try this by asking if 1 is equal to 2. What is the output in the console?
What is the difference between = and ==? One equal sign is used to set arguments and two equals is a logical statement e.g., is X == to Y?
1 == 2
[1] FALSE
For our case, we’re interested in the observations where the NAME
column of our state
object is equal to Queensland
.
How do we define a new object qld
using filter()
using ==
?
library(dplyr) # load the package first
<- filter(.data = state, NAME == "Queensland")
qld # see the data frame qld
Simple feature collection with 1 feature and 1 field
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 137.9943 ymin: -29.17719 xmax: 153.5522 ymax: -9.229287
Geodetic CRS: GDA94
# A tibble: 1 × 2
NAME geometry
* <chr> <MULTIPOLYGON [°]>
1 Queensland (((148.8473 -20.3457, 148.8722 -20.37575, 148.8515 -20.38094, 148.…
If we return qld
, the object only has one row or observation. We can also plot Queensland like we did before with state
.
What type of object is qld
? We can check this with the class()
function.
class(qld)
[1] "sf" "tbl_df" "tbl" "data.frame"
From the output, we can see that qld
is both a data frame AND an sf, simple features, object.
Let’s plot qld. The plot()
function is a baser function meaning it is part of the “default”r language an no packages need to be loaded to use it.
There are many other mapping packages. Load {tmap} and compare it’s qtm()
quick thematic mapper function with base plot()
. Which one do you like better?
plot(qld)
library(tmap) # load the package first
qtm(qld)
As we learned in lecture, there are different ways we can represent the Earth. For calculating area of a polygon like Queensland, the representation we are using matters.
Let’s check the coordinate reference system (CRS) of our qld
object. First we need to load the package {sf} which stands for “simple features”.
From Geocomputation in R by Robin Lovelace:
Simple features is an open standard developed and endorsed by the Open Geospatial Consortium (OGC), a not-for-profit organization… …Simple Features is a hierarchical data model that represents a wide range of geometry types. Of 18 geometry types supported by the specification, only 7 are used in the vast majority of geographic research (see Figure 2.2); these core geometry types are fully supported by the R packages sf [@pebesma2018]
Then we can use the st_crs()
function to check what CRS our qld
object is.
library(sf)
st_crs(qld)
Coordinate Reference System:
User input: EPSG:4283
wkt:
GEOGCRS["GDA94",
DATUM["Geocentric Datum of Australia 1994",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["Australia including Lord Howe Island, Macquarie Islands, Ashmore and Cartier Islands, Christmas Island, Cocos (Keeling) Islands, Norfolk Island. All onshore and offshore."],
BBOX[-60.56,93.41,-8.47,173.35]],
ID["EPSG",4283]]
This output is known as “well-known text” (WKT) and is the standard, human-readable, encoding for simple feature geometries. This includes important information like the CRS.
What is the CRS of our qld
object?
Our qld
object is in a geographic coordinate reference system (GCS). How do we know that?
GCS’s represent the Earth’s surface in 3-dimensions and have angular, lat-long units (Figure 1). They are defined by a datum (reference points associated with an ellipsoid (a model of the size and shape of the Earth)), a prime meridian, and an angular unit of measure. See here for more detail. Calculating area in a 3-D GCS requires geodesic (i.e., ellipsoidal) computations, which can be complex and computationally expensive.
.
From the output of st_crs()
, we can can see that the CRS is using the GDA94 datum and further down we can see references to ANGLEUNIT in degrees.
Since 2021, {sf} (versions >1) uses Google’s S2 spherical geometry engine by default when representing vector polygons that are in a 3-D lat-long GCS.
Let’s check that S2 is on and calculate the area of Queensland using st_area()
from the {sf} package. We can also set the units to square kilometers using set_units()
from the {units} package.
We can use the :: to use functions from a certain package without loading them.
What is the area of Queensland in square kilometers?
library(sf)
sf_use_s2()
[1] TRUE
st_area(qld)
1.733842e+12 [m^2]
st_area(qld) %>% units::set_units('km2')
1733842 [km^2]
Let’s save this value as an object called area_S2
.
<- st_area(qld) %>% units::set_units('km2')
area_S2 area_S2
1733842 [km^2]
While the S2 geometry library offers many advantages, when it comes to area there is a speed vs. accuracy trade-off. When S2 is on, {sf} calculates area assuming a spherical approximation of the Earth. But the Earth isn’t a perfect sphere, it’s a spheroid. It bulges at the equator.
When S2 is off, {sf} performs geodesic (i.e., ellipsoidal) area calculations using the high precision GeographicLib that uses a spheroid to represent the Earth’s size and shape. Geodesic calculations will be more accurate than those performed with S2 on, as S2 assumes a spherical representation of the Earth. But geodesic computations more complex, and time costly.
Let’s examine the speed vs. accuracy trade-off. We’ll start by loading some vector polygons representing countries of the world and check if the geometries are valid.
First we need to turn S2 off for {sf}. Now, when were calculate the area (saved as area_GeoLib
) it will use a spheroid model for the calculations.
sf_use_s2(FALSE) # turn off S2
<- st_area(qld) %>% units::set_units('km2') # S2 off, sf calculates geodesic area using GeographicLib
area_GeoLib area_GeoLib
1729552 [km^2]
How do the S2 and GeographicLib areas of Queensland compare? We can subtract them to get the difference.
- area_GeoLib # subtract to get the difference in area estimates area_S2
4289.842 [km^2]
There is an ~4000 km2 difference in area.
Check these two areas with the area of Queensland according to Geoscience Australia. Which one is more accurate? Is this the one you would expect to be more accurate.
Projected coordinate reference systems (PCS) flatten a GCS into 2-dimensions representing linear units (e.g., meters) converting the 3-D surface into Easting and Northing (x and y) values in a projected CRS. PCSs are based on Cartesian coordinates with an origin and x and y axes. See the Geocomputation in R section for more information. This can make area computations less complex and therefore faster, but also causing some distortion of the Earth’s surface. There are many PCS’s available to use (see here), some of which preserve area (i.e., equal-area projections).
Before the 2021 switch to S2, {sf} would use a planar geometry engine called GEOS. The package would assume flat, planar coordinates even if you were trying to calculate 3-D, lat-long geometries. This would result in warning messages like:
although coordinates are longitude/latitude, st_intersects assumes that they are planar
However, if your data is in a 2-D PCS, {sf} reverts to the planar GEOS geometry engine, assuming flat coordinates (so make sure you’re using a projection that preserves area).
How do geodesic area computations using a lat-long GCS compare to an equal-area PCS, like Lambert Azimuthal Equal Area?
First we’ll want to project our qld
object using st_transform()
. We can use a proj4string to define a Lambert Azimuthal Equal Area projection (proj=laea) that is centered on Queensland (lat_0=-22 + lon_0=72).
<- qld %>% st_transform(crs = '+proj=laea +x_0=0 +y_0=0 +lat_0=-22 +lon_0=72')
qld_p qld_p
Simple feature collection with 1 feature and 1 field
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 6046605 ymin: -3068136 xmax: 7202248 ymax: -180485.7
Projected CRS: +proj=laea +x_0=0 +y_0=0 +lat_0=-22 +lon_0=72
# A tibble: 1 × 2
NAME geometry
* <chr> <MULTIPOLYGON [m]>
1 Queensland (((7151086 -1887972, 7151039 -1892968, 7149304 -1892402, 7151086 -…
st_crs(qld_p)
Coordinate Reference System:
User input: +proj=laea +x_0=0 +y_0=0 +lat_0=-22 +lon_0=72
wkt:
PROJCRS["unknown",
BASEGEOGCRS["unknown",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8901]]],
CONVERSION["unknown",
METHOD["Lambert Azimuthal Equal Area",
ID["EPSG",9820]],
PARAMETER["Latitude of natural origin",-22,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",72,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["False easting",0,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
qtm(qld_p) # visualize it
There are different systems of defining CRSs. “proj4strings” are text strings that describe the CRS. The use of proj4strings, however, is discouraged for a variety of reasons now. See this article for more details. Best practice is now WKT and EPSG codes.
Let’s use an EPSG code instead of a proj4string. These codes were started by a member of the European Petroleum survey Group (EPSG) which was merged into International Association of Oil & Gas Producers (IOGP) and maintained by the IOGP Geomatics Committee.
Lets use https://epsg.io/ to find an appropriate projection for Australia. Let’s use the first option, GDA94 / Geoscience Australia Lambert EPSG:3112.
<- qld %>% st_transform(crs = 'EPSG:3112')
qld_p # note we overwrote the previous qld_p object
qld_p
Simple feature collection with 1 feature and 1 field
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 395488.8 ymin: -3414702 xmax: 1923742 ymax: -1117282
Projected CRS: GDA94 / Geoscience Australia Lambert
# A tibble: 1 × 2
NAME geometry
* <chr> <MULTIPOLYGON [m]>
1 Queensland (((1538225 -2422082, 1540411 -2425673, 1538207 -2425987, 1538225 -…
st_crs(qld_p) # get the CRS information
Coordinate Reference System:
User input: EPSG:3112
wkt:
PROJCRS["GDA94 / Geoscience Australia Lambert",
BASEGEOGCRS["GDA94",
DATUM["Geocentric Datum of Australia 1994",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4283]],
CONVERSION["Geoscience Australia Standard National Scale Lambert Projection",
METHOD["Lambert Conic Conformal (2SP)",
ID["EPSG",9802]],
PARAMETER["Latitude of false origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8821]],
PARAMETER["Longitude of false origin",134,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8822]],
PARAMETER["Latitude of 1st standard parallel",-18,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8823]],
PARAMETER["Latitude of 2nd standard parallel",-36,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8824]],
PARAMETER["Easting at false origin",0,
LENGTHUNIT["metre",1],
ID["EPSG",8826]],
PARAMETER["Northing at false origin",0,
LENGTHUNIT["metre",1],
ID["EPSG",8827]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Topographic mapping, environmental studies."],
AREA["Australia - Australian Capital Territory; New South Wales; Northern Territory; Queensland; South Australia; Tasmania; Western Australia; Victoria."],
BBOX[-43.7,112.85,-9.86,153.69]],
ID["EPSG",3112]]
qtm(qld_p) # view the projected data
How do we know from these outputs that we are now in a PCS, not a GCS?
Now let’s calculate the area Queensland using the projected data and the GEOS. Let’s check if S2 is on/off first. It shouldn’t matter though because we are in a PCS and doing a 2-D calculation. Try it both ways though with S2 on and off - does the area of Queensland change?
sf_use_s2()
[1] FALSE
<- st_area(qld_p) %>% units::set_units('km2') # sf uses planar geometry library GEOS
area_laea area_laea
1706921 [km^2]
What is the difference in area of the 2-D calculation compared to the GeographicLib 3-D calculation?
- area_GeoLib area_laea
-22631.55 [km^2]
There is an ~ 22,600 km2 difference in area estimates!
Find another projected CRS that gives a more accurate area of Queensland?
There are many coordinate reference systems (CRS) we can use for geospatial data. There are different models of the Earth (sphere, spheroid, planar) and which model you use will depend on the task at hand.
It is important that when you are using multiple data sets together, that they are all in the same CRS. If something is not working, the first thing you should do is check the CRS.
You may re-use and re-mix the material in any way you wish, without asking permission, provided you cite the original source under the Creative Commons - Attribution-NonCommercial 4.0 International Licence. However, we’d love to hear about what you do with it!
2.7 Comments
We should start with a couple of comments, to document our script. Comments start with
#
, and will be ignored by R: