Check if Website has changed Example
This example shows how to periodically check whether a website has changed. It uses workflow variables to persist data between executions, so each run has a previous value to test against.
Most websites show minor differences between requests if you simply fetch their HTML. These differences are usually things like timestamps and session properties generated on each request. Treating them as real changes would produce false positives. Luckily, many websites send one or both of the Etag and Last-Modified response headers. If either of these headers changes, it is safe to assume that the website has been updated. Note: Although many websites have adopted this standard, some have not. Those websites are not supported by this example.
This example is fairly simple and can be broken up into four steps:
1. Fetch the current headers of the target website.
2. Get the saved header variables from the previous execution of this workflow.
3. Compare the current and previous headers to determine if the target website has changed.
4. Save the current headers to workflow variables.
Below we'll explain each step in more detail. For this example we've chosen https://docs.direktiv.io as the target website.
Fetch Website Headers
Here we perform an HTTP HEAD request on the target website using the direktiv/request:v1 Direktiv app as a function. Since we only care about the Last-Modified and Etag headers, we extract those values and store them in the lastModified and etag properties using the transform field of the fetch-site-headers state. Once the current header values have been fetched, the state transitions to the get-old-headers state.
id: check-website-change
description: "A simple workflow that fetches current headers from a website and compares them to the previously stored headers to determine if it has changed."
functions:
- id: get
  image: direktiv/request:v1
  type: reusable
states:
- id: fetch-site-headers
  type: action
  transform: 'jq({lastModified: .return.headers.["Last-Modified"][0], etag: .return.headers.["Etag"][0]})'
  transition: get-old-headers
  action:
    function: get
    input:
      method: "HEAD"
      url: "https://docs.direktiv.io"
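For readers who want to see what this state does outside of Direktiv, here is a minimal Python sketch of the same idea: send a HEAD request and keep only the two headers of interest. The function names extract_validators and fetch_validators are hypothetical and not part of Direktiv or the direktiv/request app. The sketch uses only the standard library.

```python
from urllib.request import Request, urlopen


def extract_validators(headers):
    """Keep only the two headers the workflow compares.

    `headers` is assumed to be a message-like object with a
    case-insensitive .get(), such as the one urlopen() returns.
    A header the site does not send comes back as None.
    """
    return {
        "lastModified": headers.get("Last-Modified"),
        "etag": headers.get("ETag"),
    }


def fetch_validators(url, timeout=10):
    """Send a HEAD request (no response body) and return the validators."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return extract_validators(response.headers)
```

Calling fetch_validators("https://docs.direktiv.io") would return a dictionary with the same two keys that the transform above produces.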
Get Old Headers
Direktiv has variables scoped to namespaces, workflows, and instances. In this state (get-old-headers) we get the variables lastModified and etag from the workflow scope. These variables hold whatever the header values were the last time this workflow was executed. If this is the first time the workflow is executed, these values will be null. This is fine, but it will cause the result to be siteChanged: true, because a supported site returns at least one non-null header, and that value can never equal null.
- id: get-old-headers
  type: getter
  transition: check-site
  variables:
  - key: lastModified
    scope: workflow
  - key: etag
    scope: workflow
Compare Values
Now that we have both the current and previous header values, we can compare them and check whether the website has changed using the switch state check-site.
The switch state below has three possible conditions. The first condition is used for validation and transitions to the error state unsupported-site if neither etag nor lastModified was fetched from the current headers. The last two check whether the etag or lastModified value differs between the previous and current headers. If either header has changed, the website has changed and the property siteChanged is set to true. If none of these conditions is satisfied, the siteChanged property is set to false, because no error occurred and no change was detected.
- id: check-site
  type: switch
  defaultTransition: save-values
  defaultTransform: 'jq(. += {siteChanged: false})'
  conditions:
  - condition: 'jq(.etag == null and .lastModified == null)'
    transition: unsupported-site
  - condition: 'jq(.etag != .var.etag)'
    transition: save-values
    transform: 'jq(. += {siteChanged: true})'
  - condition: 'jq(.lastModified != .var.lastModified)'
    transition: save-values
    transform: 'jq(. += {siteChanged: true})'
- id: unsupported-site
  type: error
  error: unsupported.site
  message: "https://docs.direktiv.io is not supported: site must respond with at least one of these headers: ['Etag', 'Last-Modified']"
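The order of the three conditions matters, so it can help to read them as ordinary code. The following is a rough Python equivalent of the switch state, not Direktiv code. The names site_changed and UnsupportedSiteError are hypothetical, and None stands in for jq's null.

```python
class UnsupportedSiteError(Exception):
    """Stands in for the unsupported-site error state."""


def site_changed(current, previous):
    """Mirror the check-site conditions in the order they are evaluated."""
    # Condition 1: validation. No usable header at all means we cannot decide.
    if current.get("etag") is None and current.get("lastModified") is None:
        raise UnsupportedSiteError(
            "site must respond with at least one of: Etag, Last-Modified"
        )
    # Condition 2: the entity tag differs from the stored one.
    if current.get("etag") != previous.get("etag"):
        return True
    # Condition 3: the modification date differs from the stored one.
    if current.get("lastModified") != previous.get("lastModified"):
        return True
    # Default transition: nothing matched, so nothing changed.
    return False
```

Passing an empty dictionary as previous models the first execution, where the stored variables are still null and the function therefore reports a change.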
Save current values
Finally, we save the current headers to the lastModified and etag workflow variables, so the next time this workflow is executed they can be retrieved in the get-old-headers state.
- id: save-values
  type: setter
  variables:
  - key: lastModified
    scope: workflow
    value: 'jq(.lastModified)'
  - key: etag
    scope: workflow
    value: 'jq(.etag)'
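To illustrate why the values must be saved on every run, here is a small Python simulation of two consecutive executions. It is only a sketch: a plain dictionary named store plays the role of the workflow-scoped variables, the function name run_once is hypothetical, and the unsupported-site check is left out for brevity.

```python
def run_once(current, store):
    """Simulate one execution: load old values, compare, then save new ones."""
    # get-old-headers: variables that were never set come back as None.
    previous = {
        "etag": store.get("etag"),
        "lastModified": store.get("lastModified"),
    }
    # check-site: any difference in either header counts as a change.
    changed = (
        current["etag"] != previous["etag"]
        or current["lastModified"] != previous["lastModified"]
    )
    # save-values: overwrite the stored values with the current ones.
    store["etag"] = current["etag"]
    store["lastModified"] = current["lastModified"]
    return {**current, "siteChanged": changed, "var": previous}


store = {}
headers = {
    "etag": '"60d55d9b-54b1"',
    "lastModified": "Fri, 25 Jun 2021 04:37:47 GMT",
}
first = run_once(headers, store)   # nothing stored yet, so this reports a change
second = run_once(headers, store)  # same headers as stored, so no change
```

In this simulation the first call reports siteChanged as True only because the store starts empty, and the second call reports False because the headers now match what was saved.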
Sample Output
Note: the getter state places the retrieved variables under the var property, so the var.etag and var.lastModified values are the old headers.
{
  "etag": "\"60d55d9b-54b1\"",
  "lastModified": "Fri, 25 Jun 2021 04:37:47 GMT",
  "siteChanged": false,
  "var": {
    "etag": "\"60d55d9b-54b1\"",
    "lastModified": "Fri, 25 Jun 2021 04:37:47 GMT"
  }
}
Extra - Converting to a Cron Job
This workflow can be run as is by executing it manually. However, an example like this is more likely to be used as a cron job. To convert it, all you need to do is add a start block to the top of the workflow. The example below, if added to the workflow, will run it once every two hours.
start:
  type: scheduled
  cron: "0 */2 * * *"
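If the cron expression is unfamiliar, the short Python sketch below expands its first two fields (minute and hour) to list the times of day at which it fires. It assumes standard five-field cron semantics and handles only the three field forms needed here. The helper expand_field is hypothetical and not a Direktiv API.

```python
def expand_field(field, low, high):
    """Expand one cron field. Supports only '*', '*/n', and a single integer."""
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(low, high + 1, step))
    return [int(field)]


# The first two fields of a five-field cron expression are minute and hour.
minute, hour = "0 */2 * * *".split()[:2]
minutes = expand_field(minute, 0, 59)
hours = expand_field(hour, 0, 23)

# Every (hour, minute) combination is one scheduled run per day.
run_times = [f"{h:02d}:{m:02d}" for h in hours for m in minutes]
print(run_times)
```

Under these assumptions the schedule fires twelve times a day, at minute zero of every even hour.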
Full Workflow
id: check-website-change
description: "A simple workflow that fetches current headers from a website and compares them to the previously stored headers to determine if it has changed."
functions:
- id: get
  image: direktiv/request:v1
  type: reusable
states:
- id: fetch-site-headers
  type: action
  transform: 'jq({lastModified: .return.headers.["Last-Modified"][0], etag: .return.headers.["Etag"][0]})'
  transition: get-old-headers
  action:
    function: get
    input:
      method: "HEAD"
      url: "https://docs.direktiv.io"
- id: get-old-headers
  type: getter
  transition: check-site
  variables:
  - key: lastModified
    scope: workflow
  - key: etag
    scope: workflow
- id: check-site
  type: switch
  defaultTransition: save-values
  defaultTransform: 'jq(. += {siteChanged: false})'
  conditions:
  - condition: 'jq(.etag == null and .lastModified == null)'
    transition: unsupported-site
  - condition: 'jq(.etag != .var.etag)'
    transition: save-values
    transform: 'jq(. += {siteChanged: true})'
  - condition: 'jq(.lastModified != .var.lastModified)'
    transition: save-values
    transform: 'jq(. += {siteChanged: true})'
- id: unsupported-site
  type: error
  error: unsupported.site
  message: "https://docs.direktiv.io is not supported: site must respond with at least one of these headers: ['Etag', 'Last-Modified']"
- id: save-values
  type: setter
  variables:
  - key: lastModified
    scope: workflow
    value: 'jq(.lastModified)'
  - key: etag
    scope: workflow
    value: 'jq(.etag)'