Lambda, Go, and GotSport via API Gateway

While I've been primarily living in the mobile world recently, I've been intrigued by the developments in a few other areas of technology.

Go, or golang, has increasingly become my preferred language for side projects.  I really like the concise nature of the language, the ability to deploy on multiple platforms, and the short tool chain.  It 'just works', and is fun to program in.

AWS continues to add interesting services and features.  I recently moved my website from GoDaddy to host on S3.  This really started my thinking about living in a 'serverless world'.  While practically speaking hosting on S3 isn't really all that different than virtual hosting at GoDaddy, it is really just scratching the surface.  You can now build pretty interesting applications without running a server (virtual or otherwise).

I began to think about how you could build a full application using the AWS stack without an EC2 instance.  Of course, a ton of thought has already been put into this.  The Serverless tool allows you to easily configure Amazon to use the API Gateway with Lambda to deploy functional APIs.  There is also a website dedicated to living serverless.

I've also been working on a few Amazon Alexa applications for the Echo, which also use Lambda as the preferred deployment platform.

So I thought it was time to build something that actually works.

My son plays soccer and the Colorado Soccer Association uses the event.gotsport.com website to post schedules and results.  So I built a screen scraper in Go to parse his schedule into JSON so I could use it in my Amazon Alexa application.  (The scraper is here: https://github.com/ericdaugherty/gotsport-scraper)  But I hard-coded the resulting JSON into that application.

I figured I could build a general API that could be used by anyone to convert the schedules into JSON.  And I could easily deploy it on Lambda.

In order to get Go running on Lambda, you need to 'work around' the fact that it isn't officially supported.  The lambda_proc library (https://github.com/jasonmoo/lambda_proc) does just that.  It uses a node.js wrapper that invokes your Go application within the Lambda runtime.  The repository has a good example that shows you how to write and deploy a Go app on Lambda.

From there, I just needed a simple Go app to take the input JSON, run the gotsport-scraper I wrote, and return the resulting JSON.
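
As a rough illustration of that shape, here is a minimal sketch using only the standard library.  It is not the actual app: the Game type, the scrapeFunc stand-in for the gotsport-scraper call, and the handle function are all hypothetical names, and the lambda_proc wiring is left out entirely.  The request field names are the ones used by the mapping shown later in this post.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Request mirrors the JSON event passed to the function.
type Request struct {
	EventID string `json:"eventId"`
	GroupID string `json:"groupId"`
	Gender  string `json:"gender"`
	Age     string `json:"age"`
}

// Game is a hypothetical stand-in for one scraped schedule entry.
type Game struct {
	Home string `json:"home"`
	Away string `json:"away"`
}

// scrapeFunc is a placeholder for the real gotsport-scraper call (assumption).
type scrapeFunc func(Request) ([]Game, error)

// handle decodes the input JSON, runs the scraper, and returns the result as JSON.
func handle(event []byte, scrape scrapeFunc) ([]byte, error) {
	var req Request
	if err := json.Unmarshal(event, &req); err != nil {
		return nil, fmt.Errorf("decoding event: %w", err)
	}
	if req.EventID == "" || req.GroupID == "" {
		return nil, errors.New("eventId and groupId are required")
	}
	games, err := scrape(req)
	if err != nil {
		return nil, fmt.Errorf("scraping schedule: %w", err)
	}
	return json.Marshal(games)
}

func main() {
	// A fake scraper so the sketch runs without network access.
	fake := func(Request) ([]Game, error) {
		return []Game{{Home: "Team A", Away: "Team B"}}, nil
	}
	event := []byte(`{"eventId":"46461","groupId":"511424","gender":"Boys","age":"11"}`)
	out, err := handle(event, fake)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Passing the scraper in as a function keeps the decode/encode logic testable without hitting the GotSport site.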

The final step was exposing the Lambda function as an HTTP API.  This is where the API Gateway comes in.  It allows you to specify an endpoint and method to trigger the Lambda function.  The basics are pretty straightforward.  You define a resource (/gotsport) and a method (GET), and map it to your Lambda function.  However, the tricky part is the mapping of the HTTP Request to the Lambda function, and the result of the Lambda function to the HTTP Response.

Here is the full lifecycle:



I decided to use Query String parameters to pass data to the function, so you could just cut/paste from the URL you wanted converted.  You can define the query strings in the Method Request section, but this just seems to allow you to use them when testing and isn't required.  I did have to map the query parameters into the Lambda, so I created an application/json mapping template (since the Lambda function consumes JSON).  The mapping template is:
{
    "eventId" : "$input.params('EventID')",
    "groupId" : "$input.params('GroupID')",
    "gender"  : "$input.params('Gender')",
    "age"     : "$input.params('Age')"
}

This maps the Query String Parameters using their names (again, as used on gotsport.com so you can cut and paste) into JSON values that match those used by my gotsport-scraper tool.  This is then passed to the Lambda function.
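
To make the effect of that mapping concrete, here is a small Go sketch that performs the same translation locally: it reads the four query string parameters and emits the JSON body the Lambda function would receive.  The scraperInput type and mapQuery function are illustrative names of my own, not part of API Gateway or the scraper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// scraperInput uses the same JSON keys as the mapping above.
type scraperInput struct {
	EventID string `json:"eventId"`
	GroupID string `json:"groupId"`
	Gender  string `json:"gender"`
	Age     string `json:"age"`
}

// mapQuery converts a raw query string into the JSON body,
// roughly what the mapping does inside API Gateway.
// A missing parameter becomes an empty string.
func mapQuery(rawQuery string) ([]byte, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, fmt.Errorf("parsing query: %w", err)
	}
	return json.Marshal(scraperInput{
		EventID: q.Get("EventID"),
		GroupID: q.Get("GroupID"),
		Gender:  q.Get("Gender"),
		Age:     q.Get("Age"),
	})
}

func main() {
	body, err := mapQuery("EventID=46461&GroupID=511424&Gender=Boys&Age=11")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
	// prints {"eventId":"46461","groupId":"511424","gender":"Boys","age":"11"}
}
```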

The Lambda function runs, fetching the requested URL, scraping it, and returning a JSON value.  However, the lambda_proc function returns both an error value and a data value containing the results of the Lambda function.  I wanted to map the output to just contain the JSON representing the schedule.  So in the Integration Response step in the lifecycle, I used the application/json mapping template:
#set($inputRoot = $input.path('$'))
$input.json('$.data')
This just extracts the data element from the JSON returned from the Lambda function and passes it back as the HTTP Response Body.
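
For comparison, the same extraction can be sketched in Go.  This assumes the wrapper's output is a JSON object with a "data" key (confirmed by the $.data path above) and an "error" key (the exact key name is my assumption).  The envelope type and extractData function are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// envelope models the wrapper output: an error value plus the data payload.
type envelope struct {
	Error json.RawMessage `json:"error"` // key name assumed
	Data  json.RawMessage `json:"data"`
}

// extractData returns the raw bytes of the data element, untouched,
// so they can be used directly as the HTTP response body.
func extractData(body []byte) (json.RawMessage, error) {
	var env envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, fmt.Errorf("decoding envelope: %w", err)
	}
	if len(env.Data) == 0 {
		return nil, errors.New("envelope has no data element")
	}
	return env.Data, nil
}

func main() {
	raw := []byte(`{"error":null,"data":[{"home":"Team A","away":"Team B"}]}`)
	data, err := extractData(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(data))
}
```

Using json.RawMessage avoids decoding and re-encoding the schedule, so the payload passes through byte for byte.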

The proper approach in the Integration Response step is to use a RegEx to determine if an error occurred or not, and return the proper HTTP response code and appropriate body.  For now I'm assuming a 200 response with valid data.
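
I haven't built that part, but the idea can be sketched in Go: test the error message against an ordered list of regular expressions and pick a status code from the first match.  The patterns and status codes below are invented for illustration and are not what my API uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a regular expression with the HTTP status it selects.
type rule struct {
	pattern *regexp.Regexp
	status  int
}

// rules are checked in order; these patterns are hypothetical examples.
var rules = []rule{
	{regexp.MustCompile(`(?is)^.*(missing|invalid).*$`), 400},
	{regexp.MustCompile(`(?is)^.*not found.*$`), 404},
	{regexp.MustCompile(`(?s)^.+$`), 500}, // any other non-empty error
}

// statusFor returns the status for an error message, or 200 when it is empty.
func statusFor(errMsg string) int {
	for _, r := range rules {
		if r.pattern.MatchString(errMsg) {
			return r.status
		}
	}
	return 200
}

func main() {
	for _, msg := range []string{"", "missing EventID", "schedule not found", "connection timed out"} {
		fmt.Printf("%q -> %d\n", msg, statusFor(msg))
	}
}
```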

That's it!  I now have an HTML->JSON screen scraper for the GotSport website deployed as an API.

Want to see how this all works?  Here are the resources:

Want to test it out?  Grab a specific schedule from the CSA Youth Advanced League 2016 site.  Then cut/paste the query string parameters onto my API URL: https://j4p9lh1dlb.execute-api.us-east-1.amazonaws.com/prod/gotsport  For example, the U-11 Boys Super League would be: https://j4p9lh1dlb.execute-api.us-east-1.amazonaws.com/prod/gotsport?EventID=46461&GroupID=511424&Gender=Boys&Age=11
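
If you would rather do the cut/paste in code, here is a small Go sketch that copies the query string from a schedule URL onto the API URL.  The toAPIURL function is a hypothetical helper, and the example.com schedule URL is a placeholder rather than a real GotSport address.

```go
package main

import (
	"fmt"
	"net/url"
)

// toAPIURL copies the query string of scheduleURL onto apiBase.
func toAPIURL(apiBase, scheduleURL string) (string, error) {
	src, err := url.Parse(scheduleURL)
	if err != nil {
		return "", fmt.Errorf("parsing schedule URL: %w", err)
	}
	dst, err := url.Parse(apiBase)
	if err != nil {
		return "", fmt.Errorf("parsing API URL: %w", err)
	}
	dst.RawQuery = src.RawQuery // keep parameter names and order as-is
	return dst.String(), nil
}

func main() {
	api := "https://j4p9lh1dlb.execute-api.us-east-1.amazonaws.com/prod/gotsport"
	// Placeholder schedule URL; only its query string matters here.
	schedule := "https://example.com/schedule?EventID=46461&GroupID=511424&Gender=Boys&Age=11"
	out, err := toAPIURL(api, schedule)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```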



1 comment:

  1. That's pretty awesome Eric! I'm sort of looking to do the same thing for a portal that I'm creating for my final class in BS Web Development, but I don't want to use Lambda or AWS and I'm trying to scrape the scores, not the schedules. I'd like to stick to PHP, JSON and/or jQuery. Do you have any advice? You can take a look at what I have here: ummhasan.com/kbellw02 which is only links to scores for each Michigan League. If I can get the actual scores to show for each respective league, I plan to purchase a domain for this portal and make it available to the public.
