
Language Flashcards from Songs and Movies - Part 3: Interacting with SQS

  • Writer: Derek Ferguson
  • Sep 29, 2019
  • 6 min read

My (admittedly loose) architectural vision at this point is that I want another self-contained service that de-dupes words, and another one that takes the de-duped words and assesses them against a word-frequency list, so we can understand how common or rare these words are - maybe so future users can skip words they likely already know (or, at the opposite extreme, words that are too rare to be worth learning).

At first, it seems like these functions all occur sequentially across the system, but I'd like the ability to add additional processes or drop processes out without having to re-code my other services. So, I've elected to connect everything across an SQS queue where every service will get every message and decide for itself which messages it will handle and which it won't.

I considered creating the queue and connecting it to my service manually, but I have a design goal in this system of being able to re-deploy the entire thing to any AWS region or account "from scratch" simply by changing the environment variables. As a result, I have elected to add a script called "installer.py" to my source code, which uses the boto3 library to automate the creation of the queue and its connection to the service. I would have used the "lambda python" library, but it doesn't support this function yet. Having said that, it actually wraps boto3 itself, so - perhaps I can harvest this code for a contribution to that OSS library later.

The coding is pretty straightforward and follows this flow...
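In boto3 terms, a minimal sketch of that flow might look something like this (the queue and function names here are placeholders, not the ones from my actual config):

import boto3

sqs = boto3.client("sqs")
lambda_client = boto3.client("lambda")

# 1. Create the queue.
queue_url = sqs.create_queue(QueueName="flashcard-words")["QueueUrl"]

# 2. Look up the queue's ARN - the event source mapping wants the ARN, not the URL.
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# 3. Wire the queue up to the Lambda function as an event source.
lambda_client.create_event_source_mapping(
    EventSourceArn=queue_arn,
    FunctionName="flashcard-lemmatizer",
    BatchSize=1,
)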

The only catch comes at the very end, when I am told that the role I created in the previous article in this series needs a few more permissions added. The AWS documentation actually warned me about this. I will add these through the UI (though that is cheating a bit - I realize now that I didn't script any of the IAM stuff... I will have to live with that small bit of technical debt for the moment :-) )

The 3 permissions needed are contained in the policy called "AWSLambdaSQSQueueExecutionRole", so I add that through the console. It now looks like this...
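If I ever come back and script that IAM piece, it should boil down to a single boto3 call along these lines (a sketch rather than my actual installer code - the role name is the one I mention later in this post):

import boto3

iam = boto3.client("iam")

# Attach the managed policy that carries the three SQS permissions
# (ReceiveMessage, DeleteMessage, GetQueueAttributes) to the execution role.
iam.attach_role_policy(
    RoleName="LambdaBasicExecution",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaSQSQueueExecutionRole",
)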

Re-running my code does not generate an error and reviewing my Lambda function now shows SQS attachments, so I believe this worked.

I try running my code again to make sure the clauses I put in to avoid re-creating stuff that already exists have the effect I intended, and that nothing gets doubled up. Seems to work fine, though I will have to revisit this in the future to add an "update" path for pre-existing queues, instead of just doing nothing at that point. :-)
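For what it's worth, the "don't re-create what already exists" guard is roughly this pattern (a simplified sketch, not my exact code):

import boto3

sqs = boto3.client("sqs")

def ensure_queue(name):
    # Reuse the queue if it is already there; otherwise create it.
    try:
        return sqs.get_queue_url(QueueName=name)["QueueUrl"]
    except sqs.exceptions.QueueDoesNotExist:
        return sqs.create_queue(QueueName=name)["QueueUrl"]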

I commit my new code and push it up, to make sure it works equally fine on the server. And it all works perfectly! :-)

Returning to this after a full work week, I realize that this is not working quite how I thought it was. Running the unit tests on my desktop now complains about not having the right credentials for AWS. What's going on?

Well, I had put the code to create the SQS queue directly into the body of the installer.py script. That means it got run as soon as installer.py was imported by the tests.py script. In actuality, I only want it to run if-and-when installer.py is explicitly executed. So, this means I need to make 2 changes. First - I need to move all the body code in installer.py into a main method. Then, I need to add an explicit execution of that script to bitbucket-pipelines.yml.
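The first change is just the standard Python idiom for "only run this when the script is executed directly" - something like:

def main():
    # ... all of the queue-creation code that used to sit at module level ...
    pass

if __name__ == "__main__":
    # Runs when "python installer.py" is invoked explicitly,
    # but not when tests.py merely imports this module.
    main()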

Everything works as planned. I realize, though, that being a Mac user, I broke my local unit tests a while ago by not including the Mac binary of mystem. So, I take a moment to pause and add some code to use the Mac version of the binary, if we are on a Mac.
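The OS check is nothing fancy - roughly this, with the binary paths being placeholders for wherever the binaries actually live in my project:

import platform

# Use the Mac build of mystem when running on a Mac, otherwise the Linux build.
if platform.system() == "Darwin":
    mystem_path = "./bin/mystem_mac"
else:
    mystem_path = "./bin/mystem"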

Finally, I momentarily change one of the unit tests to include an extra form of the same verb, because I suspect that the lemmatizer is also de-duping. But... it does not. So, our next service will be quite simple. It will take a list of lemmas with duplicates and return a list of lemmas without duplicates.

So, for starters, I want to change the existing service to write the list to the SQS queue. Inspecting the service I have already written, I see that it has an array and then concatenates it into a string separated by spaces to return it. Since we want to de-dupe... it seems like we're better off not doing that concatenation. In fact, as I think about it, I feel like de-duping is not a sufficiently large operation as to warrant a separate service for this purpose. I add a unit test to verify that doubles are removed, watch it fail, add the line "tokens = list(set(tokens))" to the end of my function, and re-run the unit test to see it succeed. Piece of cake! :-)
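The test itself is about as small as tests get - something in this spirit, where the import, function name and sample words are stand-ins for my real ones:

import unittest
from service import lemmatize  # hypothetical import path and function name

class DedupeTest(unittest.TestCase):
    def test_duplicates_are_removed(self):
        # Two forms of the same verb should collapse to a single lemma.
        tokens = lemmatize("бежит бежал")
        self.assertEqual(len(tokens), len(set(tokens)))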

The only thing left to do is to get this function putting its de-duped list onto an SQS queue, so that whatever other services are interested in performing further processing on it can accomplish that.

Step #1 - we have to update config.yaml to have a name for our lemmas queue, and add code to create that queue if it doesn't already exist. I change the yaml to rename the section from "queue" to "queues", re-run the tests, and they don't break. I don't like this. Obviously, I created tests that aren't sufficiently deep. So I adjust my test, see it fail, and then create a similarly-deep test to verify that it can find the name of my new queue, too.
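The deeper test just reaches into the renamed section and pulls out the new queue's name - in the spirit of this sketch, where the key names are assumptions:

import unittest
import yaml

class ConfigTest(unittest.TestCase):
    def test_lemmas_queue_name_is_configured(self):
        with open("config.yaml") as f:
            config = yaml.safe_load(f)
        # The section is now "queues", with one named entry per queue.
        self.assertTrue(config["queues"]["lemmas"])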

Deploying up to the CI environment demonstrates that I need to make one additional adjustment. Adding the MacOS version of the mystem binary has made my deployment too large to go onto Lambda. This is easily remedied, though - as I don't need (or want) the MacOS binary on Lambda. So, I add a line to my "installer.py" script to delete the MacOS binary before the bundling takes place. Deployment fixed - and a quick look at the Lambda console now confirms that our new queue has been created!
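That clean-up step in installer.py is just a file deletion before the bundle gets built - along these lines, with the path being a placeholder:

import os

MAC_MYSTEM = "bin/mystem_mac"  # placeholder path for the MacOS binary

# The MacOS binary is only needed for local development; dropping it keeps
# the Lambda deployment package under the size limit.
if os.path.exists(MAC_MYSTEM):
    os.remove(MAC_MYSTEM)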

To write to it, we need the URL. So, I add code to installer.py to retrieve the URL. How to get it to the Lambda function, though? I add it to the nested dictionary I read in from config.yaml, and then overwrite config.yaml by writing the updated, nested dictionary back out. I'm surprised to see it run the first time without errors. I'm even more surprised to look at the Lambda function in the AWS console and see that the environment variable HAS been added!
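The URL lookup and the config rewrite are both short with boto3 and PyYAML - roughly this, with the key names inside config.yaml being assumptions:

import boto3
import yaml

sqs = boto3.client("sqs")

with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Look up the URL of the lemmas queue and stash it in the nested dictionary.
queue_name = config["queues"]["lemmas"]
config["queues"]["lemmas_url"] = sqs.get_queue_url(QueueName=queue_name)["QueueUrl"]

# Overwrite config.yaml so the deployment picks the URL up and exposes it
# to the Lambda function as an environment variable.
with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f)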

Last but not least, let's have our service.py file write the dictionary of lemmas out to the new SQS queue before it ends!

Step #1 -- connecting it to the SQS queue. The trick here is to have it write to the SQS queue when it is in Lambda, but not when it is running the unit tests - either on the server or the developer desktop. I settle on the presence of the environment variable above as the sign that it is on Lambda. This trick works with the local unit tests - they skip trying to write to the queue. Also works with the remote unit tests. Everything deploys fine.

Trying Lambda, I am told that the only kind of payload I can send to SQS is a string. This doesn't surprise me. I go back and serialize the payload to JSON and try again. Now I get an access error. :-(
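Put together, the environment-variable guard from the previous step and the JSON serialization come out to something like this (the environment variable name is an assumption, and this is a sketch of the helper rather than my exact service.py code):

import json
import os

import boto3

def publish_lemmas(tokens):
    # The queue URL variable only exists in the deployed Lambda, so its
    # absence means we are running unit tests and should skip SQS entirely.
    queue_url = os.environ.get("LEMMAS_QUEUE_URL")
    if not queue_url:
        return
    # SQS message bodies must be strings, so serialize the lemma list to JSON.
    sqs = boto3.client("sqs")
    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(tokens))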

I recall having seen an article that pointed out that one has to explicitly grant access to write messages to SQS, and I don't remember that being on the list for this role before, so I check it. I add SQSFullAccess to my LambdaBasicExecution role and re-try. The invocation succeeds this time, but is there a message on the queue? I run it a couple more times to first look at my latency on re-execution... seems about 200ms - probably fine for this application. Then I check out the SQS queues.

3 messages in my lemmas queue... we're fully-connected now! Time to work on the second service - the one that will assess the complexity of the words and provide their English translations.
