Billy Lo

Feb 26, 2021

Ep. 27 — How to bring the web to our elders?

Photo by Ono Kosuki from Pexels

For seniors, smartphones can be a source of headaches. The text is tiny (even on large-screen models), the touch interface requires a lot of dexterity, and typing on the soft keyboard can be difficult.

It gets harder if your first language is not English.

As a result, paying bills online or looking up the opening hours of a nearby restaurant is not as easy for our elders as it is for the rest of us.

Thankfully, this can be improved if we acknowledge their needs. Our population is growing older quickly. By 2030, seniors will make up 23% of the Canadian population.

Source: Statistics Canada (1971–2010) and Office of the Superintendent of Financial Institutions (2020–2080)

How can we help? Here is an example I made using text-to-speech / voice bot technologies to make common things offered on the web more accessible.

“Food Buddy” enables the use of the Google Maps “Search for Restaurant Details” feature without a smartphone or computer. A natural language interface on a phone line replaces Google’s search box.

With the help of friends who speak Mandarin and Hindi, and my better half, Food Buddy can be used in 5 languages. (If you are interested in helping out with another language, just comment below. :-))

  • Chinese — Mandarin 普通话 (647–243–7428)
  • Chinese — Cantonese 廣東話 (647–243–7431)
  • Hindi हिन्दी (647–243–6718)
  • Japanese 日本語 (647–243–6720)
  • English (647–243–7430)

The English interface works like this:

Food Buddy: Which restaurant would you like to check out?

Joe: The Pickle Barrel

Food Buddy: The Pickle Barrel, 3.8 stars ⭐, opens at 9, closes at 8 today. Located at 6508 Yonge St, Toronto. <pause> Would you like to be connected to them now?

Joe: Yes.

Food Buddy: <connects Joe to Pickle Barrel for direct conversations>

Everything is done by voice, in the caller’s native language.
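To make the reply in the sample dialogue concrete, here is a minimal Python sketch of how such a sentence could be composed from restaurant details. It is not Food Buddy's actual code: the function names, the argument list, and the choice of 24-hour "HHMM" strings for opening times are all assumptions made for illustration.

```python
from typing import Optional


def to_spoken_hour(hhmm: str) -> str:
    """Turn a 24-hour 'HHMM' string into a short spoken form, e.g. '2000' -> '8'."""
    hour, minute = int(hhmm[:2]), int(hhmm[2:])
    hour12 = hour % 12 or 12  # midnight and noon are both read as 12
    return str(hour12) if minute == 0 else f"{hour12}:{minute:02d}"


def build_reply(name: str, rating: Optional[float], opens: Optional[str],
                closes: Optional[str], address: str) -> str:
    """Compose the sentence the voice bot reads back to the caller."""
    parts = [name]
    if rating is not None:
        parts.append(f"{rating:.1f} stars")
    if opens and closes:
        # Hypothetical input format: opening and closing times as 'HHMM' strings.
        parts.append(
            f"opens at {to_spoken_hour(opens)}, closes at {to_spoken_hour(closes)} today"
        )
    reply = ", ".join(parts) + f". Located at {address}."
    return reply + " Would you like to be connected to them now?"
```

Missing details (no rating, no hours) are simply skipped, so the caller never hears an awkward "unknown stars".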

The architecture is fairly simple if you want to replicate this pattern for the communities you care about.

  1. VoxImplant accepts the call and relays audio to and from the caller.
  2. Dialogflow ES handles speech-to-text, intent detection and voice response creation.
  3. A simple webhook (running in a Firebase Cloud Function) looks up the caller’s city-level location from the caller ID and finds the desired restaurant using the Google Places API.
  4. If the user would like to be connected immediately, the same webhook supplies the intent and phone number back to VoxImplant to connect the two parties.
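Steps 3 and 4 can be sketched roughly as follows. This is an illustrative Python outline, not the author's webhook: the `restaurant` parameter name, the `caller_id` payload field, the `connect_to` key, and the small area-code table are hypothetical. The `queryResult`, `originalDetectIntentRequest`, and `fulfillmentText` keys follow the Dialogflow ES webhook format. The restaurant lookup is passed in as a function so the sketch makes no real Google Places API call.

```python
from typing import Callable, Optional

# Assumption: a tiny area-code table; a real service would need a fuller mapping.
AREA_CODE_TO_CITY = {"416": "Toronto", "647": "Toronto", "604": "Vancouver"}


def city_from_caller_id(caller_id: str, default: str = "Toronto") -> str:
    """Guess the caller's city from a North American number's area code."""
    digits = "".join(ch for ch in caller_id if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        return default
    return AREA_CODE_TO_CITY.get(digits[:3], default)


def handle_webhook(body: dict,
                   find_place: Callable[[str, str], Optional[dict]]) -> dict:
    """Build a Dialogflow ES style webhook response for a restaurant query."""
    query = body.get("queryResult", {})
    # Hypothetical parameter name defined in the Dialogflow intent.
    restaurant = query.get("parameters", {}).get("restaurant", "")
    payload = body.get("originalDetectIntentRequest", {}).get("payload", {})
    # Hypothetical field: how the telephony side forwards the caller ID.
    city = city_from_caller_id(str(payload.get("caller_id", "")))

    place = find_place(restaurant, city) if restaurant else None
    if place is None:
        return {"fulfillmentText": "Sorry, I could not find that restaurant."}
    return {
        "fulfillmentText": f"{place['name']}, located at {place['address']}. "
                           "Would you like to be connected to them now?",
        # Assumed custom payload the telephony side could read to bridge the call.
        "payload": {"connect_to": place["phone"]},
    }
```

Injecting `find_place` keeps the dialogue logic testable offline; in a deployment it would wrap whatever Places lookup the webhook uses.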

That’s it. No hardware to acquire, no visits by the telco to set up phone lines, no machine learning code to write… just basic integrations using readily available, low-cost tools.

Unexpected things & tech lessons I learned in this experiment?

  1. Google’s speech-to-text is really good and fast, even for recognizing words that do not exist in the dictionary. In my use case, most of the audio is just short snippets with little context, and yet Dialogflow’s speech-to-text engine recognizes unusual phrases such as “Happy Grillmore” and “Baguettaboutit” accurately. The technology is fascinating.
  2. A little tool called Ngrok is handy for testing development-stage APIs, e.g. webhooks that external servers need to invoke. It allowed me to validate the architecture quickly without messing with firewall rules. In an enterprise environment, if we can provide strong guardrails for using it securely, it can be a useful accelerator. Early integration means fewer surprises.
  3. Accessibility can open up new paths that are not just good for the community, but also big enough to justify the investments needed to build them. In this case, 23% of the population is a sizeable market if you can dominate a particular problem space. The bonus: it’s a less crowded space with fewer competitors.
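To illustrate the Ngrok point above, here is a minimal local webhook stub in Python that could be exposed for early integration testing. It is only a sketch: the port number and the echo behaviour are assumptions, and it stands in for the real Firebase function rather than reproducing it. It reads `queryResult.queryText` and answers with `fulfillmentText`, as in the Dialogflow ES webhook format.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_reply(body: dict) -> dict:
    """Echo the recognized text back in a Dialogflow ES style response."""
    heard = body.get("queryResult", {}).get("queryText", "")
    return {"fulfillmentText": f"I heard: {heard}"}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read the JSON request body; treat an empty body as an empty object.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        data = json.dumps(make_reply(body)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Run the stub on localhost until interrupted (port is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

Calling `serve()` and then running `ngrok http 8080` in another terminal gives a public URL that can be pasted into the voice bot's webhook settings, so the whole call flow can be tried before anything is deployed.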

I hope this helps…

Happy building…