Multilingual NLP with Microsoft Azure Services
 

If your application collects input from users in different countries, that input is quite likely to be multilingual. If you do any natural language processing (NLP) in your app, this can be a challenge: a model trained on English data will not analyze French text correctly. What should you do in this case? We will show you a simple solution without any NLP engineering!

We at Nexxt Intelligence mainly use Microsoft Azure services for our infrastructure, and we have had the opportunity to work and play with some really fascinating services that Microsoft offers. In this article, we will cover a few of the language APIs in Azure Cognitive Services. Cognitive Services also includes APIs for:

  • Decision making

  • Speech processing

  • Computer Vision

  • Web search

In this post, we’ll create a web application that accepts input in different languages and runs sentiment analysis on it. We’re going to use React on the client side and Express on the back end.

First, let’s discuss how this idea can be brought to life. As I promised, we’ll avoid machine learning programming and rely fully on Azure services instead. The solution consists of two main parts:

  1. Normalize the data, i.e. translate it into one language. In our case, that language is English.

  2. Do sentiment analysis.

Both tasks can be handled by Azure Cognitive Services: the Translator API takes care of the first, and the Text Analytics API takes care of the second 🚀

Prerequisites

Let’s build it!

Now that we have an idea of how to implement this, let’s set up our project. I’ll split the project into two folders: client and server. I initialized a React app using create-react-app in the client directory and an Express app in the server directory. Here is the list of dependencies I’m going to use:

server

  • express

  • axios

  • cors

  • dotenv

  • nodemon

client

  • axios

  • gestalt (a cute React UI component library)

You can install these dependencies by running npm install DEPENDENCY_NAME or yarn add DEPENDENCY_NAME in the corresponding folder (client or server).

How about preparing our back end first and then building the UI for it?

We can start with a simple Express app template that you probably saw many times:

You’ll also need a .env file with the following content:

You should replace YOUR_URL_GOES_HERE and YOUR_KEY_GOES_HERE with the URLs and API keys you got from the Azure Portal when you created the resources for the Text Analytics and Translator APIs.
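As a minimal sketch, the server could check these settings at startup before handling any request. The variable names below (TRANSLATOR_URL and so on) are hypothetical, so use whatever names your own .env file defines.

```javascript
// Hypothetical variable names; the original .env file may use different ones.
const REQUIRED_VARS = [
  "TRANSLATOR_URL",
  "TRANSLATOR_KEY",
  "TEXT_ANALYTICS_URL",
  "TEXT_ANALYTICS_KEY",
];

// Reads the Azure settings from an env-like object (process.env by default)
// and fails fast if a value is missing or still holds a placeholder.
function readAzureConfig(env = process.env) {
  const config = {};
  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (!value || value.endsWith("_GOES_HERE")) {
      throw new Error(`Missing environment variable: ${name}`);
    }
    config[name] = value;
  }
  return config;
}
```

Calling readAzureConfig() right after dotenv has loaded the file would surface a forgotten key immediately instead of as a failed API call later.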

We’ll need just two endpoints for this app: /translate and /sentiment. As you might guess, the translate route translates text into English, and the sentiment route returns sentiment analysis for the input data.

Let’s create the translate endpoint:

This endpoint takes the text property from the request body sent by the front end, splits it into an array on the \n delimiter, and creates an array of objects with a text property, which is the format the Translator API expects.

You might wonder why we split the string into an array on \n. We do it to make sure the Translator API returns correct translations. Currently, this API cannot pick up several languages within the same string. Even though it’s very unlikely that you would have multilingual text on the same line, we still want a small rule for the input data of our application:
if a sentence is in a different language than the previous sentence, it must start on a new line, so it can be easily identified and translated into English.
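The splitting step described above could be sketched like this. The function name toTranslatorBody is made up for illustration, and trimming and dropping blank lines is an extra assumption that the post does not mention.

```javascript
// Turns raw user input into one Translator item per line, so that each line
// can have its language detected independently.
// Assumption: blank lines carry no content and can be dropped.
function toTranslatorBody(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => ({ text: line }));
}

const body = toTranslatorBody("I love this product.\nJe déteste attendre.\n");
console.log(body); // two items, one per non-empty line
```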

Finally, we make a POST request to the Translator API with this data and all required headers, and send the response back to the user.

The sentiment endpoint is next.

It’s pretty similar to /translate, actually. The main difference is that here we have a normalizeData function that takes the request body and formats it to satisfy the Text Analytics API requirements.

That’s it! Now we can build the client side of our application.
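Here is a rough sketch of how that request might be assembled for version 3.0 of the Translator API. The helper name buildTranslateRequest and the config fields are assumptions, and the endpoint is assumed to be a bare host such as the global https://api.cognitive.microsofttranslator.com. The sketch only builds the request, so it can be inspected without calling Azure.

```javascript
// Builds the parts of a Translator v3 "translate" call without sending it.
// `endpoint`, `key` and `region` stand in for values from the Azure Portal.
function buildTranslateRequest(body, { endpoint, key, region }) {
  const url = new URL("/translate", endpoint);
  url.searchParams.set("api-version", "3.0");
  // Only the target language is set; the service detects the source language.
  url.searchParams.set("to", "en");
  return {
    url: url.toString(),
    data: body, // array of { text } objects, one per input line
    headers: {
      "Ocp-Apim-Subscription-Key": key,
      // Needed for regional or multi-service resources.
      "Ocp-Apim-Subscription-Region": region,
      "Content-Type": "application/json",
    },
  };
}
```

With axios, the result could be sent as axios.post(request.url, request.data, { headers: request.headers }). Each element of the response array carries a translations list that holds the English text.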
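The exact shape of the request body is not shown here, so the following is only a guess at what a normalizeData function could look like. It assumes the front end posts back the array returned by the Translator API, where each item has a translations list, and it merges all translated lines into a single English document.

```javascript
// Hypothetical stand-in for the normalizeData helper.
// Assumes `translated` is the Translator response array, where each item
// looks like { translations: [{ text, to }] }.
function normalizeData(translated) {
  const text = translated
    .map((item) => item.translations[0].text)
    .join(" ");
  // Text Analytics v3 expects a "documents" array; every document needs an
  // id and its text, and may declare its language.
  return { documents: [{ id: "1", language: "en", text }] };
}
```

Sending everything as one document means the service returns one overall score plus a score for each sentence it finds in the text.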

As I mentioned earlier, we will build UI using Gestalt, a library developed by Pinterest engineers, which has a lot of beautiful components.

We can start with some UI that collects input from the user and sends it to our back end when a button is clicked:

Let’s walk through this code. There are four state variables:

  • text which holds the value of the TextArea input field.

  • overallScore which is the overall sentiment analysis score for the whole text document.

  • sentences which contains all translated sentences as well as their sentiment scores.

  • isLoading which indicates the loading state.

There is also a getSentiment function that sends the text to the back end when the button is clicked. By the way, I’m running the back end locally, so my baseUrl is http://localhost:8080.
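Once the back end answers, the response has to be turned into the overallScore and sentences states. A minimal sketch of that mapping follows, assuming the server forwards the Text Analytics v3.0 sentiment response unchanged and that exactly one document was analyzed. The name toViewState is hypothetical.

```javascript
// Extracts the values for the `overallScore` and `sentences` states from a
// Text Analytics v3.0 sentiment response (assumed shape, single document).
function toViewState(response) {
  const [doc] = response.documents;
  return {
    overallScore: doc.confidenceScores, // { positive, neutral, negative }
    sentences: doc.sentences.map((s) => ({
      text: s.text,
      sentiment: s.sentiment,
      score: s.confidenceScores,
    })),
  };
}
```

Inside getSentiment, the two fields of the returned object could be passed straight to the matching state setters.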

Next, we want to visualize the results that we get from the back end server. Before I do that, I’ll make a few UI components.

ScoreBar

It accepts a score object as a prop, and it will look something like this:

[Image: ScoreBar component]

SentenceCard

It accepts an emoji and a sentence as props. The sentence object contains the text and its sentiment scores. As you can see, we also use the ScoreBar that we created in the previous step. Here is what a SentenceCard looks like:

[Image: SentenceCard component]
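How the emoji for each card gets chosen is not spelled out, so here is one possible approach: map the sentiment label of a sentence to an emoji. Both the helper name emojiFor and the emoji choices are assumptions.

```javascript
// Hypothetical mapping from a Text Analytics sentiment label to an emoji.
const EMOJI_BY_SENTIMENT = {
  positive: "😀",
  neutral: "😐",
  negative: "😞",
};

// Falls back to a neutral "thinking" emoji for any other label,
// for example the document-level "mixed" label.
function emojiFor(sentiment) {
  return EMOJI_BY_SENTIMENT[sentiment] || "🤔";
}
```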

AnalysisContainer

Finally, let’s integrate all our components into one component and visualize the sentiment analysis of the user’s text.

This component displays the overall sentiment score using ScoreBar and all sentences from the user’s input using SentenceCards. It also includes a SegmentedControl component that lets users filter sentences.
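The filtering itself can be a small pure function. The sketch below assumes the filter options are "all", "positive", "neutral" and "negative", and that every sentence object carries a sentiment label. Neither detail is confirmed by the post.

```javascript
// Keeps only the sentences whose sentiment label matches the selected filter.
// The filter names are assumptions; "all" returns the list unchanged.
function filterSentences(sentences, filter) {
  if (filter === "all") return sentences;
  return sentences.filter((s) => s.sentiment === filter);
}
```

The index of the selected segment could be translated into one of these filter names before calling the function.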

After this is done, simply import AnalysisContainer in the App component and put it at the end of <Box display="flex" wrap minWidth={275} justifyContent="evenly">, passing the local state values down as props:

<AnalysisContainer
    isLoading={isLoading}
    overallScore={overallScore}
    sentences={sentences}
/>

Demo

Final thoughts

In this post, I showed you how to build a simple NLP application that does sentiment analysis using only the cognitive services available on Azure. This solution definitely isn’t perfect and may not work well with complex input data. Even when I tested it with some simple data, I was not always satisfied with the results, as it could show a high positive sentiment score for an obviously neutral statement. This is one of the reasons why we do a lot of in-house NLP research and development at Nexxt Intelligence.

On the other hand, developing this application didn’t involve a single line of NLP-related code! It would be fair to mention that Azure Text Analytics has a “named entity recognition” feature that lets you identify different entities and categorize them, but it’s far beyond the scope of this post. Also, note that it currently supports only 19 languages, and Text Analytics API v3.x is not available in Central India, UAE North, China North 2 and China East.


Source code is available on GitHub:

https://github.com/nexxt-intelligence/multilingual-nlp