Multilingual NLP with Microsoft Azure Services
If you have users from different countries and collect free-form input from them, your application may well receive multilingual data. That can be a challenge for any natural language processing (NLP) you do in your app: a model trained on English data will fail to produce correct results if you feed it French text. What should you do in this case? We will show you a simple solution without any NLP engineering!
We at Nexxt Intelligence mainly use Microsoft Azure services for our infrastructure, and we have had the opportunity to work and play with some really fascinating services that Microsoft offers. In this article, we will cover a few of the language APIs in Azure Cognitive Services. Cognitive Services also includes APIs for:
Decision making
Speech processing
Computer Vision
Web search
In this post, we’ll create a web application that accepts input in different languages and runs sentiment analysis on that data. We’re going to use React on the client side and Express on the back end.
First, let’s discuss how this idea can be brought to life. As promised, we’ll avoid any machine learning programming and fully rely on Azure services instead. The solution consists of two main parts:
Normalize the data, i.e. translate it into one language (English, in our case).
Do sentiment analysis.
Both of these tasks can be done with the Text Analytics API and the Translator API from Azure Cognitive Services 🚀
Prerequisites
Microsoft Azure account
Text Analytics (v3.0) and Translator (v3.0) resources created in the Azure Portal
Node.js
Let’s build it!
Now that we have an idea of how we can implement this, let’s set up our project. I’ll split this project into two folders: `client` and `server`. I initialized a React app using create-react-app in the `client` directory and an Express app in the `server` directory. Also, here is the list of dependencies I’m going to use for this app:
server
express
axios
cors
dotenv
nodemon
client
axios
gestalt (a cute React UI component library)
You can install these dependencies by running `npm install DEPENDENCY_NAME` or `yarn add DEPENDENCY_NAME` in the corresponding folder (client or server).
How about preparing our back-end first, and then we can build UI for it?
We can start with a simple Express app template that you’ve probably seen many times:
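For reference, here is a minimal sketch of such a template, assuming the dependencies listed above (express, cors, dotenv) are installed; the port number and middleware choices are just reasonable defaults, not a prescription:

```javascript
// server/index.js -- minimal Express starter
require('dotenv').config(); // load .env into process.env

const express = require('express');
const cors = require('cors');

const app = express();

app.use(cors());         // allow requests from the React dev server
app.use(express.json()); // parse JSON request bodies

// ...routes will go here...

const port = process.env.PORT || 8080;
app.listen(port, () => console.log(`Server listening on port ${port}`));
```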
You’ll also need a `.env` file with the following content:
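The variable names below are my own choice; any names work as long as the server code reads the same ones:

```
TRANSLATOR_URL=YOUR_URL_GOES_HERE
TRANSLATOR_KEY=YOUR_KEY_GOES_HERE
TEXT_ANALYTICS_URL=YOUR_URL_GOES_HERE
TEXT_ANALYTICS_KEY=YOUR_KEY_GOES_HERE
```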
You should replace `YOUR_URL_GOES_HERE` and `YOUR_KEY_GOES_HERE` with the URLs and API keys you got from the Azure Portal when you created the resources for the Text Analytics and Translator APIs.
We’ll need just two endpoints for this app: `/translate` and `/sentiment`. As you might guess, the `/translate` route translates text into English, and the `/sentiment` endpoint returns sentiment analysis for the input data.
Let’s create the `/translate` endpoint:
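A sketch of what this endpoint might look like. The env variable names are assumptions from the `.env` section, and the sketch uses Node 18+’s built-in `fetch` instead of axios to stay self-contained; note also that a regional Translator resource may additionally require an `Ocp-Apim-Subscription-Region` header:

```javascript
// Split multi-line input into the array of { text } objects
// that Translator v3 expects (one object per line).
const toTranslatorBody = (text) =>
  text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => ({ text: line }));

// Route handler, registered in the app with:
//   app.post('/translate', translateHandler);
async function translateHandler(req, res) {
  const body = toTranslatorBody(req.body.text);
  const response = await fetch(
    `${process.env.TRANSLATOR_URL}/translate?api-version=3.0&to=en`,
    {
      method: 'POST',
      headers: {
        'Ocp-Apim-Subscription-Key': process.env.TRANSLATOR_KEY,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    }
  );
  res.json(await response.json());
}
```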
The piece of code above takes the `text` property from the request body sent from the front end, splits it into an array on the `\n` delimiter, and creates an array of objects with a `text` property, which is the format the Translator API expects.
You might wonder why we split the string into an array on `\n`. We do it to ensure that the Translator API returns correct translations: currently, this API is not capable of picking up several languages within the same string object. Even though it’s very unlikely that you would have multilingual text on the same line, we still want to set a little rule for the input data of our application:
if a sentence starts in a different language than the previous sentence, it must be on a new line, so it can be easily identified and translated into English.
Finally, we simply make a POST request to the Translator API with this data and all the required headers, and send the result back to the user.
The `/sentiment` endpoint is next.
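A sketch of this endpoint under the same assumptions as before (built-in `fetch`, my own env variable names); the shape of the request body, an array of translated lines sent by the front end, is also an assumption:

```javascript
// Shape the translated lines into the { documents: [...] } payload
// that Text Analytics v3.0 expects: string ids, a language, and text.
const normalizeData = (translations) => ({
  documents: translations.map((t, index) => ({
    id: String(index + 1),
    language: 'en',
    text: t.text,
  })),
});

// Route handler, registered in the app with:
//   app.post('/sentiment', sentimentHandler);
async function sentimentHandler(req, res) {
  const response = await fetch(
    `${process.env.TEXT_ANALYTICS_URL}/text/analytics/v3.0/sentiment`,
    {
      method: 'POST',
      headers: {
        'Ocp-Apim-Subscription-Key': process.env.TEXT_ANALYTICS_KEY,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(normalizeData(req.body.translations)),
    }
  );
  res.json(await response.json());
}
```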
It’s pretty similar to `/translate`, actually. The main difference is the `normalizeData` function, which takes the request body and formats it to satisfy the Text Analytics API requirements.
That’s it! Now we can build the client side of our application.
As I mentioned earlier, we will build UI using Gestalt, a library developed by Pinterest engineers, which has a lot of beautiful components.
We can start with some UI for collecting input from the user and sending it to our back end when a button is clicked:
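A sketch of that component follows. The exact Gestalt props are from memory and may need adjusting against the library’s docs, and the response handling assumes the two endpoints return the raw Azure responses (the overall score here is simply an average of the per-line scores, which is my own simplification):

```jsx
// client/src/App.js (sketch)
import React, { useState } from 'react';
import axios from 'axios';
import { Box, Button, TextArea } from 'gestalt';
import 'gestalt/dist/gestalt.css';

const baseUrl = 'http://localhost:8080';

function App() {
  const [text, setText] = useState('');
  const [overallScore, setOverallScore] = useState(null);
  const [sentences, setSentences] = useState([]);
  const [isLoading, setIsLoading] = useState(false);

  const getSentiment = async () => {
    setIsLoading(true);
    // 1. Translate every line into English.
    const { data } = await axios.post(`${baseUrl}/translate`, { text });
    const translations = data.map((r) => ({ text: r.translations[0].text }));
    // 2. Run sentiment analysis on the translated lines.
    const { data: analysis } = await axios.post(`${baseUrl}/sentiment`, { translations });
    const docs = analysis.documents || [];
    setSentences(
      docs.map((d, i) => ({
        text: translations[i].text,
        sentiment: d.sentiment,
        confidenceScores: d.confidenceScores,
      }))
    );
    // Overall score: average the per-line confidence scores.
    const avg = (key) =>
      docs.reduce((sum, d) => sum + d.confidenceScores[key], 0) / (docs.length || 1);
    setOverallScore({ positive: avg('positive'), neutral: avg('neutral'), negative: avg('negative') });
    setIsLoading(false);
  };

  return (
    <Box display="flex" wrap minWidth={275} justifyContent="evenly">
      <Box padding={2}>
        <TextArea
          id="userText"
          value={text}
          placeholder="Type your text here, one language per line"
          onChange={({ value }) => setText(value)}
        />
        <Button text="Analyze" color="red" onClick={getSentiment} disabled={isLoading} />
      </Box>
    </Box>
  );
}

export default App;
```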
Let’s walk through this code. There are four states:
`text`, which holds the value of the `TextArea` input field.
`overallScore`, which is the overall sentiment analysis score for the whole text document.
`sentences`, which contains all translated sentences as well as their sentiment scores.
`isLoading`, which indicates the loading state.
There is also a `getSentiment` function that sends the text to the back end when the button is clicked. By the way, I’m running the back end locally, so my `baseUrl` is `http://localhost:8080`.
Next, we want to visualize the results we get from the back end. Before doing that, I’ll make a few UI components.
ScoreBar
It accepts a score object as a prop, and it will look something like this:
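One possible sketch of `ScoreBar`; the score object’s `positive`/`neutral`/`negative` fields follow the Text Analytics response format, while the colors and sizing are assumptions:

```jsx
// client/src/ScoreBar.js (sketch)
import React from 'react';
import { Box, Text } from 'gestalt';

// Renders one colored segment per sentiment class,
// sized proportionally to its score (a value from 0 to 1).
function ScoreBar({ score }) {
  const segments = [
    { label: 'positive', value: score.positive, color: 'green' },
    { label: 'neutral', value: score.neutral, color: 'lightGray' },
    { label: 'negative', value: score.negative, color: 'red' },
  ];
  return (
    <Box display="flex">
      {segments.map(({ label, value, color }) => (
        <Box key={label} color={color} width={`${Math.round(value * 100)}%`} padding={1}>
          <Text size="sm">{Math.round(value * 100)}%</Text>
        </Box>
      ))}
    </Box>
  );
}

export default ScoreBar;
```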
SentenceCard
It accepts an emoji and a sentence as props. The sentence object contains the text and its sentiment scores. As you can see, we also use the `ScoreBar` we created in the previous step. Here is how a `SentenceCard` will look:
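A sketch of the component; the `confidenceScores` field name comes from the Text Analytics v3 response, and the Gestalt styling props are assumptions:

```jsx
// client/src/SentenceCard.js (sketch)
import React from 'react';
import { Box, Text } from 'gestalt';
import ScoreBar from './ScoreBar';

// Shows one translated sentence, an emoji for its overall sentiment,
// and the per-class scores via ScoreBar.
function SentenceCard({ emoji, sentence }) {
  return (
    <Box padding={2} rounding={2} borderStyle="sm" marginBottom={2}>
      <Text>
        {emoji} {sentence.text}
      </Text>
      <ScoreBar score={sentence.confidenceScores} />
    </Box>
  );
}

export default SentenceCard;
```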
AnalysisContainer
Finally, let’s integrate all our components into one component and visualize the sentiment analysis of the user’s text.
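A sketch of this container; note that Gestalt exports its segmented control as `SegmentedControl`, and its exact props (as well as the emoji mapping and filter logic here) are my assumptions:

```jsx
// client/src/AnalysisContainer.js (sketch)
import React, { useState } from 'react';
import { Box, Heading, SegmentedControl } from 'gestalt';
import ScoreBar from './ScoreBar';
import SentenceCard from './SentenceCard';

// Map a sentiment label from Text Analytics to an emoji.
const emojiFor = (sentiment) =>
  ({ positive: '😃', neutral: '😐', negative: '😞' }[sentiment] || '🤔');

function AnalysisContainer({ overallScore, sentences }) {
  const filters = ['all', 'positive', 'neutral', 'negative'];
  const [selected, setSelected] = useState(0);

  // Keep only the sentences matching the active filter.
  const visible = sentences.filter(
    (s) => filters[selected] === 'all' || s.sentiment === filters[selected]
  );

  return (
    <Box padding={2}>
      <Heading size="sm">Overall sentiment</Heading>
      <ScoreBar score={overallScore} />
      <SegmentedControl
        items={filters}
        selectedItemIndex={selected}
        onChange={({ activeIndex }) => setSelected(activeIndex)}
      />
      {visible.map((sentence, i) => (
        <SentenceCard key={i} emoji={emojiFor(sentence.sentiment)} sentence={sentence} />
      ))}
    </Box>
  );
}

export default AnalysisContainer;
```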
This component basically displays the overall sentiment score using `ScoreBar` and all the sentences from the user’s input using `SentenceCard`s. There is also a `SegmentControl` component, which looks like this:
and allows users to filter sentences.
After this is done, simply import `AnalysisContainer` in the `App` component and put it at the end of `<Box display="flex" wrap minWidth={275} justifyContent="evenly">`, passing down local state properties as props:
Demo
Final thoughts
In this post, I showed you how to build a simple NLP application that does sentiment analysis using only the cognitive services available on Azure. This solution definitely isn’t perfect and may not work well with complex input data. Even when I tested it with some simple data, I was not always satisfied with the results, as it could show a highly positive sentiment score for an obviously neutral statement. This is one of the reasons why we do a lot of in-house NLP research and development at Nexxt Intelligence.
On the other hand, developing this application didn’t involve a single line of NLP-related code! It would be fair to mention that Azure Text Analytics also has a “named entity recognition” feature, which lets you identify and categorize different entities, but that’s far beyond the scope of this post. Also, note that sentiment analysis currently supports only 19 languages, and Text Analytics API v3.x is not available in Central India, UAE North, China North 2, and China East.
Source code is available on GitHub:
https://github.com/nexxt-intelligence/multilingual-nlp