Amazon Comprehend Modeling & Training Enhancements

With recent updates to model management and evaluation, Amazon Comprehend now lets developers version ML models, supply their own test dataset during training, and migrate updated models to existing endpoints.

Comprehend is a service that lets you specify a data source (often within an S3 Bucket) and extract meaning and insights from entities specific to your domain, without needing years of experience or deep knowledge of NLP and ML.

You can leverage Comprehend to identify the language of a text, key phrases, places, things, people, even popular celebrities! Developers use this service to derive insights from customer feedback and provide an improved overall experience for users and customers.
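As a rough illustration of those real-time capabilities, here is a minimal sketch using the boto3 Comprehend operations `detect_dominant_language`, `detect_key_phrases`, and `detect_entities`. The helper name `summarize_text` and the shape of its return value are my own assumptions for illustration; the client is passed in so the function can be tried with or without AWS credentials.

```python
def summarize_text(client, text):
    """Return the dominant language, key phrases, and entities for a short text.

    `client` is expected to behave like boto3.client("comprehend").
    """
    languages = client.detect_dominant_language(Text=text)["Languages"]
    # Pick the language Comprehend is most confident about.
    language_code = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]

    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    entities = client.detect_entities(Text=text, LanguageCode=language_code)

    return {
        "language": language_code,
        "key_phrases": [phrase["Text"] for phrase in phrases["KeyPhrases"]],
        # Each entity carries a type such as PERSON, LOCATION, or DATE.
        "entities": [(entity["Type"], entity["Text"]) for entity in entities["Entities"]],
    }
```

With credentials configured, something like `summarize_text(boto3.client("comprehend"), "The support team in Austin was fantastic.")` should work, assuming boto3 is installed and your role is allowed to call Comprehend.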

These updates to Amazon Comprehend were launched as part of a suite of tools and updates called Comprehend Custom, and include:

– Improved Model Management

  • Model Versioning: Re-train newer versions of an existing ML model. Each version has a unique version identifier.
  • Active Endpoint Update: Deploy a new model version to an active endpoint in your production environment with no downtime.

– Improved Control for Model Training/Evaluation

  • Customer Provided Test Dataset: You can optionally provide test data during model training.
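The three features above can be sketched with boto3 roughly as follows. This assumes a custom document classifier trained from CSV data; `VersionName`, `TestS3Uri`, and `DesiredModelArn` are the request fields I believe map to versioning, the customer-provided test dataset, and the active endpoint update. The function names and all ARNs and S3 paths you pass in are hypothetical placeholders.

```python
def build_version_request(name, version_name, role_arn, train_uri,
                          test_uri=None, language_code="en"):
    """Build keyword arguments for Comprehend's create_document_classifier."""
    input_config = {"DataFormat": "COMPREHEND_CSV", "S3Uri": train_uri}
    if test_uri is not None:
        # Optional customer-provided test dataset used for evaluation.
        input_config["TestS3Uri"] = test_uri
    return {
        "DocumentClassifierName": name,
        "VersionName": version_name,  # identifies this version of the model
        "DataAccessRoleArn": role_arn,
        "LanguageCode": language_code,
        "InputDataConfig": input_config,
    }


def train_new_version(client, request):
    """Start training a new model version and return its ARN."""
    return client.create_document_classifier(**request)["DocumentClassifierArn"]


def promote_version(client, endpoint_arn, model_arn):
    """Point an existing endpoint at a newly trained model version."""
    client.update_endpoint(EndpointArn=endpoint_arn, DesiredModelArn=model_arn)
```

Note that the endpoint update only makes sense once the new version has finished training, so in practice you would check the model status before calling `promote_version`.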

Amazon Kendra can be used in conjunction with Comprehend; for example, Comprehend can take a large document (a text file within your S3 Bucket) and output a file that Kendra can consume. Kendra is a search service that uses natural language processing (NLP) to let you query vast amounts of enterprise-related documents.

Below, I walk through the steps of creating an Analysis Job with Amazon Comprehend.

* A new update allows developers to analyze PDF and Word documents for custom entity recognition, as you can see in the notification in the screenshot above.

Within the AWS console you can create an Analysis Job that lets Comprehend pick out certain elements within a document, such as overall sentiment (positive, negative, neutral), names, dates, quantities, products, organizations, etc. You will need to provide information about where the document lives (within an S3 Bucket, for instance) and where the Comprehend output file will be placed (this could be a separate folder within the same S3 Bucket).
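The same job can also be started from code instead of the console. The sketch below assumes an entity detection job and uses boto3's `start_entities_detection_job`; the helper name, the job name, the S3 URIs, and the IAM role ARN are placeholders you would replace with your own values.

```python
def start_entities_job(client, job_name, input_uri, output_uri, role_arn,
                       language_code="en"):
    """Start an asynchronous entity detection job and return its job id.

    `client` is expected to behave like boto3.client("comprehend").
    """
    response = client.start_entities_detection_job(
        JobName=job_name,
        LanguageCode=language_code,
        # Role that lets Comprehend read the input and write the output.
        DataAccessRoleArn=role_arn,
        # Where the document lives; each file is treated as one document.
        InputDataConfig={"S3Uri": input_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        # Where Comprehend should place its output file.
        OutputDataConfig={"S3Uri": output_uri},
    )
    return response["JobId"]
```

For sentiment rather than entities, `start_sentiment_detection_job` takes the same input and output configuration.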

Once your Analysis Job has completed, your Comprehend output file will be in the S3 Bucket and folder that you previously specified.

Once completed, click on your Analysis Job for more details about your job and where to locate the Comprehend output file.
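If you prefer to check on the job programmatically, a sketch like the one below polls `describe_entities_detection_job` until the job reaches a terminal state and then splits the reported output location into bucket and key. It assumes an entity detection job; `wait_for_job` and `split_s3_uri` are hypothetical helper names, and the polling interval is arbitrary.

```python
import time

# Job states after which polling no longer makes sense.
TERMINAL_STATES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_job(client, job_id, poll_seconds=30, sleep=time.sleep):
    """Poll an entity detection job until it finishes; return its properties."""
    while True:
        response = client.describe_entities_detection_job(JobId=job_id)
        properties = response["EntitiesDetectionJobProperties"]
        if properties["JobStatus"] in TERMINAL_STATES:
            return properties
        sleep(poll_seconds)


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    return bucket, key
```

The returned properties include `OutputDataConfig["S3Uri"]`, which is what `split_s3_uri` is meant to take apart before downloading the file.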

The only potentially tricky part is converting your Comprehend output file (which, as you can see in the screenshot above, is gzip formatted). You can download the file directly from the console, unzip it locally with a tool (I use WinZip), and convert the Comprehend output file with a Python converter tool.
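If you would rather skip the manual unzip step, the archive can be opened with Python's standard library. This sketch is not the converter tool mentioned above; it only inspects the output. It assumes the downloaded file is a gzipped tar archive containing JSON Lines, where each line has an `Entities` list whose items carry `Type`, `Text`, and `Score` fields. The function name and the score threshold are illustrative choices.

```python
import io
import json
import tarfile
from collections import defaultdict


def entities_by_type(archive_bytes, min_score=0.8):
    """Group entity texts by type from a Comprehend output archive.

    `archive_bytes` is the raw content of the downloaded .tar.gz file.
    """
    grouped = defaultdict(set)
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            # Each non-empty line is one JSON record.
            for raw_line in archive.extractfile(member):
                line = raw_line.strip()
                if not line:
                    continue
                record = json.loads(line)
                for entity in record.get("Entities", []):
                    # Keep only entities Comprehend is reasonably sure about.
                    if entity["Score"] >= min_score:
                        grouped[entity["Type"]].add(entity["Text"])
    return {etype: sorted(texts) for etype, texts in grouped.items()}
```

Reading the file with `open(path, "rb").read()` and passing the bytes in gives you a quick view of which people, dates, or organizations were found before you convert anything for Kendra.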

Once you have converted your Comprehend output file and want to use it as a data/metadata source with Kendra, you will first need to create an index.
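Creating the index can also be done with boto3's Kendra client. The sketch below assumes the Developer Edition and an IAM role that Kendra can assume; the helper names, index name, and role ARN are placeholders.

```python
def create_index(client, name, role_arn, edition="DEVELOPER_EDITION"):
    """Request a new Kendra index and return its id.

    `client` is expected to behave like boto3.client("kendra").
    """
    response = client.create_index(Name=name, RoleArn=role_arn, Edition=edition)
    return response["Id"]


def index_is_active(client, index_id):
    """Return True once the index has finished being created."""
    return client.describe_index(Id=index_id)["Status"] == "ACTIVE"
```

Index creation is asynchronous and can take a while, so you would typically call `index_is_active` periodically before attaching a data source.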

Next, you will need to connect a data source and sync it. There are a number of options, including an S3 Bucket, ServiceNow, Google Drive, Salesforce, and many more.
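For the S3 option, connecting and syncing might look like the following with boto3. This is a sketch under the assumption that your converted files sit in a bucket Kendra's role can read; the function names, data source name, bucket, and prefix are hypothetical.

```python
def connect_s3_data_source(client, index_id, name, bucket, role_arn, prefix=None):
    """Attach an S3 bucket to a Kendra index and return the data source id."""
    s3_config = {"BucketName": bucket}
    if prefix:
        # Limit indexing to objects under this key prefix.
        s3_config["InclusionPrefixes"] = [prefix]
    response = client.create_data_source(
        IndexId=index_id,
        Name=name,
        Type="S3",
        RoleArn=role_arn,
        Configuration={"S3Configuration": s3_config},
    )
    return response["Id"]


def start_sync(client, index_id, data_source_id):
    """Kick off a sync job so Kendra crawls and indexes the bucket."""
    response = client.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    return response["ExecutionId"]
```

The two steps are kept separate because the data source may need a moment to become active before a sync job can be started.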

Once you’ve synced your index’s data source, you can start querying away. Provided your documents contain the relevant information, you can ask natural language questions such as “What holidays does our company take off?” or “Where is the library located?” and Kendra will help answer them.
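Querying from code is a single call to the Kendra `query` operation. The sketch below trims each result down to its type, title, and excerpt; the helper name `ask` and the choice of fields are mine, and the index id is whatever your own index returned.

```python
def ask(client, index_id, question, max_results=3):
    """Send a natural language question to Kendra and summarize the top results.

    `client` is expected to behave like boto3.client("kendra").
    """
    response = client.query(IndexId=index_id, QueryText=question)
    answers = []
    for item in response.get("ResultItems", [])[:max_results]:
        answers.append({
            # Result type, e.g. ANSWER, QUESTION_ANSWER, or DOCUMENT.
            "type": item.get("Type"),
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
        })
    return answers
```

For example, `ask(kendra, index_id, "What holidays does our company take off?")` would return up to three candidate passages, assuming `kendra` is a boto3 Kendra client and the synced documents cover the topic.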

By Dale Yarborough

I am a Software Engineer at General Motors and Appalachian State University Alum. Previously: Whole Foods Market IT, Charles Schwab