autoLabel by Samsung SDSA

July 14, 2021 | Patrick Bangert

Now, you can fully streamline your AI workflow by labeling only the data that provide valuable information and by using an arbitrarily large hardware infrastructure to execute it – all of this comes in an easy-to-use environment powered by Samsung SDSA’s autoLabel and RedBrickAI.

Key benefits

Improve productivity

  • Automate 80% - 90% of manual data labeling effort
  • Improve time-to-training by 75% compared to manual labeling

Improve processes

  • Use high-performance web-based tools to label and validate your training data.
  • Enable seamless collaboration between multiple stakeholders and smooth human-machine interaction across Active Learning iterations, and gain visibility into the progress of your labeling projects.

Deliver performance

  • Let uncertainty estimation prioritize the images that contain the most information
  • New state-of-the-art labeling accuracy

Reduce Total-Cost-of-Ownership

  • Reduce active learning iterations and cloud infrastructure costs by 33%
  • Save 5% - 15% TCO

Challenges in Data Science and AI Workflows

AI is a fiercely competitive field evolving at great speed. Reducing the time to start training AI models helps your business get to market faster than the competition and can often be the deciding factor between commercial success and failure. It is well known that about 80% of the total workload of a data science or AI project lies in producing a clean, labeled data set. In addition, enterprise data is expected to grow at a rate of 30% for each of the next 7 years, and dataset sizes are expected to grow, as well.

The most common and simplest approach to data labeling is, of course, a fully manual one, which makes humans the bottleneck in AI development. Based on empirical evidence, it takes approximately 6 seconds for a human to select a classification label from a list (classification), 10 seconds to draw a bounding box around an object and select a label from a list (detection), and 1 minute to draw an outline around an object and select a label from a list (segmentation). For a typical dataset of 100,000 images with 7 objects per image, adding detection labels manually would take around 5,833 man-hours = 100,000 images × 7 objects/image × 3 humans × 10 s/label ÷ 3,600 s/hr. Based on AWS Mechanical Turk suggested pricing, manual labeling would cost around $75K = 100,000 images × 7 objects/image × 3 humans × $0.036/object-label.
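
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the estimate: the figures come from the paragraph above, while the function and variable names are illustrative and not part of any Samsung SDSA or RedBrick AI API.

```python
# Figures from the text; all names here are illustrative.
IMAGES = 100_000
OBJECTS_PER_IMAGE = 7
LABELERS_PER_IMAGE = 3             # each image is labeled by 3 people
SECONDS_PER_DETECTION_LABEL = 10   # bounding box + label selection
PRICE_PER_OBJECT_LABEL = 0.036     # USD, suggested crowd-labeling price


def manual_labeling_effort(images, objects_per_image, labelers,
                           seconds_per_label, price_per_label):
    """Return (total labels, human hours, cost in USD) for manual labeling."""
    total_labels = images * objects_per_image * labelers
    hours = total_labels * seconds_per_label / 3600
    cost = total_labels * price_per_label
    return total_labels, hours, cost


labels, hours, cost = manual_labeling_effort(
    IMAGES, OBJECTS_PER_IMAGE, LABELERS_PER_IMAGE,
    SECONDS_PER_DETECTION_LABEL, PRICE_PER_OBJECT_LABEL,
)
print(f"{labels:,} labels, {hours:,.0f} hours, ${cost:,.0f}")
```

Swapping in 6 or 60 seconds per label gives the corresponding estimates for classification or segmentation.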

Once a data point is labeled, labeling similar data points costs the same amount of human effort but provides only marginal additional information to the AI learning process. Empirically, the first 10% - 20% of data points provide (nearly) all the information present in the dataset, while processing the remaining 80% - 90% requires work with significantly diminishing returns.

The solution

Now, you can fully streamline your AI workflow by labeling only the data that provide valuable information and by using an arbitrarily large hardware infrastructure to execute it – all of this comes in an easy-to-use environment powered by Samsung SDSA’s Brightics AI Accelerator and RedBrickAI.

Let’s look at the main benefits and features…

Improve productivity

The autoLabel feature in Samsung SDSA’s Brightics AI Accelerator (AIA) is a human-in-the-loop active learning system that sorts data by uncertainty, so human labelers start by labeling a small, pre-sorted portion of the data. The autoLabel system then trains a model on the labeled data, applies it to the remaining unlabeled data, and ranks the results by confidence. For a dataset of 100,000 images, the 1,000 images about which the model is least confident form the next batch of candidates to label manually. After 5% - 16% of the data has been labeled manually, the confidence is typically so high that no further manual labeling is needed, and the system can label the remaining dataset automatically, reducing human labeling by 84% - 95%. Human domain experts can then check these automatically labeled images with far less effort than creating the labels themselves would take.
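
To make "least confident" concrete, here is a minimal sketch of ranking images by uncertainty. It assumes the model outputs a probability per class and uses the entropy of those probabilities as the uncertainty score. Both the entropy measure and the function names are assumptions chosen for illustration; the text does not say which uncertainty estimate autoLabel actually uses.

```python
import math


def entropy(probs):
    """Shannon entropy of a class-probability vector (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def least_confident(predictions, batch_size):
    """Return the ids of the `batch_size` most uncertain predictions.

    `predictions` maps an image id to its list of class probabilities.
    """
    ranked = sorted(predictions, key=lambda k: entropy(predictions[k]),
                    reverse=True)
    return ranked[:batch_size]


# Hypothetical model outputs for three images and three classes.
predictions = {
    "img_a": [0.98, 0.01, 0.01],   # confident
    "img_b": [0.40, 0.35, 0.25],   # very uncertain
    "img_c": [0.70, 0.20, 0.10],   # somewhat uncertain
}
batch = least_confident(predictions, batch_size=2)
print(batch)  # ['img_b', 'img_c']
```

In the 100,000-image example, `batch_size` would be 1,000.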

Labeling Process:

  • Manually label 1% of the data, or 1,000 candidate images in the 100,000-image example
  • Train a model on the labeled data and run it at inference on the full dataset
  • Sort labels by uncertainty estimate – most uncertain at the top become new candidates
  • Repeat the previous three steps
  • Stop once the uncertainty falls below threshold
  • Manually check the automatically labeled data
  • Only 5% - 16% of all data needs to be labeled; the rest can be auto-labeled
  • Example: 16% labeled data leads to maximum model accuracy – no improvement thereafter. See the results in Figure 1.
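
The labeling process above can be sketched as a generic loop. This is a toy illustration under stated assumptions, not the autoLabel implementation: the human labeler, the model, and the uncertainty estimate are passed in as functions, and the example uses a made-up one-dimensional problem (label is `True` when the value is at least 0.5) so that the sketch runs without any ML library.

```python
import random


def auto_label_loop(pool, label_fn, train_fn, predict_fn, uncertainty_fn,
                    batch_fraction=0.01, threshold=0.5):
    """Label the most uncertain items by hand until the model is confident."""
    batch_size = max(1, int(len(pool) * batch_fraction))
    unlabeled = list(pool)
    labeled = {}
    batch = unlabeled[:batch_size]            # first 1% of candidates
    while True:
        for item in batch:                    # manual labeling (simulated)
            labeled[item] = label_fn(item)
        done = set(batch)
        unlabeled = [x for x in unlabeled if x not in done]
        model = train_fn(labeled)             # train on the labeled data
        # Sort so the most uncertain items are at the top.
        unlabeled.sort(key=lambda x: uncertainty_fn(model, x), reverse=True)
        # Stop once even the most uncertain item is below the threshold.
        if not unlabeled or uncertainty_fn(model, unlabeled[0]) < threshold:
            break
        batch = unlabeled[:batch_size]        # next candidates, then repeat
    auto = {x: predict_fn(model, x) for x in unlabeled}
    return labeled, auto


# --- Toy stand-ins (all hypothetical) ---
def true_label(x):
    return x >= 0.5


def train(labeled):
    """'Model' = the gap between the largest negative and smallest positive."""
    negatives = [x for x, y in labeled.items() if not y]
    positives = [x for x, y in labeled.items() if y]
    return max(negatives, default=0.0), min(positives, default=1.0)


def predict(model, x):
    lo, hi = model
    return x >= (lo + hi) / 2


def uncertainty(model, x):
    lo, hi = model
    return 1.0 if lo < x < hi else 0.0        # uncertain only inside the gap


random.seed(0)
pool = [random.random() for _ in range(1000)]
labeled, auto = auto_label_loop(pool, true_label, train, predict, uncertainty)
print(f"manually labeled: {len(labeled)}, auto-labeled: {len(auto)}")
```

The final manual check of the auto-labeled data is left out of the sketch; in practice it would be a review pass over `auto`.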

AutoLabeling the same dataset of 100,000 images with 7 objects per image would take only 29 compute hours plus 1,925 man-hours to add detection labels. The cost would be around $6K for autoLabeling + $5.5K for cloud instance costs + $8.8K for 3rd party manual labeling, for a total cost of ownership (TCO) of $19,816. The total of 1,954 human and compute hours is 66% less than labeling the entire dataset manually.

Figure 1. As the dataset is being labeled 1% at a time, the autoLabel feature keeps the unlabeled data sorted, so the most informative data is labeled first. We clearly see this approach achieves a significantly faster convergence to the highest achievable accuracy – here at 6% – as compared to the normal labeling of data in a random order, and compared to regular active learning found in other AI platforms.

Improve Processes

As the complexity and scale of datasets increase, the processes surrounding labeling datasets also need to evolve. You need a comprehensive toolset to accelerate labeling projects, reduce cost, and maintain high-quality output. Structure, automate, and qualify your labeling workflows by using the RedBrick AI platform. By building a completely custom labeling workflow, your team can easily carry out autoLabel active learning iterations. This automation is available through Samsung SDSA’s Brightics AI Accelerator solution, which allows AI teams to devote 100% of their attention to science rather than software infrastructure development and maintenance. The orchestration, management, and cleanup of an infrastructure of hundreds of GPUs is fully automated and as easy to use and deploy as doing work on your desktop. Gain and maintain visibility into the productivity of your workforce, and track how autoLabel automates the labeling over time. Your labeling tasks are automatically routed in the Active Learning workflow and assigned to the appropriate stakeholders, which simplifies project management and lets you focus on your science.

Improve performance

The rapid pace of AI innovation makes designing and training accurate AI models challenging. With Brightics AI Accelerator, you can eliminate guesswork and get started faster by automating the initial tasks of labeling data. This will get you most of the way to optimal accuracy. After that, you fine-tune your model by running multiple experiments in parallel before scaling up to exploit large, distributed compute clusters.

Reduce Total Cost of Ownership

Compared to manually labeling the entire dataset, autoLabel reduces customer Total-Cost-of-Ownership (TCO) by 54% and reduces the time it takes to start labeling by 75%. Because the Brightics autoLabel solution sorts the dataset by uncertainty before the first and each successive manual labeling iteration, it is able to exploit the most informative data and reduce active learning iterations and cloud infrastructure costs by 33% compared to competing cloud offerings.

Pricing

Customers pay for the number of images labeled in addition to cloud training and inference instance costs.

Example Pricing:

For applying object detection labels to a dataset of 100,000 images, we estimate 11 autoLabel iterations, each consisting of 1% of the dataset, which would require manual labeling of 11,000 images. The autoLabel system will then apply labels to the entire dataset of 100,000 images automatically. The autoLabel SaaS price estimate would be 50,000 × $0.135 + 50,000 × $0.095 = $11,500, and the end customer would also pay 11,000 images × 7 labels/image × $0.036/label × 3 labelers = $8,316 in estimated manual labeling costs. The customer TCO would be $19,816 = $11,500 for autoLabel SaaS + $8,316 for 3rd party manual labeling, assuming that each manually labeled image is labeled separately by 3 human labelers to ensure labeling quality. Please note that manual labeling is not part of this offering, and the price for a human-generated label is stated here only for comparative purposes.
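
The example pricing can be checked with a short calculation. This is a sketch only: the image counts and per-image prices are taken from the example above, the tiered structure is inferred from the formula in that example, and the function name is illustrative.

```python
def autolabel_tco(saas_tiers, manual_images, objects_per_image,
                  labelers, price_per_manual_label):
    """Return (SaaS cost, manual labeling cost, total) in USD.

    `saas_tiers` is a list of (image_count, price_per_image) pairs.
    """
    saas = sum(count * price for count, price in saas_tiers)
    manual = (manual_images * objects_per_image
              * labelers * price_per_manual_label)
    return saas, manual, saas + manual


saas, manual, total = autolabel_tco(
    saas_tiers=[(50_000, 0.135), (50_000, 0.095)],  # from the example
    manual_images=11_000,        # 11 iterations x 1% of 100,000 images
    objects_per_image=7,
    labelers=3,
    price_per_manual_label=0.036,
)
print(f"SaaS ${saas:,.0f} + manual ${manual:,.0f} = TCO ${total:,.0f}")
```

Changing `manual_images` shows how the total moves if more or fewer autoLabel iterations are needed.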

Samsung SDSA and RedBrick AI: Driving Innovation Together

The Brightics AI Accelerator platform automates data labeling, machine learning, and running models at inference in production. At the heart of Brightics AI Accelerator is an intelligent and flexible automated AI model training and inference platform that reduces complex AI model training orchestration to a single line of code. Brightics AI Accelerator is use-case agnostic and covers training all AI models by applying autoML to tabular, CSV, time-series, image, or natural language data, enabling analytics, computer vision (classification, detection, and segmentation), and NLP use cases. Our distributed clustering technology lets AI models run at inference efficiently in the cloud and on-premise. Our partners enable models trained by AI Accelerator to run quickly and accurately on constrained edge devices such as FPGAs or Raspberry Pis. The Brightics AI Accelerator platform can be applied horizontally across industry verticals including but not limited to Health Care, Retail, Automotive, Aerospace, Communications, Finance, Marketing, Manufacturing, any industry using IoT, or academia for fundamental science.

Solution Components

  • Samsung SDSA’s autoLabel API
  • RedBrickAI’s Labeling UI and management tools

Reference Assets

Samsung SDSA has released the following reference assets based on computer vision and natural language processing (NLP) targeted use cases:

AutoLabel: https://www.youtube.com/watch?v=wcP1fRPKXSU&list=PLWF74wNhLmvrFQJ1i8uFjOg8wK_QTlxKO

Case Study on COVID-19 detection: https://www.youtube.com/watch?v=tfj_25FvvPs

About Samsung SDS America

Samsung SDS is a global leader in enterprise AI, digital transformation, AI transformation, and innovation solutions. Scientific consulting services from Samsung SDS America can help you transform your business and customer engagement workflows to capitalize on the promise and potential of AI. Learn more at: www.samsungsds.com

About RedBrick

RedBrick AI is a software platform for creating and managing computer vision training data. Teams use the RedBrick AI platform to structure, automate, and qualify their labeling efforts. Learn more at www.redbrickai.com.


Copyright ©2019 SAMSUNG SDS America, Inc. All rights reserved.