AI Transformation Series #1: New Organizational Structures for Training Data in 2022

Training Data, the art of supervising machines through data

Training Data is the art of supervising machines through annotated data.

Why am I making a distinction between Training Data and Data Science?

Because the day-to-day responsibilities of the people producing Training Data differ from those of the Data Science people consuming it.

How do we better organize and manage this?

One approach is to create new roles with a specific Training Data focus.

Why is a New Structure needed?

First, bad data means bad AI, and bad AI means wasted investment in AI. Producing Training Data sits upstream of everything else and is so central to the success of AI projects that it must be given an appropriate role in the organization.

  1. Bad data = bad AI
  2. AI Transformation Leader needed
  3. Army of Annotators

Director of Training Data

Ideally, start by establishing a Director of Training Data position.

Example Reporting Relationship
  1. Director of Training Data
  2. Training Data Evangelist
  3. Training Data Production Manager(s)
  4. Annotation Producer
  5. Data Engineering

Director of Training Data Responsibilities

Chiefly, this person is responsible for the overall production of Training Data.

  1. Turning line of business needs into successfully produced training data.
  2. Generating work for training data production by mapping business needs to Training Data concepts.
  3. Managing a team of Production Managers who facilitate the day-to-day Annotation Production.
  4. Managing Evangelists who work with Line of Business managers to identify Training Data and AI Opportunities, with particular attention to the feasibility of annotation. Of the various ideas proposed by, say, a line manager, only a handful may actually be cost-effective to annotate at that moment in time.
  5. Managing the Training Data Platform.
  6. Coordinating closely with Data Science to ensure the data produced is being consumed as expected.
  7. Indirectly acting as a check and balance on Data Science by supervising its business results and output in ways that go beyond purely quantitative statistics.
  8. Normal director-level responsibilities, potentially including some kind of Profit and Loss responsibility, KPIs, supplier and vendor relationships, and reporting (an example reporting relationship is shown above).
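
The feasibility point in the list above (only a handful of proposed ideas may be cost-effective to annotate) can be pictured with a rough sketch. Everything in it is a hypothetical illustration: the `AnnotationIdea` fields, the example figures, and the rule "estimated value divided by annotation cost must reach a minimum ratio" are assumptions, not a method prescribed by this article.

```python
from dataclasses import dataclass


@dataclass
class AnnotationIdea:
    """A hypothetical project idea proposed by a line-of-business manager."""
    name: str
    estimated_items: int      # assumed number of items needing annotation
    cost_per_item: float      # assumed cost to annotate one item
    estimated_value: float    # assumed business value if the project works

    @property
    def annotation_cost(self) -> float:
        # Total assumed cost of annotating every item.
        return self.estimated_items * self.cost_per_item

    @property
    def value_ratio(self) -> float:
        # Assumed value per unit of annotation cost (0 if there is no cost data).
        cost = self.annotation_cost
        return self.estimated_value / cost if cost > 0 else 0.0


def cost_effective(ideas, min_ratio=1.0):
    """Return ideas whose value/cost ratio is at least min_ratio, best first."""
    feasible = [idea for idea in ideas if idea.value_ratio >= min_ratio]
    return sorted(feasible, key=lambda idea: idea.value_ratio, reverse=True)


# Made-up example ideas and numbers, for illustration only.
ideas = [
    AnnotationIdea("invoice field extraction", 5_000, 0.50, 20_000.0),
    AnnotationIdea("rare defect detection", 200_000, 1.00, 50_000.0),
    AnnotationIdea("support ticket routing", 10_000, 0.20, 8_000.0),
]
shortlist = [idea.name for idea in cost_effective(ideas)]
print(shortlist)
```

In practice the inputs would be rough estimates agreed between the Evangelist and the line manager, and the threshold would be a judgment call rather than a fixed number.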

Training Data Evangelist

Educator, Trainer, Change Agent

  1. Works closely with Line of Business managers to identify key Training Data and AI Opportunities.
  2. Works “ahead” of Production managers, establishing the upcoming work and acting as the glue between the line of business managers and the production managers.
  3. Educating people on the best uses of modern supervised learning practice.
  4. Educating people in the organization on the effects of AI transformation. In practical terms, this means converting interest into actionable annotation projects.
  5. Recruiting annotators from that line of business. On a practical level, this means converting an employee doing regular work into someone who, as say 20% of their job, captures their work in an annotation system.
  6. Training. Especially for part-time annotators, this person is responsible for explaining how to use the tools and for troubleshooting issues. This is distinct from the Production Managers, who are more geared towards training full-time annotators, because you train a doctor differently than you train an entry-level employee.
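
The "20% of their job" figure above translates into annotation capacity with simple arithmetic. A minimal sketch, assuming a 40-hour work week; the function name, the head count, and the default hours are hypothetical and only illustrate the calculation:

```python
def annotation_hours_per_week(num_people: int,
                              share_of_job: float,
                              hours_per_week: float = 40.0) -> float:
    """Weekly annotation hours from part-time annotators.

    share_of_job is the fraction of each person's job spent annotating
    (0.20 for the 20% example). hours_per_week is an assumed work week.
    """
    if num_people < 0:
        raise ValueError("num_people must not be negative")
    if not 0.0 <= share_of_job <= 1.0:
        raise ValueError("share_of_job must be between 0 and 1")
    return num_people * share_of_job * hours_per_week


# Hypothetical example: ten recruited employees, each annotating 20% of the time.
hours = annotation_hours_per_week(10, 0.20)
full_time_equivalents = hours / 40.0
print(hours, full_time_equivalents)
```

A figure like this gives the Evangelist and the Production Managers a shared starting point when deciding how much work a line of business can realistically take on.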

Training Data Production Manager(s)

This person is chiefly a taskmaster, responsible for actually getting annotation work completed.

  1. Interfacing with Data Science to set up the schemas, setting up the tasks and workflow UIs, and handling admin management of the training data tooling (generally non-technical). This person is the general annotation tooling expert.
  2. Training Annotators
  3. Managing Day to Day Annotation Processes
  4. In some cases basic data loading and unloading can also be done by this person.
  5. In change discussions, this person is responsible for explaining the reasonableness of annotation work to people new to the subject.
  6. Using Data Curation tooling.

Annotation Producer

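Annotation producers usually fall into two buckets:

  1. Full-time, dedicated and trained people. These may be newly hired people or people reassigned from existing work.
  2. Part-time people, such as existing line-of-business employees who capture their work in an annotation system as part of their job (for example, 20% of it).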

Data Engineering

  1. Responsible for getting the data loaded and unloaded, the technical aspects of training data tools, pipeline setup, pre-labeling, etc.
  2. Organizing the collection of data from various sources, including internal teams.
  3. Planning and architecting the setup for new data elements.
  4. Organizing integrations and understanding the technical nuances of capturing annotations.

Are you looking for a training data platform? These are also called data labeling or data annotation platforms.
Working at a large firm? Consider an Enterprise Training Data Platform.
Did you know? Diffgram is an Open Source Training Data Platform that includes artificial intelligence image annotation, machine learning segmentation, and more.
