AI Transformation Series #1: New Organizational Structures for Training Data in 2022

Anthony Sarkis
6 min read · Dec 2, 2021

Beyond Digital Transformation, this is the start of the AI Transformation era.

This is the ticket for business leaders who want to transform their company over the next decade into an AI-first company and build an AI-first corporate culture.

It can start today with you, the leaders. There’s low-hanging fruit that will show results in a matter of months or years, with the overall wave arriving over the decade.

Training Data, the art of supervising machines through data

Training Data is the art of supervising machines through annotated data.

Because Training Data work is upstream to Data Science, failures in Training Data flow down to Data Science.

Therefore, it’s important to get Training Data right.

Why am I making a distinction between Training Data and Data Science?

Because the day-to-day responsibilities of the people producing Training Data differ from those of the Data Science people consuming it.

I think of it as a Producer and Consumer relationship.

How do we better organize and manage this?

One way to approach this is to structure new roles around a Training Data-specific focus.

The following is a synthesis of the structures I have seen at a variety of companies, in addition to my own views based on prior experience setting up new teams for new technology.

Naturally, you will need to choose the right structure for your company; this is simply meant to give visibility to the overall need for this new structure.

Why is a New Structure needed?

First, bad data will mean bad AI. Bad AI means wasted investment in AI. This upstream role is so central to the success of AI projects that it must be given an appropriate role.

Second, as part of the goal of AI transformation, there must be a principally responsible individual to lead the charge. While a VP or CEO can also play this role at the strategic level, the Director is responsible for executing this strategy.

Third, as the number of people involved balloons, the simple reality is that this is a team of teams, made up of people with many distinct characteristics. Even a minimal team will likely have at least one or two production managers and twenty to fifty or more Annotation Producers.

This can easily grow to hundreds of people. In a very large organization there may be hundreds or even thousands of part-time annotation producers.

It’s an army of people to be managed.

To recap:

  1. Bad data = bad AI
  2. AI Transformation Leader needed
  3. Army of Annotators

Director of Training Data

First, it is ideal to have a Director of Training Data position established.

This person can, for example, report to the VP of AI, the VP of Engineering, or the CTO. Even if this role is baked into some type of Director of AI role, the themes stand.

The following roles illustrate both the Director’s responsibilities and sample descriptions of the key team member roles. Please note these are not meant to be complete job descriptions; they only highlight some of the key structural elements of each role.

Example Reporting Relationship

Here I will cover:

  1. Director of Training Data
  2. Training Data Evangelist
  3. Training Data Production Manager(s)
  4. Annotation Producer
  5. Data Engineering

Let’s dive in!

Director of Training Data Responsibilities

Chiefly, this person is responsible for the overall production of Training Data.

This includes:

  1. Turning line of business needs into successfully produced training data.
  2. Generating work for training data production by mapping business needs to Training Data concepts.
  3. Managing a team of Production Managers who facilitate the day to day Annotation Production.
  4. Managing Evangelists who work with Line of Business managers to identify Training Data and AI Opportunities, especially with regard to the feasibility of annotation. Of the various ideas proposed by, say, a line manager, only a handful may actually be cost-effective to annotate at that moment in time.
  5. Managing the Training Data Platform.

Besides general efficiency and visibility into annotation work, this person must map the productivity in annotation back to the business use case. If possible the Evangelist may do this too, with the Director being the second line.

And:

  1. Coordinating closely with Data Science to ensure the data produced is being consumed as expected.
  2. Indirectly, acting as a check and balance on Data Science by supervising the business results and output of data science in a way that goes beyond purely quantitative statistics.
  3. Normal director-level responsibilities: potentially some kind of Profit and Loss responsibility, KPIs, supplier and vendor relationships, and reporting (see the example reporting relationship).

Naturally, the Director can fill in for almost any of the roles below as needed.

Training Data Evangelist

Educator, Trainer, Change Agent

  1. Works closely with Line of Business managers to identify key Training Data and AI Opportunities.
  2. Works “ahead” of Production managers, establishing the upcoming work and acting as the glue between the line of business managers and the production managers.

In a company focused exclusively on AI products:

  1. Educating people on the best usages of modern supervised learning practice.

In a classic business:

  1. Educating people in the organization on the effects of AI transformation. In practical terms, this means converting interest into actionable annotation projects.
  2. Recruiting annotators from that line of business. On a practical level, this means converting an employee doing regular work into someone who, as say 20% of their job, captures their work in an annotation system.
  3. Training. In the context of part-time annotators especially, this person is responsible for explaining how to use the tools and for troubleshooting issues. This is distinct from the Production Managers, who are more geared towards training full-time annotators, because you train a Doctor differently than you train an entry-level employee.

Training Data Production Manager(s)

This person is chiefly concerned with being a taskmaster for actually getting annotation work completed.

  1. Interfacing with Data Science to set up the Schemas, setting up the tasks and workflow UIs, handling admin management of training data tooling (generally non-technical), and serving as the general annotation tooling expert.
  2. Training Annotators
  3. Managing Day to Day Annotation Processes
  4. In some cases basic data loading and unloading can also be done by this person.
  5. In change discussions, this person is responsible for explaining the reasonableness of annotation work to people new to the matter.
  6. Using Data Curation tooling.

Annotation Producer

Annotation producers usually fall into two buckets:

  1. Full time: dedicated and trained people. These may be newly hired people or people reassigned from existing work.
  2. Part time: it will increasingly become part of everyone’s job to a degree.

Data Engineering

  1. Responsible for getting the data loaded and unloaded, the technical aspects of training data tools, pipeline setup, pre-labeling, etc.
  2. Especially, organizing the collection of data from various sources, including internal teams
  3. Planning and architecting setup for new data elements
  4. Organizing integrations to understand the technical nuances of capturing Annotations

Thanks for reading!
