The Case for Collaboration - Data Science Is Done Best When an Operator Works With a Data Scientist

August 18, 2020 | Patrick Bangert

In the past year, the number of presentations and papers submitted to SPE conferences and similar events in the oil and gas industry has skyrocketed. Whereas 2–3 years ago we could hardly find any papers related to machine learning, now a conference might have 30% to 40% of its papers directly related to empirical model making, deployment, and use cases. What strikes the casual observer is that these projects fall into three rough categories:

1. An operating company analyzes its own data internally and pilots its results with some success.

2. A data-science company or university analyzes some data without an operator, celebrates its use case, and looks for an application.

3. An operator teams up with a data-science company, creates models, and deploys them in the field.

Fig. 1: Some of the machine-learning use cases presented in the past year at oil and gas conferences.

By now, a sufficient number of papers exists in each of these categories to draw some conclusions. See Fig. 1 for an incomplete list of some of the use cases discussed. In my opinion, most studies conducted by an operator alone have yielded some benefit but have used either substandard or outdated machine-learning methods. As a result, money is being left on the table. It is also difficult to move homemade models built in environments such as R or Python into the standard control-system architecture of an operator.

Studies conducted by a data-science company alone tend to lack domain knowledge and, thus, emphasize points that are often moot or unrealistic. The data used to model the phenomena often is synthetic because the authors do not have access to real empirical data. It is difficult, therefore, to trust the conclusions, and the models will need to go through substantial reality checks before anyone can adopt them.

Studies conducted through collaboration between an operator that knows the physical reality and a data-science company that knows the best machine-learning methods yield good practical results. The difficulty is finding one company that possesses expertise in both domains. The third component in the mix is having the right software infrastructure to deploy the model in a control system so that it can actually deliver whatever benefit the theoretical analysis has found. While it is possible to develop custom infrastructure, doing so can be time-consuming and error-prone.

A case in point: Artificial lift is one of the main topics to come out of the recent rise of machine learning in oil and gas. Many papers can be found on OnePetro and elsewhere that analyze the dynamometer card of a sucker-rod pump. Almost all of these papers fall into the first two categories: either an operator has domain knowledge but limited data-science expertise, or a data-science company has very little domain knowledge. Resulting model accuracies tend to be in the high 80% range, competitive with human beings. As a result, apart from a handful of exceptions, these studies have remained offline studies and have not been operationalized anywhere, even though many have made the business case. One exception falls into the last category: a study done by Tatweer Petroleum, the operator of the Bahrain oil field, which worked in collaboration with data scientists, achieved a model accuracy of 99.9%, and actually deployed the model in the field while reaping concrete business benefits. This is documented in papers SPE 194949 and SPE 195295.

Empirical evidence shows that collaboration is worthwhile for obtaining good models, achieving good business results, and operationalizing the resulting models.

