i2k Oilfield Places®—Named Entity Recognition for Oil & Gas, a Microservice within the i2k Connect® AI Platform

Keri Harvey, August 17, 2020

How do we define search and what does that mean?

Search (noun): the systematic retrieval of information, or the facility of this*

Oilfield (noun): an area of land or seabed underlain by strata yielding petroleum, especially in amounts that justify commercial exploitation*

Places (plural noun): particular positions or points in space*

Entity (noun): a thing with distinct and independent existence*

Recognition (noun): acknowledgement of something’s existence, validity, or legality*

            *Definitions from Oxford Languages


You can search Google for oilfields and get over 7 million results in seconds. If you refine your search to “oil field places,” the results drop, but only by about half, still leaving over 3.5 million to sift through. Try again with “oilfield basins,” and the results fall to just under 1 million. Great, getting easier! Or is it?

And we haven’t even touched on searching for mentions of oilfields in files lost in your file shares, in the cloud, or in company silos. Don’t even go there, right?

What if you could search for a region, basin, field, or formation and narrow down the results with filters like those on a large online shopping site that sells everything you need or want?

i2k Oilfield Places® is the i2k Connect industry knowledge base for identifying regions, basins, leasing areas, blocks, fields, formations, and wells. It is a named entity recognition service that forms part of the i2k Connect AI Platform. The service recognizes and disambiguates references to places of interest and rejects references that only coincidentally match the names of known places. It exposes a taxonomy that characterizes places from both a geopolitical and a geological perspective.

Why is this hard? Let’s start with the named entity recognition part of the problem. Fundamentally, oilfields have confusing names. A very large number are named after people and places; others are named after planets, mythical figures, colors, cardinal directions, and so on. On their own, oilfield names are very hard to distinguish from the other kinds of names used in technical or everyday writing. To recognize references to places with high precision, it’s necessary to recognize them in context: within whole phrases in sentences, or within forms and tables in oil & gas reports.

Consider the term “west.” It’s possible to train a system to recognize “west” as a reference to an oilfield, but doing so would be dangerous. Our recall would be high: we would catch every mention of the West Field. However, our precision would be very low, because we would treat every occurrence of “west” as a reference to the West Field, even though most are not. Instead, i2k Oilfield Places looks for place names that appear together with compatible anchor terms. It won’t recognize “west” alone as a field reference, but it will recognize “West Field” (as well as a number of field-like synonyms such as “West Development” and “West Project”).
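To make the anchor-term idea concrete, here is a minimal Python sketch. It is not how i2k Oilfield Places is implemented; the name list and anchor list below are hypothetical, and a single regular expression stands in for the real recognizer. A known name counts only when a field-like anchor term follows it.

```python
import re

# Hypothetical gazetteer entries and anchor terms, for illustration only.
FIELD_NAMES = ["West", "Grant", "Wilson", "Alameda"]
ANCHOR_TERMS = ["Field", "Fields", "Development", "Project"]

# A known name counts only when it is immediately followed by an anchor term.
_MENTION = re.compile(
    r"\b(" + "|".join(map(re.escape, FIELD_NAMES)) + r")\s+("
    + "|".join(map(re.escape, ANCHOR_TERMS)) + r")\b"
)


def find_field_mentions(text: str) -> list[tuple[str, str]]:
    """Return (name, anchor) pairs for names that carry a compatible anchor term."""
    return [(m.group(1), m.group(2)) for m in _MENTION.finditer(text)]
```

“Head west for ten miles” and “The West is a big region” produce no matches, while “the West Development and the West Project” produces two. Note that this sketch only catches “Alameda Fields” in “the Grant, Wilson, and Alameda Fields”; handling coordinated lists like that one needs phrase-level analysis beyond a single pattern.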

But there’s a more significant problem with “West Field”: which one? Field names are often ambiguous, and West is a good example. There are two in Texas: one in the Permian Basin and one in the Fort Worth Basin. Given a sentence that mentions West Field with no other location context, i2k Oilfield Places would not return a definitive entity for West Field. Instead, it would return, in its collection of ambiguous terms, the term West Field along with the two entities it matches.
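The split between definitive entities and ambiguous terms can be pictured with a small lookup. This is a hedged sketch, not the service’s data model: the `Resolution` class, the gazetteer contents, and the placeholder “Example Field” are all hypothetical. A term with exactly one candidate becomes an entity; a term with several candidates is reported as ambiguous with all of them attached.

```python
from dataclasses import dataclass, field

Path = tuple[str, ...]

# Hypothetical gazetteer: surface term -> every place it could refer to.
# "Example Field" is a made-up placeholder with a single candidate.
GAZETTEER: dict[str, list[Path]] = {
    "West Field": [
        ("North America", "United States", "Texas", "Permian Basin", "West Field"),
        ("North America", "United States", "Texas", "Fort Worth Basin", "West Field"),
    ],
    "Example Field": [("Region A", "Country A", "Example Field")],
}


@dataclass
class Resolution:
    entities: list[Path] = field(default_factory=list)           # unique matches
    ambiguous: dict[str, list[Path]] = field(default_factory=dict)  # term -> candidates


def resolve(terms: list[str]) -> Resolution:
    """Return unique matches as entities and multi-way matches as ambiguous terms."""
    result = Resolution()
    for term in terms:
        candidates = GAZETTEER.get(term, [])
        if len(candidates) == 1:
            result.entities.append(candidates[0])
        elif candidates:
            result.ambiguous[term] = candidates
        # Terms with no candidates are simply not recognized.
    return result
```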

Now, suppose that we analyze this sentence instead: Oil production began in the Permian Basin’s West Field in 1970. This sentence is unambiguous; there is only one West Field in the Permian Basin, so i2k Oilfield Places will return the correct entity. In addition, it will return a reference to the Permian Basin in Texas and New Mexico:

North America > United States > New Mexico; Texas > Permian Basin
North America > United States > Texas > Permian Basin > West Field
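The first line above lists two states at one level because the Permian Basin spans both. Assuming that “;” in these display paths means “any of these places” (our reading, not a documented format), a short Python sketch can expand such a line into one ordinary path per state:

```python
def expand_path(display: str) -> list[str]:
    """Expand a display path in which a level may list several places separated by ';'."""
    paths: list[list[str]] = [[]]
    for level in display.split(">"):
        options = [name.strip() for name in level.split(";")]
        # Every partial path branches once per place listed at this level.
        paths = [prefix + [name] for prefix in paths for name in options]
    return [" > ".join(p) for p in paths]
```

Applied to the Permian Basin line, it yields one path through New Mexico and one through Texas; a path with no “;” comes back unchanged.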

Moreover, if the text had already mentioned the Permian Basin (and not the Fort Worth Basin), then i2k Oilfield Places would have enough information to pick the right West Field even from a sentence that mentions West Field on its own.
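One simple way to picture this use of context is to keep only the candidates whose ancestors include a place the document has already mentioned. The Python sketch below is a simplification under that assumption. The real service presumably weighs more evidence than this. If the context narrows nothing, the sketch returns every candidate, so the term stays ambiguous instead of being guessed.

```python
Path = tuple[str, ...]

# The two West Field candidates from the example, as hierarchy paths.
WEST_FIELD: list[Path] = [
    ("North America", "United States", "Texas", "Permian Basin", "West Field"),
    ("North America", "United States", "Texas", "Fort Worth Basin", "West Field"),
]


def disambiguate(candidates: list[Path], context_places: set[str]) -> list[Path]:
    """Keep candidates whose ancestors include a place already mentioned in the text.

    If the context does not narrow anything down, all candidates are returned.
    """
    narrowed = [p for p in candidates if context_places.intersection(p[:-1])]
    return narrowed or list(candidates)
```

A prior mention of the Permian Basin selects the first candidate. A mention of Texas alone leaves both candidates, because both West Fields are in Texas.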

Finally, consider this sentence: Oil was produced from the Grant, Wilson, and Alameda Fields. There is a Grant Field in three countries, an Alameda Field in three countries, and a Wilson Field in two countries (Trinidad and Tobago, and the United States, where there is a Wilson Field in two different states). But the only place that contains fields with all three of these names is Kansas, U.S., and i2k Oilfield Places returns exactly those entities:

North America > United States > Kansas > Alameda Field
North America > United States > Kansas > Grant Field
North America > United States > Kansas > Wilson Field

Attached to each entity returned by i2k Oilfield Places is an explanation: the actual sentences that contained each recognized entity, and the justification for why terms in the sentence were matched to oilfield places. These explanations can help focus a reader on desired content within a large document and can also help knowledge curators understand and correct errors due to missing or incorrect knowledge.
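Here is a hedged sketch of what such an explanation could look like as data. The class names, fields, and justification wording are hypothetical, and the sentence splitter is deliberately naive: it splits after “.”, “!”, or “?” followed by whitespace.

```python
import re
from dataclasses import dataclass


@dataclass
class Explanation:
    sentence: str       # the sentence in which the entity was recognized
    justification: str  # why the terms were matched (hypothetical wording)


@dataclass
class RecognizedEntity:
    path: str
    explanations: list[Explanation]


def explain(text: str, name: str, anchor: str, path: str) -> RecognizedEntity:
    """Attach every sentence containing 'name anchor' as evidence for the entity."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    term = f"{name} {anchor}"
    evidence = [
        Explanation(s, f"name '{name}' followed by anchor term '{anchor}'")
        for s in sentences
        if term in s
    ]
    return RecognizedEntity(path, evidence)
```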

i2k Oilfield Places entities follow a general hierarchy, from the most general to the most specific: Region (i.e., continents), Country, Country-Region (e.g., states, provinces, governorates, counties), Basin, Leasing Area (e.g., Green Canyon in the Gulf of Mexico), Block, Field, Formation, and Well.  The hierarchy serves two purposes: to tag documents so that they may be found through faceted search filters and to enrich a document with information not explicit in its text.
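The hierarchy levels can be modeled as an ordered enumeration. This is a minimal sketch, with level names taken from the paragraph above and numeric values of our own choosing. Real paths can skip levels (the Kansas fields above have no basin in their paths), so the check only requires each step to be more specific than the last.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hierarchy levels, from most general (lowest) to most specific (highest)."""
    REGION = 1
    COUNTRY = 2
    COUNTRY_REGION = 3
    BASIN = 4
    LEASING_AREA = 5
    BLOCK = 6
    FIELD = 7
    FORMATION = 8
    WELL = 9


def is_well_ordered(levels: list[Level]) -> bool:
    """True if each level is strictly more specific than the one before it."""
    return all(a < b for a, b in zip(levels, levels[1:]))
```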

Faceted search allows us to retrieve documents associated with a place in the hierarchy. For example, filtering on Kansas finds documents containing places in Kansas and its associated basins, fields, formations, and wells. The i2k Connect AI Platform can also combine place filters with tags applied automatically by its other enrichment engines, for example, to find all resistivity logs in the Central Kansas Uplift.
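Because each place is a path in the hierarchy, a place facet can be treated as a path prefix. The Python sketch below pairs that prefix test with a tag filter. The in-memory index, the document IDs, and the tags are hypothetical, and it places the Central Kansas Uplift directly under Kansas as a simplification.

```python
from collections.abc import Iterable

Path = tuple[str, ...]
KANSAS: Path = ("North America", "United States", "Kansas")

# Hypothetical index: document id -> (place paths, tags from other enrichment engines).
DOCS: dict[str, tuple[list[Path], set[str]]] = {
    "doc-1": ([KANSAS + ("Central Kansas Uplift",)], {"resistivity log"}),
    "doc-2": ([KANSAS + ("Grant Field",)], {"production report"}),
    "doc-3": ([("North America", "United States", "Texas", "Permian Basin")],
              {"resistivity log"}),
}


def under(path: Path, facet: Path) -> bool:
    """True if `path` is the facet itself or lies below it in the hierarchy."""
    return path[: len(facet)] == facet


def faceted_search(facet: Path, required_tags: Iterable[str] = ()) -> list[str]:
    """Return ids of documents with a place under `facet` and all required tags."""
    required = set(required_tags)
    return [
        doc_id
        for doc_id, (paths, tags) in DOCS.items()
        if any(under(p, facet) for p in paths) and required <= tags
    ]
```

Filtering on Kansas returns both Kansas documents. Adding the “resistivity log” tag narrows the results to the Central Kansas Uplift log.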

Tagging a document with an entity provides information not necessarily written into the document: “… Grant, Wilson, and Alameda Fields” implies Kansas without expressing it. i2k Oilfield Places makes these implications explicit and actionable.
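One way to make such implications explicit is to tag a document with every ancestor of every entity recognized in it. The minimal Python sketch below assumes the “ > ” display format used in the examples above. The three Kansas fields then yield tags for North America, the United States, and Kansas, even though the text never names Kansas.

```python
def ancestor_tags(path: str) -> list[str]:
    """Expand 'A > B > C' into tags for every prefix: 'A', 'A > B', 'A > B > C'."""
    parts = [p.strip() for p in path.split(">")]
    return [" > ".join(parts[:i]) for i in range(1, len(parts) + 1)]


def document_tags(entity_paths: list[str]) -> list[str]:
    """Union of the ancestor tags of every entity recognized in a document."""
    tags: set[str] = set()
    for path in entity_paths:
        tags.update(ancestor_tags(path))
    return sorted(tags)
```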

Out of the box, i2k Oilfield Places knows almost 5,000 basins, 150 leasing areas, 500 blocks, 90,000 fields, and 9,000 wells.  In on-premise deployments, i2k Connect can also combine its curated knowledge base of publicly known Oilfield Places with a customer’s proprietary places.
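At its simplest, combining a curated knowledge base with proprietary places is a union of two name-to-places maps. The Python sketch below assumes that representation, which is ours rather than i2k Connect’s. The “Company X” path is a made-up placeholder. A proprietary name that collides with a public one simply adds another candidate, so the disambiguation described earlier still applies.

```python
Path = tuple[str, ...]


def merge_gazetteers(
    curated: dict[str, set[Path]], proprietary: dict[str, set[Path]]
) -> dict[str, set[Path]]:
    """Union two name -> places maps without modifying either input."""
    merged = {name: set(paths) for name, paths in curated.items()}
    for name, paths in proprietary.items():
        merged.setdefault(name, set()).update(paths)
    return merged
```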

We have defined search and what i2k Oilfield Places® does. What does that mean for YOU?

If you are tired of wasting time searching unstructured content, or if you and your team can’t locate information by region, city, process, or problem, let us help. Rapid identification and auto-classification of structured and unstructured content speeds up your ability to find and leverage the data you need now.

i2k Connect’s offering includes an i2k Envisioning Session, a Proof of Concept, a Pilot period, and deployment of i2k Connect’s software tailored to your environment.

The i2k Connect AI Platform integrates into your existing workflows and produces valuable information and data from your unstructured content. i2k Oilfield Places can be deployed on-premise or in the cloud and is accessed through a REST API, which makes the service easy to use from anywhere and supports a variety of big data text analytics applications.

See i2k Connect’s Research Portal at work for the Society of Petroleum Engineers (SPE).

Contact us for more information.