Analyzing Office 365 Photo EXIF data using Power BI

In an earlier post, I wrote about how to mine the OCR data from Office 365 document folders using Power BI. During the research for that post I discovered that the photo EXIF data is also available for analysis from Office 365. I’ll show you how to get to it in this post.

What is EXIF data anyway?

Think of EXIF as special metadata for photos and with today’s cameras, records the settings of the camera as well as the location it was taken, if the camera has GPS. You can read more about the format here:  Many smart phones today automatically include the EXIF data when you take photos with your phone. This data is automatically extracted by Office 365 from photos when uploaded to a document library.

Why do I care about this?

Imagine you work for a company that does inspections. Your inspectors use their phone to take photos of issues already. Wouldn’t it be great to show the photos by location and related location data together on a report page? This technique allows you to easily mine that EXIF data.

How do I get to it?

First, upload your images to a SharePoint document folder in Office 365 and the OCR process will initiate  automatically. I’ve had it take up to 15 minutes to process so you may need to be patient. You can do this via SharePoint Mobile app if you are uploading mobile photos.

Second, from Power BI desktop, connect to the document folder using the SharePoint Online List connector. By doing so, you’ll get access to the correct column that contains the EXIF data. Once in the dataset, you can use Power Query M to parse the data and start analyzing.


In this video, I’ll show you how to access the EXIF data and what you can do with the data.

Here’s the Power Query M Code

Source = SharePoint.Tables(“https://YourInstance/YourSite”, [ApiVersion = 15]),
#”8d19b1eb-42b8-4843-b721-fc1e8ef47688″ = Source{[Id=”88888888-8888-8888-8888-888888888888″]}[Items],
#”Renamed Columns” = Table.RenameColumns(#”8d19b1eb-42b8-4843-b721-fc1e8ef47688″,{{“ID”, “ID.1”}}),
#”Filtered Rows” = Table.SelectRows(#”Renamed Columns”, each [MediaServiceAutoTags] <> null and [Created] >= #datetime(2019, 4, 19, 0, 0, 0)),
#”Expanded FirstUniqueAncestorSecurableObject” = Table.ExpandRecordColumn(#”Filtered Rows”, “FirstUniqueAncestorSecurableObject”, {“Url”}, {“FirstUniqueAncestorSecurableObject.Url”}),
#”Removed Other Columns” = Table.SelectColumns(#”Expanded FirstUniqueAncestorSecurableObject”,{“FirstUniqueAncestorSecurableObject.Url”, “File”, “Properties”}),
#”Expanded File” = Table.ExpandRecordColumn(#”Removed Other Columns”, “File”, {“Name”, “ServerRelativeUrl”, “TimeCreated”, “TimeLastModified”}, {“File.Name”, “File.ServerRelativeUrl”, “File.TimeCreated”, “File.TimeLastModified”}),
#”Merged Columns” = Table.CombineColumns(#”Expanded File”,{“FirstUniqueAncestorSecurableObject.Url”, “File.ServerRelativeUrl”},Combiner.CombineTextByDelimiter(“”, QuoteStyle.None),”Picture URL”),
#”Expanded Properties” = Table.ExpandRecordColumn(#”Merged Columns”, “Properties”, {“vti_mediaservicemetadata”}, {“Properties.vti_mediaservicemetadata”}),
#”Parsed JSON” = Table.TransformColumns(#”Expanded Properties”,{{“Properties.vti_mediaservicemetadata”, Json.Document}}),
#”Expanded Properties.vti_mediaservicemetadata” = Table.ExpandRecordColumn(#”Parsed JSON”, “Properties.vti_mediaservicemetadata”, {“address”, “location”}, {“Properties.vti_mediaservicemetadata.address”, “Properties.vti_mediaservicemetadata.location”}),
#”Expanded Properties.vti_mediaservicemetadata.location” = Table.ExpandRecordColumn(#”Expanded Properties.vti_mediaservicemetadata”, “Properties.vti_mediaservicemetadata.location”, {“altitude”, “latitude”, “longitude”}, {“Properties.vti_mediaservicemetadata.location.altitude”, “Properties.vti_mediaservicemetadata.location.latitude”, “Properties.vti_mediaservicemetadata.location.longitude”}),
#”Expanded Properties.vti_mediaservicemetadata.address” = Table.ExpandRecordColumn(#”Expanded Properties.vti_mediaservicemetadata.location”, “Properties.vti_mediaservicemetadata.address”, {“City”, “State”, “Country”, “EnglishAddress”}, {“Properties.vti_mediaservicemetadata.address.City”, “Properties.vti_mediaservicemetadata.address.State”, “Properties.vti_mediaservicemetadata.address.Country”, “Properties.vti_mediaservicemetadata.address.EnglishAddress”}),
#”Changed Type” = Table.TransformColumnTypes(#”Expanded Properties.vti_mediaservicemetadata.address”,{{“File.Name”, type text}, {“Picture URL”, type text}, {“File.TimeCreated”, type datetime}, {“File.TimeLastModified”, type datetime}, {“Properties.vti_mediaservicemetadata.address.City”, type text}, {“Properties.vti_mediaservicemetadata.address.State”, type text}, {“Properties.vti_mediaservicemetadata.address.Country”, type text}, {“Properties.vti_mediaservicemetadata.address.EnglishAddress”, type text}, {“Properties.vti_mediaservicemetadata.location.altitude”, type number}, {“Properties.vti_mediaservicemetadata.location.latitude”, type number}, {“Properties.vti_mediaservicemetadata.location.longitude”, type number}})
#”Changed Type”

Analyzing Office 365 OCR Data using Power BI

I’m so excited to see Optical Character Recognition (OCR) working natively now in Office 365! I got my start in government where we scanned a lot of documents. There was a lot of data locked away in those images but no easy way to mine it. OCR was explored as a possible solution, but it was still in its infancy and not very accurate or scalable.

Fast forward to today and now OCR is simply part of our cloud infrastructure and with the assistance of Machine Learning, very accurate. The dream of finally unlocking that data is here! Or is it? Read on to find out more.

By the way, we’ll be covering this and more in our session at The SharePoint Conference, May 21-23 in Las Vegas! Use discount code GATTE to get a discount.

Intelligent Search and Unlocking Your Image Data

SharePoint’s ability to do native OCR processes was first shown at Ignite 2017. There, Naomi Moneypenny did a presentation on Personalized intelligent search across Microsoft 365, where you saw Office 365 OCR in action.

She uploaded an image of a receipt and was able to search for it, based on the contents of the receipt image. It was a seamless demo of how Office 365 can intelligently mine the data in image files.

Now, let’s see if we can access the OCR data across multiple files and do something with it.

In the following procedure, I’ll show you how to connect to Office 365 and get to the data that the OCR process returns. In the following post, I’ll show you how to process receipt data to get the total amount spent, solely from the OCR data.

Process Overview

There are three steps to the process to start mining your OCR data. First, you have to add image content that contains text to a SharePoint folder.

The process of getting OCR data

Finding OCR Data

The OCR process that runs against the files in a SharePoint document folder are called Media Services. All derived data is stored in columns that contain Media Services in them.

Unfortunately, I’ve discovered that this feature has not been implemented consistently across the Shared Documents folder, custom folders and OneDrive. There is good news in that there’s a less obvious way to get to the data consistently across all three, using Properties. As shown below, you see the normal column names and where they appear. Only the ones in Properties appear consistently across all. We are only going to cover the basic information but the Properties collection has a lot more data in which to consume.

Audit of which media service fields are available where in Office 365

Adding Image Content to a SharePoint Document Folder

When you upload an image to a SharePoint document folder in Office 365, the OCR process kicks off automatically. I’ve had it take up to 15 minutes but the OCR process will analyze the image for text and return the text in a field called MediaServiceOCR if present and always in Properties.vti_mediaserviceocr.

These columns contain any text that was recognized in the graphics file. The current structure of the returned data is a bit different that what is in the source image. Each instance of the discovered text is returned on a separate line, using a Line Feed character as a delimiter. For example, if you had a two-column table of Term and Meaning, it would return the data like this:





Original uploaded image
Data returned by media services

While it’s great you can get to the data, the current returned format makes it exceptionally complex to reconstitute the context of the data. Also, the more complex your layout, the more “interesting” your transformations may need to be. I’d strongly recommend this post ( and this post ( ) to give you the basics of text parsing in Power Query M.

Accessing the OCR Data in Power BI

The OCR columns are item level columns. The normal tendency would be to connect to your SharePoint site using the Power BI SharePoint Folder connector. You’ll be disappointed to find that the Media Services columns aren’t there.

Instead, connect to the document folder using the SharePoint Online List connector. By doing so, you’ll get access to the Media Services columns. Once in the dataset, you can use Power Query M to parse the data and start analyzing.


Let’s walk through how to access the data and manipulate it using Power BI. In this scenario, I have two receipts that have been uploaded in a document folder and I’m going to get the total spent on these receipts by analyzing the OCR data.

What about OneDrive for Business?

Yes, it works there too! The Media Service property fields are here as well. In fact, you get more information in an additional column called MediaServicesLocation. Based on my usage, it seems to be specifically populated for image files. If the image contains EXIF data, the MediaServicesLocation will contain the Country, State/Province, and City information of where it was created. Within the Properties collection, you can actually get more detailed information about the photo, like the type of camera that took it and more.

To connect to OneDrive where this will work, you need your OneDrive URL. I normally right-click on the OneDrive base folder in File Explorer and select View Online, as shown below.

Select View Online to get to the OneDrive url needed for Power BI connection

Potential for GDPR Issues

One aspect to consider if you look to do this is a production manner in Europe is that you will likely encounter information that falls under GDPR regulation. Consider this your prompt to think about how this capability would fit into your overall GDPR strategy.

Want a copy of the Power BI model?

Fill out the form below and it will emailed to you automatically, by virtue of the magic of Office Forms, Flow, and Exchange!

I hope you liked this post. If you have comments or questions, post them below in the comments.

The True Cost of the Cubicle Hero

Heroes. Society loves them, honors them and exults them. Corporate offices are filled with a new breed of hero, the Cubicle Hero. These are the people who go beyond the norm and figure it out. They burn the midnight oil and they get it done. They overcome the chaos and reach the goal. All hail the hero!

However, heroes tend to overstay their welcome. In the movie, “The Dark Knight Rises”, character Harvey Dent intones, “You either die the hero or you live long enough to see yourself become the villain.” The Cubicle Hero’s individual victory is celebrated initially, but situations change and the need for the hero diminishes over time. Or so we hope.

Cubicle heroes can become process bottlenecks and productivity killers. Why? The organization’s reward structure doesn’t lead them to being mentors. The cubicle hero has great value to the organization but their way of working can’t scale and the lack of information sharing prevents the organization from truly benefiting from their victory. The hero then gets involved in every project that touches their area and becomes the bottleneck as the demand for their time is greater than what is available. Thus, the hero slowly becomes the villain, delaying projects.

Many years ago, I worked at a company where a core process of the company was dependent on a very skilled hero. He was a great employee and did his job earnestly. However, he also guarded his knowledge so that he was the only one who understood it completely. This became a serious company concern when he was involved in an accident, leaving him unable to work for several months. Several key projects were impacted.

Changing the perspective, expectations and language of what happens as part of these efforts can lead to a different outcome. We need to make it clear that we want and need Corporate Explorers rather than Cubicle Heroes. Leif Erickson, the Viking, may have been the first to reach North America on a heroic journey, but it was the explorer, Columbus, that opened up North America to the world.

Explorers and Heroes share many common traits. They can see the big picture. They can dig down into the details when needed. They put in the extra effort to get the job done. The real difference is in the aftermath. Explorers open new trails so that others may come behind them. Explorers become guides to help others make the same journey. Heroes, on the other hand, continue to hold onto their conquest.

Changing your company culture to encourage Explorers over Heroes creates a scalable culture of knowledge sharing. This organizational approach leads to greater productivity, higher quality collaboration and timelier project progress.

To summarize, I recommend reviewing the following in your organization.

  • Provide a clear path to success for as many as possible to the rewards for exceptional effort, in a way that others and ultimately the organization can leverage
  • Provide public recognition for knowledge sharing
  • Structure rewards, within the process, so we can move from the mentality of one time hero-creation to our true goal of constant productivity improvement
  • Provide the Explorer with opportunities to help facilitate and implement their achievement within the organization. This keeps the Explorer engaged and looking for additional ways to improve
  • Provide collaborative tools like Office 365 and Yammer to help facilitate and support the Explorer’s journey

If you are ready to address more productivity issues in your organization, talk to us or join our Community.