宙畑 Sorabatake

Machine Learning

The “Extracting Difference Between Two Points of Satellite Data” Challenge — A Look at ABEJA’s Difference Extracting Algorithm

We went behind the scenes to ask ABEJA about the applications and future of their difference extracting algorithm!

In this article, Sorabatake digs into the collaboration between Sakura Internet, which operates the satellite data platform “Tellus”, and ABEJA, an AI start-up. We asked the project manager, ABEJA’s Ryusuke Sakuma, about the applications and future of their difference extracting algorithm!

Ryusuke Sakuma

Graduated from the Department of Law, Faculty of Law, Keio University, in 2002. After working for 5 years at ABeam Consulting (then Deloitte Tohmatsu Consulting LLC), where he became its youngest executive, moved to ABEJA in 2019. Manages AI utilization projects and global expansion. Appointed Senior Executive Manager of the Case Study Department in December of the same year.

1. ABEJA, a Company that Implements AI Technology into Society

- Let’s start off with you introducing us to ABEJA’s business model.

ABEJA is a start-up that focuses on implementing AI technology, particularly deep learning, in society.

We have two main lines of business: the development and provision of a platform for building and managing AI, which we call the “PaaS Business”, and the distribution of packaged, ready-made software, which we call the “SaaS Business”.

The business I will be focusing on today is the PaaS business. Through it, we work to spread the development of AI models throughout the world. We hope not only to teach people how to use the platform, but also to let them experience, through a wide variety of case studies, how AI can impact their business.

2. Creating New Value by Combining Satellite Data & Machine Learning

- What factored into the start of this project?

We already had a partnership with Sakura Internet and their Tellus project, which we called the “xDataAlliance”. They approached us about jointly developing an application on Tellus that would let businesses use satellite data, and that led to the start of this project.

We felt like machine learning could really bring out the true potential of satellite data and were interested in seeing what we could do.

By using image recognition and image processing to compare satellite data of the same area from two different points in time, we may be able to retrospectively analyze changes to the earth’s surface, or to cities and buildings. There is even talk of potentially being able to predict the future. That made us feel this could really impact society, and it led us to begin our difference extraction algorithm project.

3. Using Two Different Data Points of Satellite Data to Automatically Find the Difference

- What kind of project is behind the “Difference Extracting Algorithm”?

Our goal in this project was to build a model, or algorithm, that could later be built into a function or application, and to determine which kinds of images and situations do and do not show clear differences. This is what is called a feasibility study: the stage where we prove that something can actually be done.
We compiled a report on the results of testing which kinds of changes appear in different situations.

We worked with three situations:
– An urban area
– A suburb just outside the city
– The countryside, with a lot of nature

For each situation, we compared two images of the same area taken at different times.

Urban area Credit : JAXA
Suburbs (shopping mall) Credit : JAXA
Suburbs (highways) Credit : JAXA
Nature Credit : JAXA

Since we expected satellite data to have a wide variety of use cases, we assumed that one of them would be analyzing trends in apartment reconstruction, and we kept that use case in mind when setting up automatic extraction for the three situations above.

We also wanted to look at changes driven by a well-known event: for example, choosing images of a site where a stadium was under construction and then checking whether a new roof appeared. Because changes like this can be confirmed without any special technology, we included a pair of such images to see whether our algorithm could pick them up.
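The simplest form of the comparison described above is pixel-wise differencing: subtract one image from the other and flag pixels whose change exceeds a threshold. The sketch below is a minimal illustration under our own assumptions, not ABEJA’s actual algorithm. It assumes the two images are co-registered, single-band grids of the same size; the function names `change_mask` and `changed_fraction` and the threshold value are hypothetical.

```python
def change_mask(before, after, threshold):
    """Flag pixels whose absolute change between two dates exceeds threshold.

    before, after: same-shape 2-D grids (lists of rows) holding one band of
    two co-registered images of the same area (hypothetical inputs).
    Returns a grid of booleans, True where |after - before| > threshold.
    """
    if len(before) != len(after) or any(
        len(rb) != len(ra) for rb, ra in zip(before, after)
    ):
        raise ValueError("images must have the same shape")
    return [
        [abs(a - b) > threshold for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


def changed_fraction(mask):
    """Share of pixels flagged as changed (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


if __name__ == "__main__":
    before = [[10, 10, 10], [10, 10, 10]]
    after = [[10, 80, 10], [10, 90, 10]]  # e.g. a new bright roof in column 1
    mask = change_mask(before, after, threshold=30)
    print(mask)                    # [[False, True, False], [False, True, False]]
    print(changed_fraction(mask))  # 0.3333333333333333
```

In practice the threshold has to be tuned per scene, and anything that changes brightness without being a real change (lighting, clouds, seasonal foliage) will also be flagged, which is exactly the kind of problem described later in the interview.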

Extracted difference from photo comparison of buildings.
Building differences in the suburbs (shopping mall) Credit : JAXA

We used databases and websites that contain chronological information on new apartments and other construction projects to research areas that had actually changed, and created a few patterns to look for so that we weren’t shooting in the dark.
We are also discussing the potential for insurance companies to use this technology in the future to assess damage caused by natural disasters.


- Could you tell us a little more about the project, such as what the team was like or how long it took?

We wanted to complete the feasibility study quickly and move on to creating an application the public could use, so we spent a little under two months on the study, including writing the report.

The team was set up the way we usually staff AI model development, with two to three people. It was a tag-team between a project manager (Sakuma), who acts as the liaison with the client, and a data scientist (Pierre Le Meur), who worked behind the scenes to create the model or algorithm.

Our data scientist, Pierre, is well versed in machine learning algorithms and also has experience in image processing and computer vision (CV); he is skilled at using OpenCV and deep learning for image classification and object detection. We mainly used Python as our programming language.

Pierre Le Meur

Completed his bachelor’s degree in applied mathematics at the University of Rennes 1 (France) in 2019. Joined ABEJA as a data scientist after completing an internship.

When we first started the project, our company didn’t have much experience with satellite data, so we worked with Sakura Internet and other specialists to gather references and learn about problems unique to satellite data.

Some of the problems unique to satellite data are things like clouds covering up the area you are looking at, or the angle of the satellite affecting the lighting for the image. These are things our company alone wouldn’t have been able to handle.
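Lighting differences between two acquisitions, for example from a different sun or satellite angle, can make unchanged ground look changed. One common and simple mitigation, offered here as our own assumption rather than the method used in this project, is to standardize each image to zero mean and unit standard deviation before comparing them. This cancels uniform brightness and contrast shifts; it does not handle clouds, which need separate masking. The function name `standardize` is hypothetical.

```python
from statistics import fmean, pstdev


def standardize(image):
    """Rescale a 2-D grid (list of rows) to zero mean and unit standard deviation.

    A uniform gain/offset change in brightness (v -> g*v + c with g > 0) leaves
    the result unchanged, so it will not show up as a difference later.
    Constant images map to all zeros to avoid dividing by zero.
    """
    values = [v for row in image for v in row]
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        return [[0.0 for _ in row] for row in image]
    return [[(v - mean) / std for v in row] for row in image]


if __name__ == "__main__":
    dim = [[10, 20], [30, 40]]
    bright = [[2 * v + 50 for v in row] for row in dim]  # same scene, brighter exposure
    a, b = standardize(dim), standardize(bright)
    max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    print(max_diff < 1e-9)  # True
```

This global correction assumes the lighting change affects the whole scene evenly; local effects such as shadows cast at a different sun angle would remain.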

We combed through many different studies and references to see whether we could reproduce their results, or to compare them with our situations and see where they were similar, adopting approaches and gaining knowledge from each of them.

4. Difficulties Working with Satellite Data for the First Time

- What were some parts of the development behind this project that you found challenging?

The hardest part was definitely the data.

Using an algorithm to process an image was something we could already do, to an extent, with the experience we had at the time.

Satellite data, on the other hand, was something completely new to us. Our projects don’t usually require us to take into account details such as how or where the image itself was taken. There were parts of it that were beyond our imagination.

For example, we had started selling a SaaS product for retailers called “ABEJA Insight for Retail”, which processes images captured by video cameras and sensors in stores to analyze customer attributes and other useful data, helping stores measure the effectiveness of their strategies. Stores are familiar and accessible environments: if we want clearer images of customers’ faces, we can go to the store and adjust the lighting, using trial and error to create the best conditions for gathering data. With satellite data, we could not control the conditions under which the images were taken, and that presented a new challenge for us.

The concept of spectral bands was also something we had never dealt with before. Depending on the band, certain features appear more strongly in the image; in some bands, for example, foliage really stands out. There are lots of things we need to factor in due to the nature of satellite data.

We didn’t even know if or how the images had been processed before they came to us, which made it even more difficult to figure out what we could do with them.

A lot of it was very new and interesting to me, and I gradually learned more about the characteristics of satellite data.

- Are satellite data and machine learning compatible?

I found working with these images very challenging, but if we can figure out how to accurately capture their characteristics, the sheer volume and variety of satellite data will make it a very good match for AI and machine learning.

This is especially true for machine learning, which has great potential to use large amounts of data to reach conclusions that humans could not reach on their own.

5. A Behind the Scenes Look at the Interesting and Difficult Parts of Development

- Was there anything unique to satellite data you found interesting, and were there any other points you found challenging?

(1) We had to build something that many people could use for many different purposes, and that very freedom made it difficult to pin down which problems to tackle.
One of the interesting, but definitely hard, parts was figuring out what kinds of differences we could extract, and how they would be useful to users.

This project wasn’t something we were hired to do by a specific company to use for a certain purpose, but was something that would be released to the public for a variety of different objectives. Thinking about what kind of problems could be solved, and by what kind of people, made it as difficult as it was interesting to determine the problems we needed to tackle.

Finding something that was both technologically feasible and useful was our biggest challenge, and it definitely wasn’t easy.


(2) Overcoming the technical hurdles of more practical, lower-resolution satellite data
During our research we came across cutting-edge, high-quality, and expensive data, but since we wanted this to run on the Tellus platform, we needed to keep costs down by using the data Tellus provides. So rather than using the satellite data from the studies we referenced, we were challenged to find out what we could do with lower-resolution data.


(3) With a myriad of variables stacked against us, we had to figure out reasonable ways to overcome the challenges
When extracting changes in buildings, we found that seasonal changes in the foliage around them worked against us. Looking at one spot, even if the buildings were the same, the tree leaves in the area would change from green to red.

When this happened, we had to handle it case by case, either by choosing bands that don’t pick up foliage or by changing the algorithm’s logic.
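One widely used way to make foliage explicit is the Normalized Difference Vegetation Index (NDVI), computed from the red and near-infrared bands as (NIR - Red) / (NIR + Red). We are swapping in NDVI as a common example; the interview does not say which bands or logic ABEJA actually used. In this sketch, pixels that look vegetated on either date are cleared from a boolean change mask, so leaves turning from green to red are not reported as building changes. The function names and the 0.3 cut-off are assumptions for illustration.

```python
def ndvi(red, nir):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for same-shape 2-D grids.

    Values near +1 indicate dense green vegetation; pixels where both
    bands are zero are given 0.0 to avoid dividing by zero.
    """
    return [
        [(n - r) / (n + r) if (n + r) != 0 else 0.0 for r, n in zip(row_r, row_n)]
        for row_r, row_n in zip(red, nir)
    ]


def suppress_vegetation(change, ndvi_before, ndvi_after, veg_threshold=0.3):
    """Clear change flags on pixels that look vegetated on either date.

    change: 2-D grid of booleans from any change-detection step.
    veg_threshold is an illustrative cut-off, not a tuned value.
    """
    return [
        [bool(c) and nb <= veg_threshold and na <= veg_threshold
         for c, nb, na in zip(row_c, row_b, row_a)]
        for row_c, row_b, row_a in zip(change, ndvi_before, ndvi_after)
    ]


if __name__ == "__main__":
    # Pixel 0: green trees in summer, bare branches in autumn -> ignored.
    # Pixel 1: low NDVI on both dates (e.g. a new roof) -> kept as a change.
    summer = ndvi(red=[[0.05, 0.30]], nir=[[0.50, 0.32]])
    autumn = ndvi(red=[[0.25, 0.40]], nir=[[0.30, 0.42]])
    print(suppress_vegetation([[True, True]], summer, autumn))  # [[False, True]]
```

Excluding vegetated pixels also hides real changes that happen under or next to trees, so this trades some sensitivity for fewer false alarms.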

Whether it was a technical issue, or something we could easily fix if we knew more about satellite data, figuring out the most reasonable solution when there were many variables to consider was also challenging.

6. The Amazing Future and Business Impact that Lies Ahead of the Hardships

- Could you share your outlook on what you would like to make possible as you turn this into a business?

First, we would like to build a practical application or function that uses the difference extraction algorithm and try it out in many different areas. After that, I would like to use it the same way we use AI in our business of implementing AI in society, where our mission is to change how people operate in society.

Our end goal is for users to be able to access this technology easily on their computers or phones, or perhaps on something smaller like smart glasses in the future, so they can look things up themselves and discover, or even create, trends in society. We hope to use this technology to change the way things work in both business and private life.

For example, let’s say that there is a pattern in the reconstruction of apartment buildings. If we find out that there is an area where apartment buildings are being rebuilt every ten years, it allows us to make the assumption that it might be a good place to build an apartment building ten years later.
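The reasoning in this example is simple arithmetic on detected change dates. As a hypothetical sketch, assuming a difference-extraction step has already produced the years in which a rebuild was detected at one site, the average gap between rebuilds gives a naive projection of the next one. The function name `estimate_next_rebuild` and its inputs are illustrative only; a real analysis would need many sites and some measure of uncertainty.

```python
from statistics import fmean


def estimate_next_rebuild(change_years):
    """Project the next rebuild year at one site from past detected rebuilds.

    change_years: years in which a rebuild was detected (hypothetical input).
    Returns (average interval in years, projected next year), or None when
    fewer than two events are available.
    """
    years = sorted(change_years)
    if len(years) < 2:
        return None
    intervals = [later - earlier for earlier, later in zip(years, years[1:])]
    average = fmean(intervals)
    return average, years[-1] + average


if __name__ == "__main__":
    print(estimate_next_rebuild([2010, 1990, 2000]))  # (10.0, 2020.0)
```

Averaging intervals assumes the rebuild cycle is roughly regular; with only two or three events per site, the projection should be read as a hint, not a forecast.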

We can use data from past trends like this to make a product that leads to something actionable.

Data lets us see opportunities to create new businesses beyond the financial and real estate industries, and we hope to use it to develop an application that creates business opportunities through behaviour change.

Something we have had our eyes on for a while is combining this data with smart hospitals and smart cities to improve the quality of life for people. We think that would be an interesting business.

While satellite images can’t show us the movements of each and every person, knowing how the city has changed can be a great foundation for a multi-layered analysis on predicting things about people in general. This is something that stands to potentially become a business.

- We get the impression you see the potential in combining multiple data sources, rather than relying on satellite data alone, to get a fuller picture of trends.

That is exactly how we feel. By using macro-data from satellites and combining it with micro-data on how people move and spend money, we can use all of the data to its fullest potential. We are looking forward to finding demand for things we may not have been able to see before by looking at what is happening in cities and other economic circles.

By using such massive amounts of data, we are interested in uncovering trends that have not yet been discovered, or could not be discovered before, and we hope our project will inspire people to create their own products to find these trends.

We would be really happy if we could make something genuinely useful to people by enhancing their user experience through collaboration with experts in this field.

Credit : Sakura Internet Inc.

- It kind of feels like you guys are starting with a B2B business model, focusing on areas like the financial and real estate sectors, and hoping it spreads into the B2C users as well.

That is exactly right. It is actually close to the path behind Google Earth. It grew out of a company called Keyhole, which originally built models of the earth from image data and sold them as a B2B product; after Google acquired it, the product eventually became available to the general public, people like you and me. I think it would be very exciting to create a product that follows a similar path.

- Do ABEJA and Sakura Internet have a concrete plan for moving forward?

We initially considered trying to use deep learning, which is ABEJA’s area of expertise, to run large quantities of image data through our AI to pick out the differences.

However, given our limited time, it would have been difficult to prepare the data for this: we would have needed to label each image to show whether or not it contained buildings. We also didn’t have a concrete idea at the beginning of what should count as a difference, so it wasn’t realistic to have a model learn from thousands or tens of thousands of building images. This is something we will have to overcome eventually.

That being said, near the end of last year a dataset built from imagery taken by another satellite, Sentinel, at a similar resolution was released. Part of it may make the deep learning approach we judged infeasible for this project possible. This is a very new development that may change the game.

A data set that includes a whopping 300,000 images.
So2Sat LCZ42: A Benchmark Dataset for Global Local Climate Zones Classification

With our feasibility study as the starting line, there is still a lot we need to do. Our next phase will be to truly prove that the concept works, and then to build a bare-minimum application that we can actually use. It’s important to be clear about where we stand.

If we can accomplish that, we can team up with promising clients and gradually test our product to discover new ways it can be used, even if that starts with “think tank” style brainstorming to see what comes out.

- Please give some advice to new data scientists and businesses that want to start using satellite data!

To be honest, satellites feel a bit alien to most of us, so I think it is hard for a lot of people to get into them.

However, this is something that anyone can access, so it won’t take long for someone who really knows how to use data to figure out how to use it in a productive way.

I feel that for satellite data, the bar is set high, but in a good way: it has kept the market from becoming saturated, which means there is a lot of room for a breakthrough. I want everyone to know how exciting it is to take a shot at one!

Even if you aren’t familiar with satellite data, you can take it in little by little. ABEJA was fortunate enough to receive help from specialists on this.

Figuring out what you would like to use the data for without preconceptions is really important for innovation. This isn’t a field that will bear fruit soon, so it will likely test your creativity and drive to pursue new knowledge.

7. Editor's Note

We went behind the scenes to ask about the applications and future of difference extracting algorithms!
This project excites us not only because it uses satellite data, but because it shows how combining data creates new value, and it seems to bring ABEJA’s tagline, “Implement AI into society”, one step closer to reality.
We also feel this has the potential to be at the core of a platform supporting the “Super City” concept, which has gained attention in recent years through the National Strategic Special Zones legislation.

Credit : Cabinet Office Source : https://www.kantei.go.jp/jp/singi/tiiki/kokusentoc/supercity/supercity.pdf

The bar is set high, but in a good way that has kept the market from becoming saturated. With the satellite data market essentially up for grabs, why not consider using the data in your business?

Meanwhile, Sorabatake will continue to bring you updates on difference extraction algorithms!