Poorly configured Apache Airflow platforms threaten organizations
Many organizations using the popular open-source Apache Airflow platform to schedule and manage workflows may be exposing credentials and other sensitive data to the Internet because of the way they use the technology, researchers have found.
Security provider Intezer said this week that it recently discovered several misconfigured Airflow instances exposing sensitive information belonging to organizations across multiple industries, including manufacturing, media, financial services, information technology, biotechnology and health.
The exposed data included user credentials for cloud hosting services, payment processors, and social media platforms, including Slack, AWS, and PayPal. Intezer found that at least some of the data exposed through misconfigured Airflow instances could allow malicious actors to access corporate networks or execute malicious code and malware in production environments and on Apache Airflow itself.
“It’s pretty easy to find exposed instances,” says Ryan Robinson, security researcher at Intezer. To locate one, all a malicious actor needs to do is scan IP addresses and check them for the expected HTML file. “Finding sensitive information on exposed instances is trivial, but harnessing it to run code is much more difficult and requires a solid understanding of each platform,” adds Robinson.
Organizations use Apache Airflow to create and schedule automated workflows, including those that involve external services such as AWS, Google Cloud Platform, Microsoft Azure, Hadoop, Spark, and other Apache software. A 2020 survey of its use showed that most of its users are data engineers, data scientists, or data analysts at medium and large companies. More than three-quarters of organizations barely customize the technology before using it.
Airflow allows users to orchestrate jobs that involve multiple tasks, Robinson explains. For example, he says, one job might involve generating reports and then emailing them to clients; another might involve collecting, processing, and uploading data to AWS buckets.
While Airflow gives users several options to use it securely, organizations can put data at risk by the way they use the platform.
Intezer, for example, found that insecure coding practices were the most common cause of credential leaks in Airflow. Its research uncovered several Airflow instances in which passwords had been hard-coded either into the Python code that orchestrates tasks or into a feature that lets a user set a variable value. In other cases, Intezer discovered that users were misusing an Airflow feature called Connections and storing passwords in the clear instead of encrypting them.
“Airflow offers good options for storing sensitive information securely through its Connections feature,” says Robinson. This feature enables organizations to ensure that passwords used to send and retrieve data from other systems are stored encrypted. “For example, a task will download data from a platform using an API key, then process that data in another task and store that data in a database using a password to log in. A workflow may need to interact with multiple remote systems,” Robinson says. Users often misuse the Connections feature or hard-code credentials directly into Python scripts, bypassing the feature entirely, he notes.
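The alternative to hard-coding is to look credentials up by name at run time. The sketch below shows the idea using only the Python standard library. It mimics Airflow's convention of supplying a connection as a URI in an environment variable named `AIRFLOW_CONN_<CONN_ID>`, but it is not Airflow's own implementation, and the `ConnInfo` class, the `connection_from_env` function, and the example connection ID are hypothetical. Inside a real workflow, Airflow's own hooks would normally resolve the connection.

```python
import os
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import unquote, urlsplit


@dataclass
class ConnInfo:
    """Hypothetical container for the parts of a connection URI."""
    conn_type: str
    host: Optional[str]
    port: Optional[int]
    login: Optional[str]
    # repr=False keeps the password out of printed or logged representations.
    password: Optional[str] = field(repr=False)
    schema: str


def connection_from_env(conn_id: str) -> ConnInfo:
    """Build a ConnInfo from the AIRFLOW_CONN_<CONN_ID> environment variable."""
    env_name = "AIRFLOW_CONN_" + conn_id.upper()
    uri = os.environ.get(env_name)
    if uri is None:
        raise KeyError(f"{env_name} is not set")
    parts = urlsplit(uri)
    return ConnInfo(
        conn_type=parts.scheme,
        host=parts.hostname,
        port=parts.port,
        # Credentials in a URI are percent-encoded, so decode them.
        login=unquote(parts.username) if parts.username else None,
        password=unquote(parts.password) if parts.password else None,
        schema=parts.path.lstrip("/"),
    )
```

With a variable such as `AIRFLOW_CONN_REPORTS_DB` set outside the code (the name is an example only), a task would call `connection_from_env("reports_db")` and the password would never appear in the workflow file itself.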
Intezer has found other ways in which users can put company data at risk through insecure use of Airflow. One example is a setting that governs the Airflow configuration file, which often contains sensitive information such as passwords and keys. If the setting is not secure, anyone can access the configuration file from the web server’s user interface, Intezer said in its report. Likewise, a feature in older versions of Airflow that allows users to run ad hoc database queries is dangerous because it requires no authentication and allows anyone with access to the server to get information from the database.
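A configuration check along these lines can be automated. The following is a minimal sketch that reads the text of an `airflow.cfg` file and reports two risky settings. It assumes the relevant options are `expose_config` in the `[webserver]` section and `fernet_key` in the `[core]` section, and that an empty `fernet_key` means connection passwords are stored unencrypted; confirm both against the documentation for the Airflow version in use. The function name `audit_airflow_cfg` is hypothetical.

```python
import configparser
from typing import List

# Values that configparser-style files commonly use for "enabled".
TRUTHY = {"true", "1", "yes", "on"}


def audit_airflow_cfg(cfg_text: str) -> List[str]:
    """Return human-readable warnings for risky settings in airflow.cfg text."""
    # interpolation=None avoids errors on '%' characters in values.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    warnings = []

    # Assumed option: lets web UI users view the configuration file.
    expose = parser.get("webserver", "expose_config", fallback="false")
    if expose.strip().lower() in TRUTHY:
        warnings.append("webserver.expose_config is enabled")

    # Assumed option: key used to encrypt stored connection passwords.
    fernet_key = parser.get("core", "fernet_key", fallback="")
    if not fernet_key.strip():
        warnings.append("core.fernet_key is empty")

    return warnings


if __name__ == "__main__":
    import sys

    # Usage: python audit_cfg.py path/to/airflow.cfg
    with open(sys.argv[1], encoding="utf-8") as handle:
        for warning in audit_airflow_cfg(handle.read()):
            print("WARNING:", warning)
```

The check only inspects the file on disk; settings overridden through environment variables at run time would not be seen by it.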
Intezer recommends that all organizations using Apache Airflow update to the latest version 2.0.0 of the platform and ensure that only authorized users are allowed to connect to it.
“Version 2.0.0 has greatly improved security,” says Robinson. The new version has a fully supported API, unlike the experimental API of previous versions. Other major improvements include enforcing authentication and removing sensitive information from logs, as well as changes to the structure of the main configuration file, he says. Some older (and dangerous) features like Ad-Hoc Query have been deprecated in the new version of Airflow.
Robinson says it’s difficult to know for sure whether attackers are targeting insecurely configured Airflow platforms; however, he says it would be a reasonable assumption that Airflow instances have been targeted.