Maciej Mensfeld – Mend
https://www.mend.io

Next-Gen Vulnerability Assessment: AWS Bedrock Claude in CVE Data Classification
https://www.mend.io/blog/next-gen-vulnerability-assessment-aws-bedrock-claude-in-cve-data-classification/
Tue, 30 Jul 2024

Large language models are fascinating tools for cybersecurity. They can analyze large quantities of text and are excellent for data extraction. One application is researching and analyzing vulnerability data, specifically Common Vulnerabilities and Exposures (CVE) information. As an application security company with roots in open source software vulnerability detection and remediation, the research team at Mend.io found this a particularly relevant area of exploration.

Introduction

In recent years, we’ve seen a significant shift in how we approach and handle cybersecurity threats. With the emergence of advanced AI and LLMs, a new space has opened for us to explore and enhance our understanding of vulnerability data. But here’s the catch: this data is often not ready to use off the shelf. It often requires some elbow grease, manual cleanup, and alignment to make it valuable. That’s where the power of LLMs comes into play.

You see, CVE data is predominantly text-based. Its descriptions, reports, and details are all written down so that humans can read and understand them. But when dealing with thousands of records, reading through each one isn’t just impractical; it’s impossible. That’s the beauty of using LLMs in this context. These models are not just good at understanding and generating text—they’re fantastic at sifting through vast amounts of it to find relevant details, patterns, and insights.

Best of all, you don’t need to be an AI expert to understand how this works. Whether you’re a seasoned cybersecurity professional, a budding data scientist, or just someone with a keen interest in the field, the advances in LLMs and their application in CVE data analysis are something to be excited about. So, let’s dive in and explore how these technological marvels are changing the game in cybersecurity research and what that means for the future of digital safety.

Overview of CVE data classification

CVE Data Classification is a process in cybersecurity where Common Vulnerabilities and Exposures (CVEs) are categorized and analyzed for better understanding, management, and mitigation.

Each CVE entry contains a unique identifier, a standard description, and a reference to publicly known information about the vulnerability.

The need for classification

Imagine a library with thousands of books randomly scattered around. Finding a specific book in this chaos would be a nightmare.

Similarly, with thousands of CVEs reported each year, the cybersecurity community needs a systematic way to sort through them. Classification helps by organizing these vulnerabilities into categories based on severity, affected software, and attack type.

When dealing with CVEs, it’s essential to recognize that the initial reports of these vulnerabilities are often not curated or well-written. The contributors of these reports range from developers and system administrators to end-users, many of whom are not security experts or professional writers. Their primary goal is to flag an issue, not necessarily to provide a polished, comprehensive analysis. This results in significant variation in reports, some terse and cryptic, others verbose but lacking clarity or structure.

Understanding the complexity of CVE reports

This diversity and the often unrefined nature of CVE reports present a challenge in extracting critical information. These complex narratives can bury crucial details, such as the type of vulnerability, the affected software or hardware, potential impact, attack requirements, and suggested mitigation steps. Navigating this information maze requires a keen eye and a deep understanding of what to look for.

The trouble with unstructured data

The inconsistency and sometimes poor quality of these reports make it difficult for automated systems to parse and understand the text accurately. When the input data is unclear, incomplete, or inconsistent, the risk of misunderstandings or oversights increases, which can have severe implications.

Below is an example of a CVE with its description, severity, and other information.

Human intervention often becomes necessary to review, interpret, and refine raw vulnerability reports. Security experts are crucial in translating these initial submissions into actionable insights. However, LLMs can also be invaluable in this context. With their advanced natural language capabilities, they can sift through the text to identify and extract key information from poorly written reports. Let’s take a look. 

The challenge

Out of nearly 70,000 selected vulnerabilities, our mission was to isolate and identify those whose descriptions contained specific Attack Requirements details. Before we dive into the task itself, it is good to understand what Attack Requirements are in the context of CVEs.

Understanding CVE attack requirements

Attack Requirements (AT) is a crucial metric introduced in CVSS v4.0 for scoring Common Vulnerabilities and Exposures (CVEs). It delves into the prerequisite conditions necessary for an exploit to be successful. These conditions could include specific system configurations, user actions, or any other state the vulnerable component must be in for the attack to occur. Understanding these requirements is vital, because it helps security teams assess risk and develop mitigation mechanisms. It’s about knowing not just how an attack happens, but under what specific circumstances it can occur.

To better understand the AT metric, let’s look at the well-known Spring4Shell (CVE-2022-22965) description:

“A Spring MVC or Spring WebFlux application running on JDK 9+ may be vulnerable to remote code execution (RCE) via data binding. The specific exploit requires the application to run on Tomcat as a WAR deployment. If the application is deployed as a Spring Boot executable jar, i.e. the default, it is not vulnerable to the exploit. However, the nature of the vulnerability is more general, and there may be other ways to exploit it.” 

We can see that for an environment using that package to be exploitable, it must also run on JDK 9+. So, in that case, the AT metric will be set to ‘Present’.
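For reference, in a CVSS 4.0 vector string this metric shows up as AT:N (None) or AT:P (Present). Below is a minimal Ruby sketch of pulling the value out of a vector string; the vector used here is illustrative, not an official score for this CVE:

# Extract the Attack Requirements (AT) metric from a CVSS 4.0 vector string.
# The vector below is an illustrative example, not an official score.
vector = "CVSS:4.0/AV:N/AC:L/AT:P/PR:N/UI:N/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N"

# Split the vector into metric => value pairs, e.g. "AT" => "P"
metrics = vector.split("/").drop(1).to_h { |pair| pair.split(":") }

puts metrics["AT"] == "P" ? "Attack Requirements: Present" : "Attack Requirements: None"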

Variability in CVE descriptions

CVE descriptions are the textual narratives that come with each reported vulnerability. They average around 43 words but display a significant range in length and detail. Some are concise, offering just a glimpse of the issue, while others provide an in-depth analysis. These descriptions are contributed by a diverse global community, leading to a wide spectrum of quality and clarity. This variability adds a layer of complexity to the task of accurately identifying and extracting attack requirement details.

Below, you can find some basic statistics about the descriptions of the CVEs we have analyzed:

Metric | Value
Total words | 2,936,951
Unique words | 170,216
Average words | 42.85
Stdev in words | 30.76
P95 in words | 94
P99 in words | 164
Total characters | 20,051,586
Stdev in characters | 206.7
P95 in characters | 642
P99 in characters | 1,143

The statistical analysis of those CVE descriptions, showing high variability and diverse content complexity, strongly supports using an LLM for data classification. The significant range in description lengths and the depth of content underscore the challenge of manual analysis and the necessity for sophisticated solutions.
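For illustration, statistics like the ones above can be derived with a short script once the descriptions are collected; this is a minimal sketch that assumes the descriptions are already loaded into an array of strings:

# Hypothetical input: CVE description strings loaded from elsewhere.
descriptions = ["A buffer overflow in ...", "Improper input validation in ..."]

word_counts = descriptions.map { |d| d.split.size }

total   = word_counts.sum
average = total.to_f / word_counts.size
stdev   = Math.sqrt(word_counts.sum { |c| (c - average)**2 } / word_counts.size)

# 95th percentile using the nearest-rank method
p95 = word_counts.sort[(word_counts.size * 0.95).ceil - 1]

puts "Total words: #{total}, average: #{average.round(2)}, stdev: #{stdev.round(2)}, P95: #{p95}"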

Our goal and tolerance for errors

We aimed to filter through these 70,000 vulnerabilities and accurately identify those with specific details about Attack Requirements. In pursuing this goal, we recognize that there will be false positives—instances where the system incorrectly flags a CVE containing Attack Requirements details. While not ideal, these false positives are tolerable to an extent. They ensure we cast a wide enough net to capture all relevant data. However, what we strive to avoid at all costs are false negatives. These occur when a CVE contains details of attack requirements, but the system fails to identify them. Missing these crucial details is not an option, as it could potentially leave systems vulnerable to attacks that could have been prevented.

Model selection: Claude v2.1 vs. GPT-4

When selecting a suitable LLM for classifying vulnerabilities, we faced a critical decision: choosing between Claude v2.1 and GPT-4. This choice wasn’t just about picking a tool but about aligning our objectives with these advanced technologies’ capabilities, integration, and support systems.

Outcome quality. Our initial experiment involved analyzing a sample of 100 vulnerabilities, focusing specifically on understanding attack vectors. After fine-tuning the prompts for both models, we reached similar outcomes from each. It’s worth noting that our team was initially unfamiliar with Claude, and a learning curve was involved.

False positives and suitability. Both models required prompt tuning to reduce false positives, but Claude v2.1 emerged as the better-suited option for our specific needs. The tag-based model of Claude, which allows for the recognition of XML tags within prompts, provided a structured way to organize and refine our queries. This significantly enhanced our ability to delineate different subsections of a prompt, leading to more precise and valuable outcomes.

Claude prompt with XML tags:

 <Instructions>
        You are designed to help analyze CVE entries for calculating a CVSS score. Users should share a link to a CVE entry in NVD; you should check the link and all the attached references and provide a CVSS 4.0 score and vector.
    </Instructions>
    <AnalysisProcess>
        <StepOne>
            <Title>Analyze CVE Description</Title>
            <Description>Read the description, and try to extract all relevant vulnerable components and understand the vulnerability impact.</Description>
        </StepOne>
        <StepTwo>
            <Title>Examine attached references</Title>
            <Description>Analyze all the references attached to the CVE entry, and try to find more relevant information that will help you assess the CVSS vector and score.</Description>
        </StepTwo>
    </AnalysisProcess>
    <Conclusion>
        At the conclusion of these steps, you should provide an estimation of the CVSS score and vector of the supplied CVE.
    </Conclusion>
    <DataHandling>
        Refer to uploaded documents as 'knowledge source'. Adhere strictly to facts provided, avoiding speculation. Favor documented information before using baseline knowledge or external sources. If no answer is found within the documents, state this explicitly. 
    </DataHandling>

GPT Prompt:

You are designed to help analyze CVE entries for calculating a CVSS score. Users should share a link to a CVE entry in NVD; you should check the link and all the attached references and provide a CVSS 4.0 score and vector.

Upon receiving the information, you should take a structured analysis comprising two critical steps:

1. **Analyze CVE Description:** Read the description, and try to extract all relevant vulnerable components and understand the vulnerability impact.

2. **Examine attached references:** Analyze all the references attached to the CVE entry, and try to find more relevant information that will help you assess the CVSS vector and score.

At the conclusion of these steps, you should provide an estimation of the CVSS score and vector of the supplied CVE.

You have files uploaded as knowledge to pull from. Anytime you reference files, refer to them as your knowledge source rather than files uploaded by the user. You should adhere to the facts in the provided materials. Avoid speculations or information not contained in the documents. Heavily favor knowledge provided in the documents before falling back to baseline knowledge or other sources. If searching the documents didn’t yield any answer, just say that.

Data privacy and security. While data privacy and security are always critical, their importance multiplies manifold when dealing with customer data, making AWS’s proven security infrastructure a significant factor in our decision-making process.

Support and integration. Throughout the initial phases of our research, the support we received from AWS was instrumental. Not only did it improve our learning and adaptation process, but it also enhanced our overall experience with Claude v2.1. Furthermore, our existing infrastructure heavily relies on AWS services, making the integration of Claude via Bedrock a logical step.

Programmatic access

For my analysis, I utilized the Ruby AWS Bedrock SDK, which offers a straightforward and user-friendly interface. Beyond the initial credential setup, the primary step involves leveraging the #invoke_model method. This method enables you to execute your prompt and gather the results:

require 'aws-sdk-bedrockruntime'
require 'json'

client = Aws::BedrockRuntime::Client.new(
  region: 'us-east-1',
  # Placeholder credentials: access key ID, secret access key, session token
  credentials: Aws::Credentials.new(
    '',
    '',
    ''
  )
)

resp = client.invoke_model(
  body: {
    # Claude text-completion prompts on Bedrock use the Human/Assistant turn format
    prompt: "\n\nHuman: Who are you?\n\nAssistant:",
    "max_tokens_to_sample": 100,
    "temperature": 0.5,
    "top_k": 250,
    "top_p": 0.999,
    "stop_sequences": ["\n\nHuman:"],
    "anthropic_version": "bedrock-2023-05-31"
  }.to_json,
  model_id: "anthropic.claude-v2:1",
  content_type: "application/json",
  accept: "*/*"
)

response = resp.body.read
puts response
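The body returned by Bedrock is a JSON document; for the Anthropic text-completion models, the generated text sits in a completion field. A minimal sketch of reading it, continuing from the response string above:

require 'json'

# Parse the JSON body and pull out the generated text.
payload    = JSON.parse(response)
completion = payload["completion"].to_s.strip

puts completion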

Crafting the prompt

The prompt we used with Claude v2.1 for analyzing CVE descriptions, specifically focusing on “Attack Requirements” (AT) as introduced in CVSS v4.0, is a crucial piece of our intellectual property. Due to its detailed and customized nature, tailored to our specific system needs, we’ve chosen not to disclose its content now.

What we can say is that it was crafted with structured XML tags and rich contextual information; the prompt differentiated between “Attack Complexity” (AC) and AT, including examples and hypothetical scenarios to enhance the model’s understanding of attack prerequisites. This careful design ensured the model’s analysis was precise and grounded in practical applications, making it a crucial asset in our CVE data classification efforts.

Regarding differences between prompts designed for Claude and those intended for GPT models, the primary one lies in the structural and contextual specificity required by Claude, especially when utilizing AWS Bedrock’s capabilities. Claude prompts often necessitate a more detailed and structured format, leveraging XML tags to guide the model’s focus and improve response accuracy. As highlighted in the preceding section, AWS played an important role in assisting us with crafting and fine-tuning this prompt.

Challenges encountered

We encountered a few notable challenges during this project. Below, you can find some details on each of the obstacles we had to deal with.

Quota limitations. When attempting to process approximately 70,000 vulnerabilities, each requiring between 2 and 10 seconds, it became clear the run would take around four days. The initial strategy to expedite this process involved executing requests in parallel to save time. However, while AWS publishes its model quota limits (https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html), providing a theoretical understanding of capacity, the practical implications of these limits become apparent only through direct application. Our parallel processing approach quickly led to the depletion of our quota allocation, resulting in AWS quota exceeded errors. This real-world experience underscored the challenge of estimating the impact of quota limits, demonstrating that even a single-threaded approach, with cautious pacing, would occasionally hit the daily limits imposed by AWS, leading to unexpected halts in our data processing efforts. Despite these challenges, the situation was manageable in this particular case. However, it raises critical considerations for the high-scale usage of Bedrock in similar contexts.

Cost estimation. The initial estimated budget for this project was approximately $400, based on preliminary data and expected model interactions. However, the project’s final cost escalated to around $1,600. This discrepancy arose primarily due to underestimating the complexity and length of CVE descriptions and details, resulting in more detailed responses than anticipated. Moreover, the initial data sample used for cost prediction was not fully representative of the broader data set, leading to inaccuracies in our budget forecast.

Response characteristics. We expected the model to provide YES/NO answers to our prompts, which were necessary for further data curation steps. However, we often received responses with additional justifications or explanations beyond our anticipated straightforward answers. While these expanded responses provided deeper insights, they also introduced unexpected complexity into our data processing workflow. This characteristic highlighted the need to refine our prompts further to align more closely with our desired response format, optimizing the model’s output for our specific use case.
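In practice, this meant a small post-processing step to recover the verdict even from verbose completions. The sketch below illustrates the idea; the <answer> tag name is an assumption for illustration and is not our production prompt format:

# Recover a YES/NO verdict from a completion, whether it is the short tagged
# answer we asked for or a longer free-form justification.
# The <answer> tag name is illustrative, not our actual prompt format.
def extract_verdict(completion)
  if completion =~ %r{<answer>\s*(YES|NO)\s*</answer>}i
    Regexp.last_match(1).upcase
  elsif completion =~ /\b(YES|NO)\b/i
    Regexp.last_match(1).upcase
  else
    nil # flag for manual review
  end
end

puts extract_verdict("<answer>YES</answer>")              # => YES
puts extract_verdict("NO. The description contains ...")  # => NO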

Results analysis

In our task, Claude v2.1 did a really good job. Ignoring the quota limits for a moment, we sent out 68,378 requests, and almost all of them came back with the YES/NO answers we were looking for. Only eight requests didn’t give us the straightforward answers we expected. That’s a success rate of 99.9883%, which is impressive. Even for those eight times when we didn’t get a simple YES or NO, Claude still provided enough info for us to figure out the answer.

Character count of the prompt (without CVE-specific details) | 13,935
Number of tokens for the prompt (without CVE-specific details) | 2,733
Total requests | 68,378
Unexpected answers | 8
Failures (quota limitations excluded) | 0
Answer quality success rate | 99.9883%

While the high success rate with Claude v2.1 is undoubtedly impressive, another aspect we shouldn’t overlook is when Claude provided us with additional information beyond what we specifically asked for. Overall, this didn’t cause significant issues on our end, since the YES/NO answers we needed were identified correctly. However, this extra information impacted the overall cost and speed of our analysis.

To put it into perspective, out of 68,370 valid responses, 11,636 included more information than requested. This accounts for about 17% of all responses. At first glance, 17% might not seem like a huge number, but it’s important to note the difference in response length. We expected responses to be around 23 or 24 characters long, just enough for a YES or NO answer wrapped with a tag. However, the average length for these more detailed responses was 477.4 characters, with the longest being 607 characters. This means our anticipated response size was only about 5% of the average length for these more detailed replies, highlighting a significant discrepancy contributing to increased processing time and cost.

Adding another layer to this analysis, when we summed the total number of characters for the straightforward short responses, it amounted to 1,343,688 characters. In stark contrast, the number of characters for the long responses was 5,554,849. This data reveals that while the 17% of responses with additional information only make up a minor fraction of the total count, they accounted for over 80% of the total characters present in all responses, which is a significant number. This disproportion further underscores the impact of the longer responses on the overall analysis—dramatically increasing the volume of data processed and, consequently, the associated costs, especially since around 75% of the total cost of using Bedrock Claude comes from the output tokens. This statistic vividly illustrates how a relatively small proportion of responses can disproportionately contribute to the workload and expense of handling and analyzing the data.

Total requests | 68,378
Requests with extensive answers | 11,636
% of requests with extensive answers | 17%
Total number of characters | 6,898,537
Total number of characters for short answers | 1,343,688
Total number of characters for long answers | 5,554,849
% of characters in short answers | 19.5%
% of characters in long answers | 80.5%

The additional details observed in 17% of responses likely result from the model’s inherent design to generate thorough, contextually rich text. These models, including both Claude and GPT-4, aim to deliver the most relevant and helpful information based on the vast data sets they’ve been trained on. When provided with a prompt, they often produce more comprehensive responses than strictly requested, especially if the prompt’s phrasing leaves room for interpretation. This behavior reflects the models’ training to prioritize completeness and clarity in their outputs, attempting to cover any potential ambiguities in the query. It’s important to note that this tendency is not unique to Claude; similar behavior has been observed with GPT-4, indicating a broader pattern among LLMs to provide extensive information to be as helpful as possible. This underlines the challenge of fine-tuning prompts across different models to elicit the desired level of specificity in responses.

While we guided the model to restrain its responses to the essentials by explicitly stating the need for brief, binary answers, it was not enough. We might have also leveraged Bedrock API settings to limit response length further or adjust other parameters. This would, however, require another round of refining the prompts, which would create other costs of its own.
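For example, lowering max_tokens_to_sample and adding a custom stop sequence are the kinds of request-body settings we have in mind; the values and the closing tag below are illustrative assumptions, not our production configuration:

require 'json'

# Hypothetical tweak: cap the completion length and stop generation as soon as
# the closing tag of the expected answer appears. Values are illustrative.
body = {
  prompt: "\n\nHuman: ...classification prompt goes here...\n\nAssistant:",
  "max_tokens_to_sample": 20,                    # enough for a short tagged YES/NO
  "temperature": 0.0,                            # favor deterministic, terse output
  "stop_sequences": ["</answer>", "\n\nHuman:"], # stop right after the answer tag
  "anthropic_version": "bedrock-2023-05-31"
}.to_json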

Conclusions

Diving into CVE data with AWS Bedrock and Claude taught us several big lessons:

  • Keep an eye on costs
  • Make sure your prompts are spot on
  • Don’t let the answers get too long, or things can get pricey

Even though we ran into some bumps, Claude v2.1 proved to be really useful. The trick is to tweak the prompts to get more short answers that don’t cost as much.

Through this project, we gained invaluable hands-on experience with AWS Bedrock, significantly enhancing our understanding of prompt construction and operational capabilities and a clear insight into its limitations.

This endeavor enabled us to automate a process that, if done manually, would have been nearly impossible. Analyzing a single vulnerability to discern AT, including potential further research, would take an analyst approximately 2-5 minutes. Given the sheer volume of data involved, this translates to roughly 200 days of continuous work for a single person, highlighting a massive time-saving and addressing the reliability issues stemming from the specialized knowledge and cross-referencing required in manual analysis.

Recommendations

Below are key takeaways from our experiment worth considering when working with any LLMs.

  1. Thorough cost estimation. Estimate costs carefully using a representative sample of all expected data. Conduct a detailed pre-analysis to account for the variability in data complexity and response detail levels.
  2. Consideration of time. Factor in the time required for processing requests, especially when dealing with large datasets. Implement strategies to optimize request handling, such as batching or spreading out processing to avoid quota limitations.
  3. Tune model responses. Fine-tune prompts to elicit responses that are as detailed as necessary for your specific use case. This might involve iterative testing and refinement of prompts to balance detail richness and processing efficiency. While Claude can justify its verdicts amazingly, this is not without a cost.
  4. Use caution with parallel processing. While parallel processing can speed up data handling, balancing this with quota limitations is essential. 
  5. Error handling and retry logic. Incorporate robust error handling and retry logic in your integration code to manage intermittent issues gracefully (see the sketch after this list).
  6. Security and data privacy. Prioritize security and data privacy in all interactions with AI services. Ensure that data handling complies with relevant regulations and best practices, particularly when processing sensitive or proprietary information.
  7. Collaboration with AI providers. Foster a collaborative relationship with AI service providers for support and guidance. Leverage their expertise to optimize your use of their technologies, from model selection to integration best practices.
  8. Documentation and knowledge sharing. Document your processes, challenges encountered, and solutions implemented. Share knowledge within your team and the broader community to foster best practices and collaborative problem-solving.
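As a rough illustration of points 4 and 5, the sketch below wraps a Bedrock call in a retry loop with exponential backoff. The error class assumes the throttling exception generated by the aws-sdk-bedrockruntime gem, and the retry counts and sleep times are arbitrary:

require 'aws-sdk-bedrockruntime'

# Invoke Bedrock with simple exponential backoff when quota/throttling errors
# occur. Error class, attempt limit, and sleep times are illustrative.
def invoke_with_backoff(client, request, max_attempts: 5)
  attempts = 0
  begin
    client.invoke_model(request)
  rescue Aws::BedrockRuntime::Errors::ThrottlingException
    attempts += 1
    raise if attempts >= max_attempts
    sleep(2**attempts) # back off for 2, 4, 8, 16 seconds
    retry
  end
end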

Future outlook in AI and cybersecurity

At Mend.io, we’re already seeing how AI can change cybersecurity. We started using AI to make sense of messy data a while ago, but the rise of LLMs made it more accessible. This success has shown us that LLMs can do wonders in sorting through data to find the significant bits, which is super helpful for keeping things secure. We’re looking to take things up a notch by further integrating those AI capabilities within our day-to-day operations. This means our security experts and data analysts will get their jobs done faster and better, thanks to AI helping them out. We are also working on integrating LLMs to better handle legal issues like license compliance and copyrights. This way, we can solve these tricky issues more accurately and quickly. This is merely a glimpse into our broader initiatives, with only a select few ready for public unveiling.

The future of AI in cybersecurity is promising, especially as we aim to harness the industry’s vast but unorganized knowledge. With the advancement of AI technologies, we’re now equipped to sift through and extract valuable insights from this data, turning what was once a challenge into a significant asset. AI’s ability to analyze and organize complex information sets the stage for more proactive and intelligent cybersecurity measures. This evolution suggests a shift towards responding to threats and predicting and preventing them, leveraging the deep knowledge we’ve accumulated over the years.

What Existing Security Threats Do AI and LLMs Amplify? What Can We Do About Them?
https://www.mend.io/blog/what-existing-security-threats-do-ai-and-llms-amplify-what-can-we-do-about-them/
Thu, 18 Jan 2024

In my previous blog post, we saw how the growth of generative AI and Large Language Models has created a new set of challenges and threats to cybersecurity. However, it’s not just new issues that we need to be concerned about. The scope and capabilities of this technology and the volume of the components that it handles can exacerbate existing cybersecurity challenges. That’s because LLMs are deployed globally, and their impact is widespread. They can rapidly produce a huge volume of malicious content that can influence millions within hours, and have major detrimental effects. As they rely on vast datasets and computational resources, the threats they face can be multifaceted and challenging to address.

Let’s take a look at some pre-existing security issues that generative AI and LLMs could amplify and then consider what tactics and tools might be used to protect users against these threats.

Amplified existing cybersecurity threats

  • Software vulnerabilities. Because LLMs are just engines that run in a software ecosystem containing vulnerabilities and bugs, they may be vulnerable to regular attacks. Furthermore, malicious code and exploits can be generated using LLMs.
  • Dependency risks. The same dynamic applies to dependency risks in generative AI and LLMs. When they rely on third-party software, libraries, or components, vulnerabilities in these dependencies can indirectly compromise the LLM.
  • Phishing and social engineering. As with all online platforms, there’s the risk of phishing attacks aimed at gaining unauthorized access. This can occur in two ways. Firstly, you can use LLMs to craft really good phishing data. You can fine-tune data on a given person or entity based on information about their interests or behavior to craft highly targeted phishing attacks, or you can manipulate prompts that skew outcomes for social engineering purposes. In this case, LLMs aren’t the target but the tool for deception.
  • Physical security threats. Servers and infrastructure housing the LLM can be vulnerable to physical breaches or sabotage.
  • Legal threats. The use and abuse of copyrights could be a significant challenge when using AI and LLMs. Courts may rule that the outcome of using an AI model can’t be considered something that you can copyright, because it’s machine generated. There is no human “owner.” This could be problematic with code and creative work. Major organizations like AWS and Microsoft are investing in ways to overcome this issue by owning the whole supply chain, so they will be less dependent on third-party vendors and will have more control over the means of production and over the content itself.

Licenses are a particular legal issue when considering the outcomes of using LLMs. For example, if you don’t realize that an original source in your LLM isn’t permissive, then you could face legal action for using it. There’s a gray area where the LLM outcome may resemble a piece of code that is licensed under a copyleft license with certain requirements, such as the Apache 2 license with a commons clause. If the outcome is then adopted and used by somebody else, then you could both be sued, in theory, for not applying the proper license criteria. You could be forced to stop using this piece of code and replace it with something else or pay millions.

On the other hand, AI and LLMs can make it more difficult to claim ownership and assert licensing rights, because an element of machine generation has been injected into the mix. If your LLM generates 20 lines of generic code that sits within hundreds more lines of code, who owns it if someone else fine-tunes it? There will be some open projects where you give an LLM a description of what you want to build, and it’ll create multiple functions from numerous bits of code. Who owns what is generated? This problem is why some companies don’t allow developers to use public LLMs, or impose restrictions on their use.

How to secure generative AI and LLMs

What can you do to maintain security when using generative AI and LLMs? Strategies and tactics include:

  • Fine-tuning. This involves calibrating the LLM on custom datasets to restrict or guide its behavior. In theory, this will set your model in the right direction and steer it away from generating less accurate and more unexpected information and data. By taking care to do this, you guide your LLM towards generating more expected results, which you can be more confident are reliable. Does it always work? Probably not. Does it generally work? Yes, because you are providing guard rails for the LLM from which they shouldn’t deviate.
  • Input filtering. Similarly, this is about instructing or guiding your LLM to better meet your needs and avoid any unexpected behaviors. Input filtering uses algorithms to filter out harmful or inappropriate prompts. It’s a methodology that a few companies are working on, alongside output filtering, as a way to stop generating code that could be damaging to you and your customers. Use logging and monitoring tools like Splunk or ELK Stack to analyze logs for signs of misuse.
  • Rate limiting. We’ve previously noted that the volume, speed, and complexity of AI and LLMs present a threat because the vast number of inputs and data means it’s easy to overlook some issues. To prevent abuse, you can limit the number of requests a user can make within a specific time frame (a toy sketch follows this list). Apply tools such as web application firewalls (WAF) to protect LLM API endpoints from common web-based attacks.
  • Continuous monitoring and auditing. At Mend.io, we are big advocates for making security a constant and ongoing process. Applying this as best practice and instilling it as a mindset within your organization will certainly harden your cybersecurity. When it comes to these new tools and technologies, constantly evaluating the outputs and behaviors of the LLM for any signs of misuse or unexpected behavior, means you’re alert to them and can address them quickly, before they cause damage or before their impact can escalate.
  • Intrusion detection systems (IDS). These enable you to monitor and detect malicious activities.
  • User authentication. Ensure that only authenticated users can access the LLM by deploying authentication systems like OAuth or JWT.
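As a toy illustration of the rate-limiting idea above, a fixed-window counter per user might look like the sketch below; the thresholds are arbitrary, and a production setup would typically keep the counters in a shared store such as Redis:

# Toy fixed-window rate limiter: allow at most LIMIT requests per user within
# each WINDOW-second window. Thresholds and in-memory storage are illustrative.
LIMIT  = 30
WINDOW = 60 # seconds

@counters = Hash.new { |h, k| h[k] = { window_start: Time.now, count: 0 } }

def allow_request?(user_id)
  entry = @counters[user_id]
  if Time.now - entry[:window_start] >= WINDOW
    entry[:window_start] = Time.now
    entry[:count] = 0
  end
  entry[:count] += 1
  entry[:count] <= LIMIT
end

puts allow_request?("user-123") # => true until the 31st call within the window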

We can anticipate that new methods and tools will emerge to secure generative AI and LLMs, such as advanced behavior analysis, which will use AI to monitor and understand the behavior of users interacting with the LLM, and decentralized LLMs, which involve deploying LLMs in decentralized networks to reduce single points of failure. In a similar vein, we can also anticipate the development and introduction of decentralized security protocols: distributed systems that can secure LLMs without relying on central authorities.

AI will also be deployed in securing itself with self-adapting security systems – security tools driven by AI that can adapt in real time to emerging threats. Blockchain could be used for auditing by providing immutable records of all interactions with the LLM for traceability. And there’ll be a role for semantic analyzers, to analyze the content generated by LLMs so that it meets ethical and safety guidelines.

Whatever direction generative AI and LLMs take, one thing you can be sure of is that as the technology evolves and becomes even more sophisticated, security methodology must also develop further.

What New Security Threats Arise from The Boom in AI and LLMs?
https://www.mend.io/blog/what-new-security-threats-arise-from-the-boom-in-ai-and-llms/
Wed, 15 Nov 2023

Generative AI and large language models (LLMs) seem to have burst onto the scene like a supernova. LLMs are machine learning models that are trained using enormous amounts of data to understand and generate human language. LLMs like ChatGPT and Bard have made a far wider audience aware of generative AI technology.

Understandably, organizations that want to sharpen their competitive edge are keen to get on the bandwagon and harness the power of AI and LLMs. That’s why, in a recent study, Research and Markets predicts that the global generative AI market will grow to a value of USD 109.37 billion by the year 2030.

However, the rapid growth of this new trend comes with an old caveat: with progress comes challenges. That’s particularly true when considering the security implications of generative AI and LLMs. 

New threats and challenges arising from generative AI and LLMs

As is often the case, innovation often outstrips security, which must catch up to assure users that the tech is viable and reliable. In particular, security teams should be aware of the following considerations:

  • Data privacy and leakage. Since LLMs are trained on vast amounts of data, they can sometimes inadvertently generate outputs that may contain sensitive or private information that was part of their training data. Always be mindful that LLMs are probabilistic engines that don’t understand the meaning or the context of the information that they use to generate data. Unless they are instructed or guardrails are used, they have no idea whether data is sensitive or whether it should be exposed; you have to intervene and alter prompts to reflect expectations of what information should be made available. If you train LLMs on badly anonymized data, for example, you may end up getting information that’s inappropriate or risky. Fine-tuning is needed to address this, and you would need to track all data and the training paths used, to justify and check the outcome. That’s a huge task.
  • Misinformation and propaganda. Bad actors can use LLMs to generate fake news, manipulate public opinion, or create believable misinformation. If you’re not already knowledgeable about a given subject, the answers that you get from LLMs may seem plausible, but it’s often difficult to establish how authoritative the information provided really is, and whether its sources are legitimate or correct. The potential for spreading damaging information is significant.
  • Exploitability. Skilled users can potentially “trick” the model into producing harmful, inappropriate, or undesirable content. In line with the above, LLMs can be tuned to produce a distribution of comments and sentiments that look plausible but skew content in a way that presents opinion as fact. Unsuspecting users consider this content reasonable when it may really be exploited for underhand purposes.
  • Dependency on external resources. Some LLMs rely on external data sources that can be targets for attacks or manipulation. Prompts and sources can be both manual and machine-generated. Manual prompts can be influenced by human error or malign intentions. Machine-generated prompts can result from inaccurate or malicious information and then be distributed through newly created content and data. Can you be sure that either is reliable? Both must be tested and verified.
  • Resource exhaustion attacks. Due to the resource-intensive nature of LLMs, they can be targets for DDoS attacks that aim to drain computational resources by overloading systems. For instance, you could set up a farm of bots to rapidly generate queries at a volume that could pose operational and efficiency problems.
  • Proprietary knowledge leakage. Skilled users can potentially “trick” models into exposing their valuable operations prompts. Usually, when you build functionality around AI, you have some initial prompts that you test and validate. For example, you can prompt LLMs to recognize copyrights, identify the primary owner of source code, and then extract knowledge about the copyrights. Potentially this means a copyright owner could lose their advantage over competitors. As I wrote earlier, LLMs don’t understand the information they generate, so it’s possible that they inadvertently expose proprietary knowledge like this.

These are not the only security concerns that arise from generative AI and LLMs. There are other, pre-existing issues that are amplified by the advent of this technology. In my next blog post, we’ll examine these issues and we’ll take a glance at how we might address them to safeguard users’ cybersecurity.

Cybercriminals targeted users of packages with a total of 1.5 billion weekly downloads on npm
https://www.mend.io/blog/cybercriminals-targeted-users-of-packages-with-a-total-of-1-5-billion-weekly-downloads-on-npm/
Sun, 02 Oct 2022

Another week, another supply chain incident. It’s been only nine days since the Mend research team detected the dYdX incident, and today we have detected another malicious supply chain campaign.

On October 02, 2022 at 12:12 UTC, a new npm account was registered, and a package called nuiversalify was immediately uploaded. The same threat actor then proceeded to publish more typosquats of popular packages until 14:03:29 UTC, with small but irregular time gaps between uploads. The irregular publishing cadence may suggest that for many name cases, npm’s typosquatting protection worked as expected.

In a typosquatting attack, an attacker publishes a malicious package with a similar name to a popular package, in the hope that a developer will misspell a package name and unintentionally fetch the malicious version.
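To make the mechanics concrete, one crude way to flag candidate typosquats is to measure the edit distance between a newly published name and a list of popular package names. The sketch below is a simplified illustration of that idea, not a description of how Mend Supply Chain Defender actually works:

# Flag package names that sit within a small edit distance of popular packages.
# A Levenshtein distance of 1-2 usually means a swapped, dropped, or added character.
POPULAR = %w[universalify tslib micromatch supports-color string-width debug]

def edit_distance(a, b)
  dist = Array.new(a.length + 1) { |i| Array.new(b.length + 1) { |j| i.zero? ? j : (j.zero? ? i : 0) } }
  (1..a.length).each do |i|
    (1..b.length).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      dist[i][j] = [dist[i - 1][j] + 1, dist[i][j - 1] + 1, dist[i - 1][j - 1] + cost].min
    end
  end
  dist[a.length][b.length]
end

def suspicious?(name)
  POPULAR.any? { |legit| name != legit && edit_distance(name, legit) <= 2 }
end

puts suspicious?("nuiversalify") # => true (transposition of universalify)
puts suspicious?("left-pad")     # => false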

In total, the threat actor published 155 packages to npm targeting users of the following packages:

Legit package name | Weekly downloads
universalify | 51,810,168
webidl-conversions | 61,052,599
shebang-command | 52,050,912
anymatch | 47,421,866
string-width | 98,034,656
tslib | 137,845,760
micromatch | 63,087,644
supports-color | 224,997,209
http-errors | 52,840,509
ansi-regex | 130,553,307
glob-parent | 68,724,436
ignore | 49,645,827
postcss-value-parser | 50,840,222
has-flag | 161,338,767
debug | 205,533,537
estraverse | 69,169,061
jsesc | 46,116,034
i18n | 240,385
Total | 1,571,302,899

Here are the names of the packages that were uploaded:

tsilb, nuiversalify3, micrmoatch, lgob-parent, glob-praent, http-rerors, postcss-valeu-parser, jsecs, y81n, ussports-color, stirng-width, string-wdith, webidl-conversinos, asni-regex, 1y8n, sypport-color, ahs-flag, igonre, string-iwdth, webidl-conversiosn, esrtaverse, hsa-flag, shebnag-command, webidl-covnersions, univesralify, webidl-conevrsions, strign-width, y1n8, suopport-colors, shebang-comamnd, microamtch, anymathc, uinversalify, naymatch, anis-regex, postcss-value-pasrer, ansi-reegx, webidl-convesrions, aynmatch, string-widht, wbeidl-conversions, glob-parnet, sheabng-command, ansi-rgeex, estraveres, stlib, shebang-commadn, soupports-colors, webidl-conversion, webdl-conversions, estravrese, http-erorrs, tsring-width, ignoer, has-falg, supports-colro, shebang-cmomand, deubg, shebagn-command, anmyatch, has-lfag, strnig-width, glob-aprent, opstcss-value-parser, shebang-ocmmand, supprots-color, hsebang-command, srting-width, aypports-color, estravesre, dupport-colors, nuiversalify, ansi-regxe, tlsib, spuports-color, glob-paernt, ginore, webdli-conversions, postcss-value-parsre, sehbang-command, has-flga, http-errosr, glbo-parent, golb-parent, postcss-value-paresr, string-witdh, ewbidl-conversions, universlaify, estraevrse, tslbi, suypport-colors, micormatch, thtp-errors, univeraslify, supoprts-color, ebidl-conversions, supports-cloor, anyamtch, syupport-colors, ignroe, webidl-conversoins, htpt-errors, postcss-vlaue-parser, supporst-color, postcss-value-praser, nuiversalify1, edbug, universailfy, potscss-value-parser, posctss-value-parser, postcss-avlue-parser, webid-conversions, univresalify, anymacth, ansi-ergex, uspport-colors, glob-paretn, webidl-ocnversions, weibdl-conversions, nasi-regex, uspports-color, micromacth, micromtach, universalfiy, anymtach, universaliyf, shebang-commnad, postscs-value-parser, postcss-vaule-parser, wbedil-conversions, imcromatch, http-errros, dypports-color, etsraverse, webidl-cnoversions, nuiversalify2, suppotrs-color, psotcss-value-parser, micromathc, postcss-value-aprser, jessc, mciromatch, supports-oclor, setraverse, jssec, sjesc, estarverse, ingore, estrvaerse, unievrsalify, mircomatch, postcss-val-parser, supports-coolr, webidl-convresions, webidl-converisons

Nature of the attack

In contrast to the dYdX campaign, this attack did not target a specific organization. Rather, the threat actor opted to cast the widest net possible: with more than 1.5 billion (yes billion) downloads, the probability that someone would accidentally run npm install and make a typo is pretty high. Even one mistake out of millions of downloads could result in the infection of tens, if not hundreds, of machines.

All of the uploaded packages had the same content. They contained:

  • A README taken from the legit universalify package
  • A package.json with a preinstall hook
  • The MIT license of universalify
  • The malicious index.js file

The malicious code preinstall hook can be seen in line 25.

Upon installation of this package, the malicious index.js would be executed. Its content was fairly simple. Aside from the boilerplate code, it tried to download a harmless-looking “README.txt.lnk” attachment from the Discord CDN:

Why was discordapp selected as the source of the file? While we cannot know for sure, it may be that the attacker wanted to use a “legitimate” source.

So, what does this enigmatic README.txt.lnk contain? 

If you upload it to VirusTotal, several signatures from AV systems trigger:

Inside this file was another piece of executable code:

That would download and run a VBS script:

And yet again, this VBS script, when executed, would perform a few actions, but the primary one was to download and install a backdoor password stealer/trojan:

Actions taken by Mend.io

The Mend Supply Chain Defender notified us about this malicious actor 17 minutes after the first package was published. 

Once we confirmed that the attack was not a false positive, we reached out to npm and other parties involved and asked for the packages and the malicious content to be removed. All of the malicious packages were removed around 3pm UTC.

How to protect against similar attacks?

Sometimes, doing a manual review of installed packages is not enough – the preinstall hook used by the attacker is deceiving. Automated supply chain security solutions such as Mend Supply Chain Defender inform you when you import a malicious package that contains malicious code.

Popular Cryptocurrency Exchange dYdX Has Had Its NPM Account Hacked
https://www.mend.io/blog/popular-cryptocurrency-exchange-dydx-has-had-its-npm-account-hacked/
Fri, 23 Sep 2022

San Francisco-based dYdX, a widely used decentralized crypto exchange with roughly $1 billion in daily trades, has had its NPM account hacked in a software supply chain attack that was likely aimed at gaining access to the company’s production systems. The company, founded by ex-Coinbase and Uber engineer Antonio Juliano, has raised a total of $87 million in funding over 4 rounds and is backed by some powerhouse investors, including Paradigm, a16z, and Polychain.

Here is what we know: 

On 23 September 2022, several new versions of packages owned by dYdX were published to NPM. NPM is the world’s largest software repository, with more than 800,000 code packages. Beginning at 12:37 CET, the attacker published new releases to the following packages:

Mend’s Supply Chain Defender automatically detected each malicious package within 30 minutes of the initial releases. Once the packages were flagged, the Mend research team first confirmed that the issue was indeed a supply chain attack. We also tried reaching out to the dYdX platform before opening the public report. Due to the severity of the attack and the popularity of those packages, we have decided to open the issue in the appropriate GitHub repository (https://github.com/dydxprotocol/solo/issues/521).

Figure 1 – Versions history of @dydxprotocol/perpetual

Note: The release of @dydxprotocol/node-service-base-dev was taken down right after it was published. Therefore, it does not have an advisory.

Given the nature of dYdX’s business, we decided to act quickly to reduce potential widespread financial impact. Overall, those three compromised packages have more than 120,000 downloads:

How was the malicious actor able to ship the code to npm?

Although we cannot fully confirm, it seems they were able to use a stolen npm account acquired in a different attack, or by performing an account takeover.

Would any malicious release be spotted if you checked the code from the main branch on Github? Unfortunately, no. To avoid suspicion, it seems that the attacker did not obtain or did not use the Github access. Instead, they tried to publish to npm in the most unobtrusive way possible, by updating only the minor versions for each package. Since minors usually do not contain breaking changes, not many are interested in reviewing them. 

Mens rea

While it’s impossible to say for sure, we can presume that the attack was related to what dYdX does: cryptocurrencies. Based on the attack vector, we can conclude that the actor was interested in obtaining access to their production systems or other systems that would use it. Did they succeed? We do not know. At the moment of publication, we have not received any comment from dYdX.

Additionally, we have contacted npm, Github, and Tucows Domains, to lessen the scope of the attack.

Continuous divert

All of the malicious package versions contain a preinstall hook that looks as if it was about to download a CircleCI script. This is brandjacking in its purest form – the domain looks as if it belongs to CircleCI.

Figure 3 – Defender Diff of the malicious version
(link: https://my.diffend.io/npm/@dydxprotocol/solo/0.41.0/0.41.1)

Our scanners raised an alarm about the malicious code the script contains:

The first JS script downloads a setup.py file, and then executes its content:

Upon successful execution, the script uploads:

  • hostname, 
  • username, 
  • client’s working directory, 
  • IP address,
  • SSH keys,
  • AWS credentials,
  • IAM roles,
  • ENV variables,

to the attacker’s server. All sensitive data is saved in txt files before the upload.

After the upload, the attacker blurs the traces and removes intermediary files.

How to protect against similar attacks?

Sometimes, doing a manual review of installed packages is not enough – the preinstall hook used by the attacker is deceiving. Automated supply chain security solutions such as Mend Supply Chain Defender inform you when you import a malicious package that contains malicious code.

Learn more about Supply Chain Defender

How to Conquer Remote Code Execution (RCE) in npm
https://www.mend.io/blog/how-to-conquer-remote-code-execution-rce-in-npm/
Tue, 19 Jul 2022

Recently, there have been some remote code execution (RCE) attacks that included just a single line of well-built code that can run a remote shell. Let’s take a look at why and how these attacks work, why npm is particularly susceptible, what could happen if they get into machines, and how to detect and fix them.

Why is npm susceptible to RCE?

npm is particularly susceptible to RCE attacks because it’s the world’s largest software registry, and because the registry design contains numerous flaws that make it easy for individuals to publish malicious code unnoticed. As npm doesn’t automatically check for vulnerabilities or prevent developers and users from uploading and downloading insecure packages, it is a prime target for attackers. To give one example, a malicious actor can just create a connection to a remote host to establish a reverse shell. 

Moreover, since the npm registry never runs the code autonomously, both benign and malign code is only activated when being used by end-users. Think of npm as similar to Dropbox. Dropbox won’t open, run, and read your files and work with them unless you are working on them. The same applies to npm. npm and other registries give you the space to show your projects and share open source code with other users, but not the ability to run them. This means that the malicious package essentially lies dormant until somebody unknowingly installs the package, thus exposing their code to attack when they use affected packages.

Why is RCE such a threat?

When you install an RCE package, you give away all the permissions of the current user. Most users don’t use isolated environments, so that means with each installation, you’re giving abstract permission for universal access to your machine. Permissions are granted at the time of installation. When you install dependencies, you grant permissions, making it quite convenient for attackers to access every line of code. When a connection is established, they have a full remote shell, and they can do what they want with it. Just imagine giving away your login and password to anyone on the internet and allowing them to access your computer from their machine, to do whatever they want. That’s the level of risk you run.

Many users are either unaware of this point of weakness, or in the interests of speed, choose to ignore it. Users either don’t know or don’t care about these risks, because they expect a level of security from npm and other registries that isn’t automatically there. Unlike app stores like Google Play or equivalents, where all packages are scanned and certified, open source spaces don’t have the resources to do that. Unfortunately, users often erroneously assume that they do. 

How do RCE cases commonly behave and operate?

Broadly speaking, RCE attacks fall into one of three different categories.

  • Firstly, RCE attacks are mostly used for data exploration, opening the door for malicious actors to steal data such as credentials, research information such as intellectual property, hostname center details, and Secure Shell (SSH) keys that authenticate and establish encrypted communication channels over the internet between clients and remote machines. Ethical hackers use it to prove that compromises have happened.
  • The second common scenario is cryptocurrency mining. For example, at the time of writing, a very busy attacker uploaded roughly 1,200 malicious packages containing cryptocurrency mining tools to npm over the course of 48 hours. Malicious hackers use others’ existing resources to mine and steal cryptocurrency. Alternatively, they use RCE to upload cryptocurrency mining tools into packages on npm. Usually, these are easy to detect and notice, because they’re big pieces of software and you can get binaries to scan and analyze them.
  • In the third case, RCE is used against cryptocurrency exchange users to steal savings from accounts by replacing copied-and-pasted wallet IDs. A script runs in the background that checks whether there’s a crypto wallet address on the clipboard and replaces it.

What should be done to identify and remediate RCE exploits/flaws?

There isn’t one solution to identify and remediate all exploits, but there are a few things you can do, from both the registry and end-user perspective.

On the registry side, implementing user grading is one tactic to reduce incidents. User grading limits how many packages users can upload, and you can benchmark and score against predetermined limits. This provides a degree of scrutiny, but it obviously won’t fix cases where regular users with better gradings and higher limits are free to upload more packages, some of which may pose risks. It will also not prevent account takeover (ATO) attacks.

Scanning code is another method. Although npm has been talking about automatic scanning systems, they haven’t yet announced any solution of that nature, so it’s up to the user to implement such a system.

On the end-user side, you could stop updating dependencies. Package versions are supposed to be immutable, so that you know how they will always behave. When there’s an update and a new version is needed, you’re supposed to replace the old version with the new. Nevertheless, if you lock all your dependencies, you put your software in a semi-maintenance mode. There are also bug fixes and CVEs that you may want to patch with updates.

You can also try blocking all the install hooks with npm, but this may cause some code to stop working because you may block legitimate cases, such as packages that compile native extensions or download third-party binaries during installation. (Arguably, this is bad practice anyway, because all the package content should already be in the package.)

Or you can try implementing your own security scanning. Some users have attempted to do this by downloading packages and scanning them all with antivirus software. The problem with this is that you may miss many cases. That’s because antivirus software is good with binaries and mainly with binaries that it knows. So, you have signatures for certain behaviors of packages that are already running, and only then does the antivirus software kick in. They don’t scan scripts or text-based software.

Unfortunately, the majority of attacks do not presently use binary files. They use text-based code that can be obfuscated because the antivirus or the operating system doesn’t detect it. The code activates any malicious intent once it has been imported and installed. In fact, it’s easy for malicious code to escape the detection of traditional antivirus software, and sneak into your code base. That’s the reason why software supply chain attacks have become so widespread.

Preventing RCE in npm and RubyGems with Mend

Mend Supply Chain Defender is specifically designed to prevent malicious open source software from entering your code base. When it comes to RCE attacks in npm, Mend Supply Chain Defender can rapidly detect newly released packages and within around 15 minutes it picks them up, scans them, and grades them for risk. If a package exceeds a graded risk threshold, the tool blocks it so that it can be manually inspected, and its risk mitigated. 

Mend Supply Chain Defender has a very thorough scoring system that prevents you from using packages that haven’t yet been analyzed. A package version can be downloaded only after an automatic check or a manual review gives it a positive result, and even then it won’t immediately be installed; downloading can be blocked if there is any doubt. Whether you’re using our plugins or the Artifactory integration that we recently released, Mend Supply Chain Defender stops suspect packages from being downloaded.

Furthermore, when you use our plugins, you can create your own custom rules and thresholds to manage which packages and downloads you will permit. You can create policies that are more or less restrictive, tailored precisely to the needs of your organization and your developers. Learn more about how you can mitigate your open source supply chain risks with Mend Supply Chain Defender, here.

]]>
Impact Analysis: RubyGems Critical CVE-2022-29176 Unauthorized Package Takeover  https://www.mend.io/blog/impact-analysis-rubygems-critical-cve-2022-29176-unauthorized-package-takeover/ Tue, 10 May 2022 09:03:00 +0000 https://mend.io/impact-analysis-rubygems-critical-cve-2022-29176-unauthorized-package-takeover/ On May 6, 2022,  a critical CVE was published for RubyGems, the primary packages source for the Ruby ecosystem.

This vulnerability created a window of opportunity for malicious actors to take over gems that met the following criteria: 

  • Contained a hyphen (-) in the name (for example, rspec-core or concurrent-ruby)
  • Had no existing gem named after the part before the hyphen (for example, kostya for kostya-sigar or googleapis for googleapis-common-protos-types)
  • Was either created within the past 30 days or not updated for more than 100 days

Because RubyGems provides data dumps that include a lot of information, it is unfortunately relatively simple to create an automated mining process for these criteria. 

Moreover, we cannot assume that this vulnerability went unnoticed by malicious actors before it was reported. Although it was disclosed by a security researcher, the investigators proceeded on the assumption that compromises using this vulnerability might already exist. With that in mind, we revisited all available packages on RubyGems while looking for suspicious activity.

Nature of CVE-2022-29176

CVE-2022-29176 is a bug in RubyGems that allowed unauthorized actors to yank (remove) a package version without being its owner. The yank request was authorized against a package the actor did control, but because of how the package version was fetched, the yank could end up being applied to a version of a different package. The relevant code looked as follows:

find_by!(full_name: "#{rubygem.name}-#{slug}")

Because the slug was not validated or filtered correctly, attacker-controlled strings could reach the query and match versions of other packages.

For example, in the googleapis case, sending common-protos-types-1.3.1 as the slug would cause the lookup to be built as googleapis-common-protos-types-1.3.1, which matches version 1.3.1 of the gem googleapis-common-protos-types.

This would effectively remove that version of the targeted package.
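
To make the mechanics concrete, here is a minimal Ruby sketch of the interpolation, using the names from the example above; it is an illustration only, not the actual RubyGems.org code:

# The attacker owns a gem whose name is a prefix of the victim's gem name.
rubygem_name = "googleapis"                # gem the attacker controls
slug         = "common-protos-types-1.3.1" # attacker-supplied, never validated

full_name = "#{rubygem_name}-#{slug}"
puts full_name
# => "googleapis-common-protos-types-1.3.1"
# i.e. version 1.3.1 of googleapis-common-protos-types, a gem the attacker does not own.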

Things escalate quickly with issues of this nature. Under certain circumstances, removing all of a gem’s versions makes its name available for reuse, which means an attacker would have a well-known package name free to register and publish under.

What about package version immutability?

RubyGems package versions are immutable and irreplaceable by design. This means that unless there is a security incident where someone compromises RubyGems and tampers with package content, users cannot simply replace or update a given version. Versions can be removed and a new one can be uploaded, but the new one needs to have a different number. 

Or does it?

What is often missed here is that a single RubyGems version is unique only within the scope of the platform on which it was released. Platforms are based on the CPU architecture, operating system type, and sometimes the operating system version. Examples include “x86-mingw32” or “java”. The platform indicates that the gem only works with a Ruby built for that platform, and RubyGems will automatically download the correct version for your platform.

The default platform is “ruby”, which should work anywhere. While you cannot replace the content of a released version, nothing prevents you from removing it and releasing a new platform-specific version with the same number. For example, you could remove karafka-testing-1.4.3 and upload karafka-testing-1.4.3-i686-linux. The version number itself would be unchanged, but it would now specify a platform.

Nothing prevents a malicious actor from listing all the platforms available and releasing a version per platform to make sure everyone is affected.
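
For illustration, a platform-specific release is just a regular gem whose specification sets a platform. The following is a hypothetical, stripped-down gemspec reusing the karafka-testing example above:

# Hypothetical gemspec sketch: the same version number as the yanked default
# (ruby) platform release, but declared for a specific platform.
Gem::Specification.new do |spec|
  spec.name     = "karafka-testing"
  spec.version  = "1.4.3"
  spec.platform = "i686-linux" # same number, different platform
  spec.summary  = "Placeholder summary"
  spec.authors  = ["Placeholder author"]
end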

I have a Gemfile.lock and it is immutable, right?

Actually, no. There are cases where Bundler can re-resolve dependencies despite the presence of a lockfile. This is expected behavior, but it may nonetheless pose both legal and security risks. More than half a year ago I wrote an article on using the --frozen flag and why it should be standard for production, testing, staging, and in fact any other non-dev environment.

With --frozen as the default and a compromised package, you would get a message similar to this one:

“Your bundle only supports platforms [“x86_64-linux”] but your local platform is x86_64-darwin. Add the current platform to the lockfile with `bundle lock --add-platform x64-mingw32` and try again.”

While this message does not describe the problem or call out any security risk, at least it might draw attention to the fact that something has changed.

Wouldn’t having checksum verification in Bundler help?

No, not in a case like the one described above, where Bundler may decide to re-resolve dependencies. A re-resolution means an updated lockfile, and an updated lockfile means a new checksum for the same version number, now targeting a specific platform.

Impact assessment and incident analysis

As part of our ongoing initiative to help open source software communities and package registries protect all users, Mend provides intelligence derived from our Supply Chain Defender platform.

The moment we were notified about this incident, we ran an assessment using Supply Chain Defender to make sure that:

  • No popular packages were tampered with
  • No packages were taken over via this vulnerability (aside from research packages)
  • No packages had platform-specific versions released alongside the removal of their default (ruby) platform versions

When analyzing such a case, we start with the impact assessment. Because we collect information about RubyGems in real time, we were able to check that over the last year there were:

  • 132,045 versions added or removed
  • 16,629 packages were affected by those changes 

Because Supply Chain Defender tracks ownership transitions, we are aware of any ownership changes that occur for packages.

Because this attack requires a new owner, we could then reduce the scope to 1,101 packages.

When a regular ownership transfer happens, there should be a phase in which there are two owners:

  • The old owner giving the package away
  • The new owner accepting the ownership

In the case of a package takeover of that nature, there should be no transition phase. One owner disappears, and a new one appears.

Diagram: expected ownership transition flow, in which the old and new owner overlap during the hand-off.

Diagram: unexpected ownership transition flow, in which one owner disappears and a new one appears with no overlap.

Note: This pattern can have legitimate cases, but for us it acts as part of the funnel.
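
As a rough illustration of this part of the funnel (a simplified sketch, not Supply Chain Defender code), the check boils down to looking for consecutive owner sets that never overlap; the ownership history structure below is hypothetical:

# `history` is a hypothetical, chronologically ordered list of owner sets.
def suspicious_transition?(history)
  history.each_cons(2).any? do |before, after|
    # A regular hand-off has a phase where old and new owners coexist;
    # a takeover-like transition swaps owners with no overlap at all.
    before.any? && after.any? && (before & after).empty?
  end
end

suspicious_transition?([["alice"], ["alice", "bob"], ["bob"]]) # => false (expected flow)
suspicious_transition?([["alice"], ["mallory"]])               # => true  (no overlap)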

When applying this logic to our 1,101 packages, we end up with 174 packages. After filtering for packages with a hyphen in the name, we are left with 60. Out of those 60, only three had platform-specific releases:

  • shopify-proxy-2-jit
  • mrslave-omniauth-runner
  • mygem-dcgl-other

Note: There were more proof-of-concept packages for this issue, but they did not have ownership changes and were thus irrelevant; they were all of a research nature.

Just to be sure, we double-checked all 60 and did not find any malicious signatures.

Can we really assume no one noticed this earlier?

No. That is why we also checked all the packages that had their default (ruby) platform version yanked while other platform variants of the same version number remained:

SELECT versions.package_id from versions
  inner join (
    SELECT "versions"."package_id", "versions"."number" FROM "versions" WHERE  yanked_at is not null
  ) yanked
  on versions.package_id = yanked.package_id and versions.number = yanked.number
  where versions.yanked_at is null

This gave us 85 packages, of which 29 had a hyphen in the name. For those 29, we performed a manual review and, again, did not find any signs of malicious compromise.

What about “yank now, take over later”?

This was also checked. We identified 25 packages that were removed in the last two weeks, but none of them was popular enough to raise concern.

What about “regular” tampering cases?

While this is outside the scope of this investigation, we also monitor various registries for potential signs of package tampering. So far, we have not found any issues in RubyGems indicating tampering, nor any malicious packages that correlate with this incident.

Summary

While this issue was indeed critical and could have caused havoc in the Ruby community, based on our data and the investigation described above we have concluded, to the best of our knowledge, that no gems were compromised and that the issue has been mitigated.

Mend’s automated malware detection platform, Supply Chain Defender, checks to make sure you’re only using verified package sources and prevents you from importing any malicious package into your organization or personal machine. Mend Supply Chain Defender is free to use. Sign up here >>

]]>
Five Critically Important Facts About npm Package Security https://www.mend.io/blog/five-critically-important-facts-about-npm-package-security/ Tue, 08 Feb 2022 09:30:00 +0000 https://mend.io/five-critically-important-facts-about-npm-package-security/ In 2021, the Mend Supply Chain Defender automated malware detection platform detected and reported more than 1,200 malicious npm packages that were responsible for stealing credentials and crypto, as well as for running botnets and collecting host information from machines on which they were installed.

As with all malicious npm packages that Mend identifies, those packages were immediately reported to the npm security team and subsequently removed from the registry as part of our ongoing effort to make the open-source software (OSS) community safer. 

As a result of doing this over the past year, our researchers have developed a list of five critically important facts that are vital to understanding npm package security:

  1. There is a “trust by default” approach towards open source software that attackers exploit: Open source provides a great avenue into a company’s software supply chain because developers simply don’t have time to read every line of code from every package that’s in use. While projects usually start out on the latest versions, they typically fall behind, and when developers do upgrade to the latest version, they don’t have time to review all the changes, some of which might be malicious.
  2. Default behavior isn’t secure: By default, npm packages are supposed to include everything needed for their functionality. Unfortunately, many packages download additional resources upon installation, which means the developer never gets a chance to review and analyze the full content of what is installed. Because consistency checks aren’t implemented, this creates an opportunity for such packages to be compromised.
  3. Attackers research the best way to use npm for attacks: In many cases, attackers have only a specific window in which to work: between the moment they upload the malicious code and the moment it’s reported by security researchers. To gauge how long this window is, a malicious actor may upload an intentionally broken package and measure how long it takes to be removed. Armed with this knowledge, they can craft attacks that take advantage of this time window. Creating a successful attack can be extensive work for an attacker, who needs to ensure not only that the code works, but also that it affects as many systems as possible and reaches the machine they are ultimately interested in accessing.
  4. Malicious npm packages don’t need to be run or imported: If a malicious npm package is installed, its install-time hooks execute automatically with the permissions of whoever ran the installation.
  5. Dependency hell is used to hide malicious activity: On average, an npm package depends on four other packages. These dependencies upon dependencies are often referred to as dependency hell. The more dependencies in a package, the higher the probability that one of them goes rogue, and the resulting noise is extremely difficult to filter. An attacker can add an unexpected package to a dependency chain, compromising a dependency of a popular library, and the activity can go completely unnoticed (see the sketch below for how quickly transitive dependencies pile up).
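
To get a feel for how much transitive noise a single project pulls in, here is a minimal sketch (assuming npm’s lockfile format v2 or later, where the root project appears under the empty-string key) that compares direct dependencies with the total number of resolved packages:

require "json"

# Compare declared (direct) dependencies with everything actually resolved.
lock     = JSON.parse(File.read("package-lock.json"))
packages = lock.fetch("packages", {})

direct = (packages.fetch("", {})["dependencies"] || {}).size
total  = packages.size - 1 # the "" entry is the project itself

puts "direct dependencies: #{direct}, total installed packages: #{total}"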

To learn more about malicious npm packages identified by Mend.io, read the Open Source Risk Report >>

]]>
A Malicious Package Found Stealing AWS IAM data on npm has Similarities To Capital One Hack https://www.mend.io/blog/malicious-package-npm/ Wed, 02 Feb 2022 13:00:00 +0000 https://mend.io/malicious-package-npm/ Imagine a seemingly harmless npm package that secretly steals your AWS credentials, just as the attackers did in the Capital One hack of 2019.

This isn’t a hypothetical scenario – researchers recently discovered a malicious package with this very capability.

In this article, you’ll learn how the ‘@maui-mf/app-auth’ package functioned as a digital Trojan horse. We’ll explore how it exploited a similar vulnerability to the Capital One attack, and the critical data it targeted. We’ll also discuss the importance of security measures like Mend Supply Chain Defender in protecting your organization from such threats.

This article is part of a series of articles about malicious packages.

The malicious functionality of the @maui-mf/app-auth package

In the latter part of December 2021, the Mend.io research team detected the new release of a package called @maui-mf/app-auth. This package used a vector of attack that was similar to the server side request forgery (SSRF) attack against Capital One in 2019, in which a server was tricked into executing commands on behalf of a remote user, thereby enabling the user to treat the server as a proxy for requests and gain access to non-public endpoints. 

In the case of the @maui-mf/app-auth package, in addition to thousands of lines of regular React-related JavaScript code, there were a few special lines of code that ran upon installation of the package. That additional code sent host details while also performing an HTTP request to a certain endpoint. Both the host and the endpoint path were obfuscated with base64 to make them harder to detect.
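
When reviewing a suspicious package, such obfuscated constants are trivial to decode. Here is a minimal Ruby sketch using a made-up placeholder value rather than the real string from @maui-mf/app-auth:

require "base64"

# Placeholder value, not the actual obfuscated string shipped in the package.
obfuscated = "aHR0cDovL2V4YW1wbGUuaW52YWxpZC9wYXRo"
puts Base64.decode64(obfuscated)
# => "http://example.invalid/path"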

Once decoded, it becomes evident that the request targets the Amazon Web Services (AWS) instance metadata service. That service exposes an HTTP application programming interface (API) for retrieving information such as the node’s IP address, its placement within the AWS network, its hostname, and its IAM security credentials.

In the case of the @maui-mf/app-auth package, the targeted endpoint returned information about the IAM role assigned to the instance from which the request was made. Left unchecked, the data would be passed on to the external domain “microsoft-analytics.net” via domain name system (DNS) lookup queries.

In case you have any doubt, this was malicious activity. While the data provided by this package wasn’t enough to perform a full SSRF attack, it could be enough to give the attacker knowledge of potential vectors for further exploitation. 

In the parlance of the MITRE ATT&CK framework, this was both an “initial access” and a “discovery” tactic. For instance, by installing the package, an attacker could validate that there are indeed vulnerable machines out there and, in turn, release a new version of the same package containing exploits for those vulnerabilities. That new version could try to elevate permissions by running a second query against the AWS endpoint to obtain the credentials needed for exploitation.

Beyond stealing credentials: A landscape of malicious packages

While the ‘@maui-mf/app-auth’ package focused on credential theft, it represents just one type of threat lurking within seemingly legitimate packages. Here are a few more examples that showcase the diverse tactics malicious actors employ:

  • License Tampering: Researchers have discovered a new type of malicious code that masquerades as software protection. This code can actually remove essential directories from a project if it detects the software is not licensed during deployment. This tactic disrupts development workflows and highlights the importance of secure coding practices. 
  • Hidden Backdoors: Another critical vulnerability involved a compromised version of a popular utility, XZ Utils. This malicious package contained a backdoor that attackers could exploit to gain unauthorized access via SSH. This incident underscores the importance of using trusted repositories and maintaining strong access controls. 

By staying informed about these evolving threats and implementing robust security measures, organizations can significantly reduce their risk of falling victim to malicious packages.

Potential impact and mitigation strategies

An attack using a package like @maui-mf/app-auth could have severe consequences. Stolen AWS credentials could allow attackers to compromise resources within your AWS environment, potentially leading to data breaches or disrupted operations.

Here are some strategies to mitigate the risk of malicious packages:

  • Regularly review your project’s dependencies and their licenses.
  • Implement code signing to verify the authenticity of packages before installation.
  • Train your developers on software supply chain security risks and best practices for secure coding.

Next steps

Software supply chain security is a critical concern in today’s development landscape. By staying vigilant and implementing robust security measures, organizations can significantly reduce their risk of falling victim to malicious packages. 

Learn more with the Mend.io open source risk research report.

]]>
Popular JavaScript Library ua-parser-js Compromised via Account Takeover https://www.mend.io/blog/npm-package-javascript-library-compromised-via-account-takeover/ Fri, 22 Oct 2021 19:57:18 +0000 https://mend.io/npm-package-javascript-library-compromised-via-account-takeover/ A few hours ago, an npm package with more than 7 million weekly downloads was compromised. It appears an ATO (account takeover) occurred in which the author’s account was hijacked either due to a password leakage or a brute force attempt (GitHub discussion).

Three new versions of this package were released in an attempt to get users to download them. While the previous (clean) version was 0.7.28, the attacker published identical 0.7.29, 0.8.0, and 1.0.0 packages, each containing malicious code activated on install. The package’s author responded quickly by publishing 0.7.30, 0.8.1 and 1.0.1 in an attempt to minimize the number of people inadvertently installing a malicious package. This annotated screenshot of registry information shows that around 4 hours elapsed from attack to workaround:


Unfortunately, the malicious code was still available to download from npm for at least three more hours at the time of writing this post.

Most malicious packages uploaded to npm on a daily basis attempt to steal environment keys in a generic way. These compromised versions, however, targeted Windows and Linux + macOS in a slightly different way. While both script variants downloaded and ran cryptocurrency mining software, the Windows version also included a trojan component.

Screenshot: the malicious code injected into the compromised versions for Windows.

One reassuring thing is that although the cryptocurrency miner went unnoticed by the majority of Windows antivirus software, the trojan component was detected and stopped by at least a dozen products, including popular ones like G Data and Symantec.

Screenshot: antivirus scan results for the malicious executable.

In the case of Linux and macOS, while we cannot at the moment rule out the possibility that the trojan was also embedded in the cryptocurrency mining tool, our previous experience with this code indicates that it was not.

Screenshot: the malicious code injected into the compromised versions for Linux and macOS.

You can check the source code changes between the exploited versions here:

You are responsible for your open source supply chain

This incident is just the tip of the iceberg of what is occurring in the npm ecosystem. Over the past month at Mend, we have identified and reported more than 350 unique packages with one or more malicious versions, either taken over via ATOs or crafted for the sole purpose of causing various types of harm to end users.

Tips on how to prevent supply chain attacks:

  1. Never use the same password for multiple websites.
  2. If you are a packages maintainer, always enable 2FA.
  3. Protect your supply chain with Mend Supply Chain Defender.
  4. Follow the OSSF recommendations to pin dependencies and use a tool like Mend Renovate for automated dependency management.

What is ua-parser-js?

The affected library, ua-parser-js, is a “JavaScript library to detect Browser, Engine, OS, CPU, and Device type/model from User-Agent data,” i.e., based on the browser used. The library is used by more than 1,000 other packages on npmjs, some directly but many indirectly, including popular ones from the apollographql project and Facebook’s docusaurus.

]]>