
AI applications tested on various tasks

Tested Applications

An exploratory trial recently put four AI applications to the test across a range of tasks: creating press releases, brainstorming innovative ideas, pinpointing journalists for a project pitch, and analyzing data. The models under evaluation were OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini, and Anthropic’s Claude.

The results were fascinating. ChatGPT shone at generating human-friendly text, press releases in particular, while Copilot excelled at churning out code snippets – a strength with obvious appeal for the programming world.

On the data analysis side, Google’s Gemini rapidly dissected and analyzed vast amounts of data, while Anthropic’s Claude impressed with its knack for suggesting innovative ideas.

However, when asked to recommend journalists for a project pitch, all of the AI applications were unimpressive. This task seemed to demand a personal touch and an understanding of individual journalists’ preferences and areas of expertise – something AI currently cannot replicate.

The AI applications nonetheless exhibited impressive capabilities, but it’s clear that there is more ground to cover, especially where tasks require a human touch. Even so, the trial underlined both the potential of AI technology and the remarkable strides it has made.

Because the AI systems received no tailor-made training, the testing reflected only their built-in capabilities. The results provided useful insight into the strengths and limitations of current AI models.

Evaluating AI applications in diverse tasks

With these lessons in hand, future improvements and configurations can be made to boost AI system performance.

The efficiency and dependability of the AI tools were put to the test with assignments to compose a press release summarizing a company’s quarterly achievements, to draft a message to shareholders, and to condense meeting minutes. Microsoft’s Copilot came out on top in these trials, although there were minor errors and room for improvement.


Google’s Gemini, however, fell short on several tasks. The press release it produced was riddled with filler and errors, and it had a tendency to use irrelevant phrases, tarnishing its reputation for high-performing algorithms. Future changes and updates are expected to rectify these issues.

Meanwhile, OpenAI’s ChatGPT and Anthropic’s Claude delivered satisfactory but not exceptional results on the press release task. Although both handled data-heavy content efficiently, each could benefit from refinements to improve the user experience.

In a task to come up with AI-related story ideas, the models lacked originality, offering predictable narratives; their ability to mimic human creativity still falls short. The AIs were also tested as ‘headhunters’, tasked with identifying and suggesting journalists for a media relations seminar – results that will be revealed later. That task tests not only their proficiency but also their strategic thinking and their ability to spot novelty and relevance.

Reviewing these tests opens a window into understanding and improving AI abilities. The final report documenting the overall performance and findings from these tasks will provide insights and benchmarks for future AI development in such scenarios.