Connecting Robot Accessories to Any Industrial Network
An early warning system for novel AI risks
An early warning system for novel AI risks
New simulation tool helps robots handle fluids
Exploring new methods for increasing safety and reliability of autonomous vehicles
Fort Robotics – Avoid Costly Downtime with Safety & Security for Machines
Robot-based Automation: 3 Tips for a Time and Cost-efficient Implementation
What’s new about generative AI in a business context?
I’ve spent the last eight years working with AI, learning the ins and outs of building and applying AI solutions in business. After making countless mistakes, I created my own method for building and applying the technology.
That was fine and dandy until the fall of 2022, when ChatGPT was released and gave a sudden rise in the usefulness and adoption of generative AI. For my consulting business TodAI, that meant a lot of new projects involving generative AI and a lot of learning. After several projects, I’ve identified places where generative models are clearly distinct from other AI when applying them in a business. Some are small, and others are very significant.
How do these new generative AI models change the game for applied AI?
Terminology
It’s easier to discuss the changes if we distinguish between generative AI and predictive AI.
Generative AI refers to large pre-trained models that output texts, images or sounds from user-provided prompts. The output is (potentially) unique and mimics human-generated content. It’s based on the prompt and the data used to train a large pre-trained model. Text-generating models such as OpenAI's GPT or Googles Bard are also known as large language models (LLMs).
Predictive AI comprises models that output one or more labels (prediction or classification) or numbers (regression or time series). It includes:
● Image building blocks: image classification, object detection and image segmentation
● Tabular building blocks: prediction, regression and forecast
● Text building blocks: text classification, named entity recognition and Intent analysis
An alternative and more accurate name for predictive AI in an academic sense is discriminative AI. However, I use “predictive” as it might resonate better with most people.
Generative AI can also predict
Generative AI (such as GPT) also can be used to solve predictive problems. ChatGPT can learn to classify texts through a few examples (few-shot learning) or no examples at all (zero-shot learning). The functionality might be the same, but there’s a technical difference. Generative AI doesn’t need you to train an algorithm that produces a model that can then classify. Instead, the generative model gets the examples as a part of the prompt.
The upside of using the generative models for predictive tasks is that implementation can be done immediately. However, there are downsides, such as:
● no way to calculate the expected performance through (for example) accuracy measures
● the generative model might provide an output that isn’t a part of the list of provided labels
● each prompt for output might affect future output
● generative models tend to "forget" the initial examples as they have a limit for how many prompts they can remember
Knowing how well an AI solution works is more complicated and takes more time
A good rule of thumb is that if you can’t achieve an OK accuracy of your model within 24 hours of work, you either have the wrong data or the wrong scope.
For example, a model predicting housing prices that predicts with 50% accuracy after 24 hours of modelling work will never see more than 60% or 65% accuracy, no matter what clever algorithm or fine-tuning you apply. If 60% isn’t good enough for your business case, you need to acquire more, other or better data, or change your business scope.
Following the 24-hour rule means that AI solutions that will never work are spotted early and scraped or redefined. The 24-hour rule has saved me from countless embarrassing failures, and it works as accuracy is an excellent indicator (not equal to, though) for the business value you can expect.
But that rule is no longer helpful in generative AI as there are no accuracy measurements during development. For example, if your business case is generating sales emails for a group of sales reps, you can’t measure the "accuracy" of the output. The business outcomes you’re trying to achieve might be faster communication with clients (through writing speed) or more sales (through better emails). These outcomes are hard to measure during development. Writing speed in particular is challenging to measure, as the output has to be checked and edited by a sales rep, and testing that speed requires involving the sales rep.
Generative AI requires closer domain expert collaboration
The result of this challenge is that domain experts must be involved closely in the development process to help adjust the output and measure the effect on the business outcome you’re trying to achieve. The days when you could rely on data scientist training and fine-tuning alone until a satisfying solution is ready are over.
Picking use cases should be based on ease of testing
Delloite’s helpful guide about generative AI suggests that use cases for generative AI should be based on the effort it takes to validate output and the effort it would take a human to generate the same content.
Deloitte, Generative AI is all the Rage
The graph shows that the less effort it takes for a human to validate the output, the better the use case. Delloite works from a production point of view, but the same dynamic goes for testing the solution during development. If testing how a generative model's output affects the desired business outcome is too challenging, the use case might not be suitable.
So here's a new version of the 24-hour rule:
If you can’t test how your generative AI output affects the business outcome, stop the project or redefine the scope.
Generative AI benefits experts more than the novice users
Predictive AI use cases have often been about creating decision support systems where models trained on historical data could predict what the experts already know. For example, an AI solution that predicts what account an invoice should be recorded on can be trained on historical work from professional accountants and used to help small business owners do their accounting.
In generative AI, the models’ output is hard to access for novice users who won’t necessarily know if the AI is offering confident but wrong output. An example could be a solution that drafts a contract. Novice users with little to no legal experience won’t know if a clause is illegal or bears hidden risks. An experienced legal expert, on the other hand, will be able to weed out the confident-but-wrong output and pick up on helpful output, like a rare but relevant clause in a specific case.
This is a massive game-changer for AI. Not only does it mean that use cases for AI go towards another target audience (experts), it also suggests that value creation will be a more significant part of AI business cases than saving costs.
Implementing the decision model
A crucial part of productively applying predictive AI has been to make a decision model for mapping output, confidence and actions. For example, you could take a classic churn model that predicts how likely a customer is to cancel their agreement (churn) with a company. Mapping out the actions to take given a prediction (churn or not churn) is a crucial first step in building the solution.
In fact, no code or data acquisition should be done if the decision model can’t be made. The most common cause of AI solutions not going into production is that the business can’t agree on what actions to take given a prediction and an output.
Suppose the churn model predicts a customer has a 60% likelihood of churning in the next three months. How do you handle that? Do you provide a discount? Call the client? Simply accept it and use it as a forecast? As most businesses get stuck in strategic, legal or other discussions, AI solutions never surpass a prototype. Too bad if you invested a lot of time and money in development just to get stuck.
But it’s not as easy to make a decision model for a solution based on generative AI. First, the model output doesn’t go into simple labels, such as churn or no churn. Instead, it’s just a piece of text or an image. At the same time, there's no confidence score. The result is that you can only build a decision model with one output and one action.
On the bright side, the decision model or process flow will get simpler, although it also creates challenges. As you don’t know the quality of the output, and the output essentially can be anything, it’s tough to create fully automated systems. A confidence threshold is often set for predictive AI, and anything above that threshold triggers an automated action. For example, the above-mentioned accounting model could automatically act as a bookkeeper if confidence was above 98%.
One fix for confidence is when generative AI uses data from the source material for inspiration. That could be a database of contract templates or support articles. In this case, you can get a confidence score on the model's certainty that the source material was a good match for the prompted request.
More business, less code
Developing AI solutions based on generative AI requires less code than predictive AI solutions. Generative AI solutions don’t need training, complicated prediction pipelines, or retraining flows. The output is often presented as-is to the user and not post-processed as much as in predictive AI solutions. This requires less code, and the development time is shorter. That, in combination with the above-mentioned challenges to applying generative AI in business contexts, results in more business work and less code. It results in a shift towards a need for business roles such as UX, solution designers or product managers, and a reduced need for developers and data scientists. Both are still important, but in terms of hourly expenditure, the business side goes up and the technical side down.
Conclusion
In my experience, the implementation of generative AI is faster and more impactful than predictive AI, but it also requires more business understanding and courage from the business implementing it.
The added requirements for business understanding will be especially interesting to observe in future AI cases, as they’ve historically been under-prioritised by most teams delivering AI solutions.
Researchers build bee robot that can twist
AI-powered Perception Engine for Mobile Robots
Building a Tablebot
There was a shortage of entries in the tablebot competition shortly before the registration window closed for RoboGames 2023. To make sure the contest would be held, I entered a robot. Then I had to build one.
What’s a tablebot?
A tablebot lives on the table. There are three “phases” to the competition:
- Phase I: Build a robot that goes from one end of a table to the other and back.
- Phase II: Have the robot push a block off the ledge of the table.
- Phase III: Have the robot push the block into a shoebox mounted at the end of the table.
There is also an unofficial Phase IV – which is to fall off the table and survive. I did not attempt this phase.
The majority of tablebots are quite simple – a couple of sonar or IR sensors and they kind of wander around the tabletop in hopes of completing the different phases. My tablebot is decidedly different – and it paid off as the robot won the gold medal at RoboGames 2023.
Robot build
The entire robot is built of 3D printed parts and random things I had on hand.
I’ve had one of those $99 LD-06 lidars sitting around for a while, and decided this was a great project to use it on. I used a Dynamixel AX-12 servo to tilt the laser so I can find the table, the cube, or the goal.
All of the code runs on an STM32, on my custom Etherbotix board which was designed for my Maxwell robot a number of years ago. The robot uses differential drive with some 30:1 12V gear motors, which were purchased from Lynxmotion in 2008 and used in various fire fighting robots over the years.
A set of small digital Sharp IR sensors are used as cliff sensors. These can be moved up or down to calibrate for different table surfaces using a pair of adjustment screws. While the sensors are very accurate and stop the robot, they don’t see far enough ahead when going at full speed, and so I also use the laser to detect when the table edge is approaching.
Phase 1 Software
Phase 1 is pretty straight forward – and mostly based on dead reckoning odometry:
- The laser is angled downwards looking for the table. This is done by projecting to the scan to 3D points, and filtering out anything not in front of the robot at roughly table height. When the table disappears (number of points drops too low), we reduce our maximum speed to something that is safe for the cliff sensors to detect.
- While the laser sensors look for the end of the table, the robot drives forward, and a simple feedback loop keeps the robot centered on the table using odometry.
- When the cliff sensors eventually trigger, the robot stops, backs up 15 centimeters, and then turns 180 degrees – all using dead reckoning odometry.
- The maximum speed is then reset and we take off to the other end of the table with the same behavior.
Phase 2 Software
The movements of Phase 2 are basically the same as Phase 1 – we drive forward, staying centered with odometry. The speed is a bit lower than Phase 1 because the laser is also looking for the block:
- The laser scan is projected to 3D, and we filter out any points that are part of the table based on height. These remaining points are then clustered and the clusters are analyzed for size.
- If a cluster is a good candidate for the block, the robot turn towards the block (using, you guessed it, dead reckoning from odometry).
- The robot then drives towards the block using a simple control loop to keep the heading.
- Once the block is arrived at, the robot drives straight until a cliff sensor trips.
- At that point, the robot stops the wheel on the side of the tripped cliff sensor and drives the other wheel very slowly forward so that we align the front of the robot with the edge of the table – ensuring the block has been pushed off the table.
Phase 3 Software
The final phase is the most complex, but not by much. As with the earlier phases, the robot moves down the table finding the block:
- Unlike in Phase 2, the robot actually approaches a pose just behind the block.
- Once that pose has been reached, the robot tilts the laser back to level and finds the goal.
- The robot then turns towards the goal in the same way it first turned towards the block.
- The robot then approaches the goal using the same simple control loop, and in the process ends up pushing the block to the goal.
All of the software for my Tablebot is availble on GitHub.
Robogames video
Jim Dinunzio, a member of the Homebrew Robotics Club, took a video during the actual competition at Robogames so you can actually see the winning set of runs:
Visualization
To make development easier, I also wrote a Python GUI that renders the table, the robot odometry trail, the laser data, and detected goals and cubes.
Fun with math
Along the way I actually ran into a bug in the ARM CMSIS DSP library. I used the arm_sin_cos_f32()
function to compute my odometry:
arm_sin_cos_f32(system_state.pose_th * 57.2958f, &sin_th, &cos_th);
system_state.pose_x += cos_th * d;
system_state.pose_y += sin_th * d;
system_state.pose_th = angle_wrap(system_state.pose_th + dth);
This function takes the angle (in degrees!) and returns the sine and cosine of the angle using a lookup table and some interesting interpolation. With the visualization of the robot path, I noticed the robot odometry would occasionally jump to the side and backwards – which made no sense.
Further investigation showed that for very small negative angles, arm_sin_cos_f32
returned huge values. I dug deeper into the code and found that there are several different versions out there:
- The version from my older STM32 library, had this particular issue at very small negative numbers. The same bug was still present in the official CMSIS-DSP on the arm account.
- The version in the current STM32 library had a fix for this spot – but that fix then broke the function for an entire quadrant!
The issue turned out to be quite simple:
- The code uses a 512 element lookup table.
- For a given angle, it has to interpolate between the previous and next entry in the table.
- If your angle fell between the 511th entry and the next (which would be the 0th entry due to wrap around) then you used a random value in the next memory slot to interpolate between (and to compute the interpolation). At one point, this resulted in sin(-1/512) returning outrageous values of like 30.
With that bug fixed, odometry worked flawlessly afterwards. As it turned out, I had this same function/bug existing in some brushless motor control code at work.
Robogames wrap up
It is awesome that RoboGames is back! This little robot won’t be making another appearance, but I am starting to work on a RoboMagellan robot for next year.