How to deploy Tensorflow Models on the Nao¶
Tensorflow inference with python¶
The easiest method is to use Python, similar to the way we train models.
Compile Tensorflow for Nao
Since newer TensorFlow versions can't run out of the box on the Nao, you have to compile it yourself. The guide is adapted from the official docs. For faster compilation you can compile it on another machine with the same architecture and instruction set extensions. It is important that the build machine does not have AVX enabled, since the Nao's CPU does not support AVX.
export PATH="$HOME/bin:$PATH"
wget https://github.com/bazelbuild/bazelisk/releases/download/v1.11.0/bazelisk-linux-amd64
mv bazelisk-linux-amd64 $HOME/bin/bazel
chmod +x $HOME/bin/bazel
git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
cd tensorflow_src
git checkout v2.6.3
./configure
bazel build --local_ram_resources=512 --local_cpu_resources=HOST_CPUS*.05 --config=opt //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/
Install pip package with:
TMPDIR=/home/nao/projects/cachedir pip install --cache-dir=/home/nao/projects/cachedir --build /home/nao/projects/cachedir tensorflow-2.8.0-cp36-cp36m-linux_x86_64.whl
Those folders must be set explicitly, otherwise you might run into problems with insufficient space in the /tmp/ folder.
If there are any weird problems with bazel during compilation, run bazel clean --expunge and then do the configure step and the package building again.
Example inference code and the compiled pip package can be found at ... (TODO: expose the lib folder to the public)
Tensorflow-Lite on Nao with C++ (<= v2.6.3)¶
1. We assume you have a Nao v6 set up with Ubuntu as described elsewhere (TODO: insert link).
2. Build the TensorFlow library directly on the Nao robot:
git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
cd tensorflow_src
git checkout v2.6.3
./tensorflow/lite/tools/make/download_dependencies.sh
./tensorflow/lite/tools/make/build_lib.sh
Note
The only reason TensorFlow 2.6.3 was chosen is that it can be built with plain makefiles. We could not get the CMake build to work. It was possible to build newer versions on the Nao, but using the resulting library always led to compile and linker problems as soon as we did not use CMake and the same folder structure as the examples.
3. Create a main.cpp which holds the code for the inference on the tflite model. The code is adapted from the label_image example from the official docs.
Example 1
Example main.cpp without quantization
#include <cstdio>
#include <iostream>
#include <memory>
#include <vector>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
int main(int argc, char* argv[]) {
// Load model
std::unique_ptr<tflite::FlatBufferModel> model =
tflite::FlatBufferModel::BuildFromFile("dummy_model2.tflite");
tflite::ops::builtin::BuiltinOpResolver resolver;
tflite::InterpreterBuilder builder(*model, resolver);
std::unique_ptr<tflite::Interpreter> interpreter;
builder(&interpreter);
// resize the input tensor to the required shape; this has to happen before AllocateTensors()
std::vector<int> sizes = {1, 16, 16, 1};
interpreter->ResizeInputTensor(interpreter->inputs()[0], sizes);
// Allocate tensor buffers.
interpreter->AllocateTensors();
// get the input dimensions we need for the loaded model
TfLiteIntArray* dims = interpreter->tensor(interpreter->inputs()[0])->dims;
int image_height = dims->data[1];
int image_width = dims->data[2];
int image_channels = dims->data[3];
std::cout << "wanted_height: " << image_height << std::endl;
std::cout << "wanted_width: " << image_width << std::endl;
std::cout << "wanted_channels: " << image_channels << std::endl;
// set all the pixel values of the input tensor to 1.0
int number_of_pixels = image_height * image_width * image_channels;
float* input = interpreter->typed_input_tensor<float>(0);
for (int i = 0; i < number_of_pixels; i++) {
input[i] = 1.0f;
}
// Run inference
interpreter->Invoke();
float* output = interpreter->typed_output_tensor<float>(0);
std::cout << output[0] << std::endl;
std::cout << output[1] << std::endl;
std::cout << output[2] << std::endl;
std::cout << output[3] << std::endl;
return 0;
}
Tensorflow-Lite on Nao with C++ (v2.15.0)¶
wget https://github.com/tensorflow/tensorflow/archive/refs/tags/v2.15.0.zip
unzip v2.15.0.zip
mkdir tf_build && cd tf_build
sudo apt update
sudo apt install cmake git
cmake -DCMAKE_BUILD_TYPE=RELEASE -DTFLITE_ENABLE_MMAP=ON -DTFLITE_ENABLE_XNNPACK=OFF -DSYSTEM_FARMHASH=OFF -DTFLITE_ENABLE_GPU=OFF -DTFLITE_ENABLE_NNAPI=OFF -DTFLITE_ENABLE_RUY=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_C_FLAGS="-ffast-math -funsafe-math-optimizations -march=silvermont -mtune=silvermont" -DCMAKE_CXX_FLAGS="-ffast-math -funsafe-math-optimizations -march=silvermont -mtune=silvermont" -DCMAKE_BUILD_RPATH="/home/nao/lib;." ../tensorflow-2.15.0/tensorflow/lite
cmake --build . -j3
These libs need to be copied to the toolchain_nao_ubuntu\extern\lib
folder of the toolchain repositories.
libabsl_strings.so libabsl_strings_internal.so libabsl_base.so libabsl_symbolize.so libpthreadpool.so
libabsl_city.so libabsl_synchronization.so libabsl_debugging_internal.so libabsl_time.so libabsl_demangle_internal.so
libabsl_time_zone.so libabsl_hash.so libabsl_int128.so libtensorflow-lite.so libabsl_low_level_hash.so
libabsl_malloc_internal.so libcpuinfo.so libabsl_raw_hash_set.so libabsl_raw_logging_internal.so libfarmhash.so
libabsl_spinlock_wait.so libfft2d_fftsg.so libabsl_stacktrace.so libfft2d_fftsg2d.so
To make it easy we can just copy all the header files from the tensorflow lite directory with rsync like this:
rsync -am --include='*.h' -f 'hide,! */' ../tensorflow-2.15.0/tensorflow/lite/ <my output path>/tensorflow/lite/
Here <my output path> should point to the toolchain_nao_ubuntu\extern\include folder of the toolchain repositories.
Tensorflow-Lite on Nao C API (v2.15.0)¶
wget https://github.com/tensorflow/tensorflow/archive/refs/tags/v2.15.0.zip
unzip v2.15.0.zip
mkdir tf_build && cd tf_build
sudo apt update
sudo apt install cmake git
cmake -DCMAKE_BUILD_TYPE=RELEASE -DTFLITE_ENABLE_MMAP=ON -DTFLITE_ENABLE_XNNPACK=OFF -DSYSTEM_FARMHASH=OFF -DTFLITE_ENABLE_GPU=OFF -DTFLITE_ENABLE_NNAPI=OFF -DTFLITE_ENABLE_RUY=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_C_FLAGS="-ffast-math -funsafe-math-optimizations -march=silvermont -mtune=silvermont" -DCMAKE_CXX_FLAGS="-ffast-math -funsafe-math-optimizations -march=silvermont -mtune=silvermont" -DCMAKE_BUILD_RPATH="/home/nao/lib;." ../tensorflow-2.15.0/tensorflow/lite/c
cmake --build . -j 2
Only one lib needs to be copied to the toolchain_nao_ubuntu\extern\lib folder of the toolchain repositories:
libtensorflowlite_c.so
That way we don't need to copy any other libs.
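For reference, a minimal inference program against the TFLite C API could look like the following. This is a sketch, not code from our repository: the model name dummy_model2.tflite and the 16x16x1 float input shape are just the placeholders reused from the C++ example above.
#include <cstdio>
#include <vector>
#include "tensorflow/lite/c/c_api.h"
int main() {
// Load the tflite model from disk (placeholder file name)
TfLiteModel* model = TfLiteModelCreateFromFile("dummy_model2.tflite");
// Build an interpreter with a single thread
TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
TfLiteInterpreterOptionsSetNumThreads(options, 1);
TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);
// Allocate tensor buffers
TfLiteInterpreterAllocateTensors(interpreter);
// Fill the input tensor with ones (assuming a 16x16x1 float input as above)
TfLiteTensor* input = TfLiteInterpreterGetInputTensor(interpreter, 0);
std::vector<float> input_data(16 * 16 * 1, 1.0f);
TfLiteTensorCopyFromBuffer(input, input_data.data(), input_data.size() * sizeof(float));
// Run inference
TfLiteInterpreterInvoke(interpreter);
// Copy the output tensor back into a float buffer and print the first value
const TfLiteTensor* output = TfLiteInterpreterGetOutputTensor(interpreter, 0);
std::vector<float> output_data(TfLiteTensorByteSize(output) / sizeof(float));
TfLiteTensorCopyToBuffer(output, output_data.data(), TfLiteTensorByteSize(output));
std::printf("%f\n", output_data[0]);
// Clean up (the C API does not use RAII)
TfLiteInterpreterDelete(interpreter);
TfLiteInterpreterOptionsDelete(options);
TfLiteModelDelete(model);
return 0;
}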
Compile like HTWK
HTWK do not disable all the extras and use XNNPACK functionality in their code. To do something similar we can adapt the CMake command like this:
cmake -DCMAKE_C_FLAGS="-ffast-math -funsafe-math-optimizations -march=silvermont -mtune=silvermont" -DCMAKE_CXX_FLAGS="-ffast-math -funsafe-math-optimizations -march=silvermont -mtune=silvermont" -DCMAKE_BUILD_RPATH="/home/nao/lib;." ../tensorflow-2.15.0/tensorflow/lite/c
It might be interesting to compare the inference speed of our vanilla tflite implementation with HTWK's approach.
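As a starting point for such a comparison, here is a minimal, untested sketch of how the XNNPACK delegate could be enabled from C++. It assumes the library was built with XNNPACK left on (as in the HTWK-style command above) and reuses the placeholder model name from the earlier example.
#include <memory>
#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
int main() {
// Load the model and build the interpreter as before
std::unique_ptr<tflite::FlatBufferModel> model =
tflite::FlatBufferModel::BuildFromFile("dummy_model2.tflite");
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);
// Create the XNNPACK delegate; one thread keeps the other Nao cores free
TfLiteXNNPackDelegateOptions xnnpack_options = TfLiteXNNPackDelegateOptionsDefault();
xnnpack_options.num_threads = 1;
TfLiteDelegate* xnnpack_delegate = TfLiteXNNPackDelegateCreate(&xnnpack_options);
// Let XNNPACK take over the supported parts of the graph, then allocate tensors
interpreter->ModifyGraphWithDelegate(xnnpack_delegate);
interpreter->AllocateTensors();
// ... fill interpreter->typed_input_tensor<float>(0) and call interpreter->Invoke() as in the example above
// The delegate must outlive the interpreter, so destroy the interpreter first
interpreter.reset();
TfLiteXNNPackDelegateDelete(xnnpack_delegate);
return 0;
}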
Tensorflow-Lite Micro¶
This was last tested with TensorFlow v2.6.3. Make sure python3, pip and the pillow package are installed on the Nao:
# this is needed during tflm compilation
apt install python3 python3-pip python-is-python3
python -m pip install pillow
git clone https://github.com/tensorflow/tflite-micro.git tflm_src
cd tflm_src
make -f tensorflow/lite/micro/tools/make/Makefile microlite
Create a tflite file¶
TODO explain exporter options
Convert a model for inference with tflite-micro¶
xxd -i cool_model.tflite > my_model.cc
unsigned char cool_model_tflite[] = {
0x18, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, 0x00, 0x00, 0x0e, 0x00,
// <Lines omitted>
};
unsigned int cool_model_tflite_len = 18200;
The -i flag in the xxd command tells it to output the hexdump in C include file style.
For creating C++ code for basic inference, the hello world example from the tflite-micro repo is a good start.
Example 1
Example main.cpp without quantization
#include <math.h>
#include <iostream>
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "my_model.cc"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/testing/micro_test.h"
#include "tensorflow/lite/schema/schema_generated.h"
int main(){
// Set up logging
tflite::MicroErrorReporter micro_error_reporter;
const tflite::Model* model = ::tflite::GetModel(cool_model_tflite);
// This pulls in all the operation implementations we need
tflite::AllOpsResolver resolver;
constexpr int kTensorArenaSize = 7 * 1024;
uint8_t tensor_arena[kTensorArenaSize];
// Build an interpreter to run the model with
tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
kTensorArenaSize, &micro_error_reporter);
// Allocate memory from the tensor_arena for the model's tensors
interpreter.AllocateTensors();
// Obtain a pointer to the model's input tensor
TfLiteTensor* input = interpreter.input(0);
input->data.f[0] = float(2);
// Run the model and check that it succeeds
TfLiteStatus invoke_status = interpreter.Invoke();
if (invoke_status != kTfLiteOk) {
std::cout << "Invoke failed" << std::endl;
return 1;
}
// Obtain a pointer to the output tensor and make sure it has the
// properties we expect. It should be the same as the input tensor.
TfLiteTensor* output = interpreter.output(0);
float output_value = output->data.f[0];
std::cout << output_value << std::endl;
return 0;
}
Example 2
TODO: create an example with quantization
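Until that example exists, the following sketch shows roughly what changes for a full-integer quantized model. It is untested and assumes an int8 input and output tensor; the setup is otherwise identical to Example 1.
#include <cmath>
#include <iostream>
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "my_model.cc"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
int main(){
// Same setup as in Example 1
tflite::MicroErrorReporter micro_error_reporter;
const tflite::Model* model = ::tflite::GetModel(cool_model_tflite);
tflite::AllOpsResolver resolver;
constexpr int kTensorArenaSize = 7 * 1024;
uint8_t tensor_arena[kTensorArenaSize];
tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
kTensorArenaSize, &micro_error_reporter);
interpreter.AllocateTensors();
// With full-integer quantization the tensors hold int8 values, so the float
// input has to be quantized with the scale/zero point stored in the tensor ...
TfLiteTensor* input = interpreter.input(0);
float x = 2.0f;
input->data.int8[0] = static_cast<int8_t>(std::round(x / input->params.scale) + input->params.zero_point);
interpreter.Invoke();
// ... and the int8 output has to be dequantized back into a float
TfLiteTensor* output = interpreter.output(0);
float y = (output->data.int8[0] - output->params.zero_point) * output->params.scale;
std::cout << y << std::endl;
return 0;
}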
Those examples can be compiled with:
g++ main.cpp -std=c++11 \
-L<path to folder where libtensorflow-microlite.a is> \
-I<path to tflm repo> \
-I<path to flatbuffers include folder> \
-ltensorflow-microlite \
-DTF_LITE_STATIC_MEMORY
CompiledNN¶
CompiledNN is a library from B-Human. You can compile it on the Nao itself with:
git clone https://github.com/bhuman/CompiledNN.git
cd CompiledNN
mkdir build
cd build
cmake ..
make
make install
Example main.cpp
#include <CompiledNN/Model.h>
#include <CompiledNN/CompiledNN.h>
#include <iostream>
#include <chrono>
#include <random>
using namespace NeuralNetwork;
int main()
{
Model model;
model.load("dummy_model2.h5");
// Optionally, indicate which input tensors should be converted from unsigned chars to floats in the beginning.
// model.setInputUInt8(0);
CompiledNN nn;
nn.compile(model);
// ... fill nn.input(i) with data
std::vector<NeuralNetwork::TensorXf> testInputs(model.getInputs().size());
float minInput = -1.f, maxInput = 1.f;
std::mt19937 generator;
std::uniform_real_distribution<float> inputDistribution(minInput, maxInput);
const std::vector<NeuralNetwork::TensorLocation>& inputs = model.getInputs();
for(std::size_t i = 0; i < testInputs.size(); ++i)
{
testInputs[i].reshape(inputs[i].layer->nodes[inputs[i].nodeIndex].outputDimensions[inputs[i].tensorIndex]);
float* p = testInputs[i].data();
for(std::size_t n = testInputs[i].size(); n; --n)
*(p++) = inputDistribution(generator);
}
for(std::size_t i = 0; i < testInputs.size(); ++i)
nn.input(i).copyFrom(testInputs[i]);
auto invoke_start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000; i += 1) {
nn.apply();
}
auto invoke_end = std::chrono::high_resolution_clock::now();
// ... obtain the results from nn.output(i)
std::chrono::duration<double> invoke_latency = invoke_end - invoke_start;
std::cout << "Duration: " << std::chrono::duration_cast<std::chrono::milliseconds>(invoke_latency).count() << "ms" << std::endl;
std::cout << model.getInputs()[0].tensorIndex << std::endl;
std::cout << nn.numOfInputs() << std::endl;
std::cout << nn.numOfOutputs() << std::endl;
std::cout << nn.output(0).size() << std::endl;
std::cout << nn.output(0)[0] << std::endl;
return 0;
}
Example compile command
g++ main.cpp -std=c++14 \
-L/usr/lib/x86_64-linux-gnu/hdf5/serial \
-lCompiledNN -lprotobuf -lhdf5 -lrt -O3