Documentation

SROSP Framework

Next generation framework for social robot development


  • Created: 15 September, 2023
  • Updated: 20 September, 2023

If you have any questions that are beyond the scope of this help file, or any suggestions, please feel free to reach out via email.


Features

The overall system architecture consists of four main layers:

  • ROS Layer
  • Web Server Layer
  • Android Layer
  • Hardware Layer

The schematic of the system architecture is shown in the figure below.

Django Web Application

The Django web application is responsible for the communication between the user interface and the database or the API endpoints.

It also serves the static files (CSS, JS, images, etc.) and the HTML templates.

The static files are where the user interface communicates directly with the ROS server through the rosbridge JavaScript library (roslib.js).
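
For illustration only, the sketch below shows the same rosbridge round-trip from Python using the roslibpy package. The actual user interface uses the JavaScript client in the browser; the host, port, and topic name here are assumptions.

    import roslibpy  # Python client for the rosbridge websocket protocol

    # Connect to the rosbridge server (9090 is the default rosbridge websocket port)
    ros = roslibpy.Ros(host='localhost', port=9090)
    ros.run()

    # Publish a message the same way the browser UI does over rosbridge
    talker = roslibpy.Topic(ros, '/ui_command', 'std_msgs/String')  # hypothetical topic
    talker.publish(roslibpy.Message({'data': 'hello from the interface'}))

    ros.terminate()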

Installation

If you are here from the locally deployed version of SROSP, please skip to the Deploy section for running the app.

Follow the steps below to install the project:

    1. Install Python 3.7 or higher
    2. Install pip
    3. Install the following packages:
      • django 2.2.6 or higher
      • django-rest-framework
      • rosbridge
      • websockets
    4. Clone the repository and go to OSSRP/cleancoded/webInterface/interface_backend
    5. Run the following commands:
      • python manage.py makemigrations
      • python manage.py migrate
      • To run the web server: python manage.py runserver
    6. Go to http://localhost:8000/index.html to access the robot control user interface. You will be prompted to log in when you choose a robot panel. If there is none, click on the "Add" button, fill in the form that opens, and save it. Rename your robot's GUI HTML file to the name you entered in the form, copy it to /OSSRP/cleancoded/webInterface/interface_backend/core/templates, and copy the static files to /core/statics. Now refresh the page.

File Structure

    1. interface_backend/ contains the main project
    2. core/ contains the main app -> handles the main user interface and the between-device communication (i.e., with the Android app)
    3. soundHandler/ contains the soundHandler app -> handles sound related models
    4. serialHandler/ contains the serialHandler app -> handles serial communication related models
    5. static/ contains the static files (css, js, images, etc.)
    6. templates/ contains the html templates
    7. db.sqlite3 is the database file
    8. manage.py is the main file for running the server
    9. requirements.txt contains the required packages for the project

Apps & API Endpoints

1. core/views.py contains the main API endpoints (a minimal view sketch follows this list)

2. soundHandler/views.py contains the API endpoints for the soundHandler app

3. serialHandler/views.py contains the API endpoints for the serialHandler app
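
As a rough sketch (not the actual contents of core/views.py), an endpoint in one of these modules could look like the following; the view name and query are illustrative assumptions.

    from rest_framework.views import APIView
    from rest_framework.response import Response

    from .models import EmotionModel
    from .serializers import EmotionModelSerializer


    class EmotionListView(APIView):
        """Hypothetical endpoint that returns all emotions for the user interface."""

        def get(self, request):
            emotions = EmotionModel.objects.all()
            serializer = EmotionModelSerializer(emotions, many=True)
            return Response(serializer.data)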

Models

  1. core/models.py contains the main models (a sketch of these models follows the list):
    • EmotionModel
      • face -> name of the emotion
      • face_video_url -> url of the video for the emotion
      • video_file -> file of the video for the emotion
      • sound -> a foreign key to the Song model in the soundHandler app (the sound to play simultaneously with the video)
      • interface_button_emoji -> emoji for the emotion (used in the user interface)
    • Motor Control models:
      • dynatype -> type of the dynamixel motor
      • movement -> True if the motor must move, False otherwise
      • dir -> direction of the movement (0 for (up - right) or 1 for (down - left))
      • pos_up -> Neck pitch turn
      • pos_right -> Neck yaw turn
      • right_hand -> Right hand final position
      • left_hand -> Left hand final position
      • speed -> Speed of the movement
      • theta -> Angle of the movement
      • yaw -> Yaw of the movement
  2. soundHandler/models.py contains the models for the soundHandler app:
    • Song:
      • title -> title of the sound file
      • description -> description of the sound file
      • audio_file -> the sound file (optional)
      • audio_link -> the link to the sound file (optional)
      • duration -> the duration of the sound file
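
The sketch below restates the documented fields as Django models. It is illustrative only: the actual field types, options, and any additional fields (e.g. the motor control models) may differ in the project.

    from django.db import models


    class Song(models.Model):
        # documented in soundHandler/models.py
        title = models.CharField(max_length=200)
        description = models.TextField(blank=True)
        audio_file = models.FileField(upload_to='audio/', blank=True)  # optional
        audio_link = models.URLField(blank=True)                       # optional
        duration = models.FloatField(null=True, blank=True)


    class EmotionModel(models.Model):
        # documented in core/models.py; in the project the foreign key points to soundHandler's Song
        face = models.CharField(max_length=100)                     # name of the emotion
        face_video_url = models.URLField(blank=True)                # url of the emotion's video
        video_file = models.FileField(upload_to='videos/', blank=True)
        sound = models.ForeignKey(Song, on_delete=models.SET_NULL,
                                  null=True, blank=True)            # played together with the video
        interface_button_emoji = models.CharField(max_length=16)    # emoji shown in the UI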

Serializers

All of the following are EmotionModel serializers (a minimal serializer sketch follows the list):

  • EmotionModelSerializer
  • HooshangDynaSerializerHead
  • HooshangDynaSerializerHands
  • HooshangDynaSerializer
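
A minimal guess at what the first of these might look like; the real serializers (especially the Dyna ones) likely expose specific fields.

    from rest_framework import serializers

    from .models import EmotionModel


    class EmotionModelSerializer(serializers.ModelSerializer):
        """Hypothetical serializer exposing every EmotionModel field to the API."""

        class Meta:
            model = EmotionModel
            fields = '__all__'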

URLs

  1. Go to http://localhost:8000/index.html to access the robot control user interface
  2. Go to http://localhost:8000/admin to access the admin interface
  3. Go to http://localhost:8000/wizard/setup to access the setup wizard
  4. The following are the API endpoints (a usage sketch follows this list):
    • /reqpub -> GET: loads the main user interface, POST: receives movement commands and sends them to the robot
    • /reqcli -> GET: passes the user-selected emotion's data to the robot, POST: receives the robot's response
    • /reqemo -> GET: sends back the video url of the selected sound, or the sound url of the selected face (video)
    • /reqip -> GET: sends back the IP address of the robot (fetches the server's local IP address)
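
For illustration, the endpoints above could be exercised from Python as follows; the query and form field names are assumptions, not the documented contract.

    import requests  # pip install requests

    BASE = 'http://localhost:8000'

    # Fetch the server's local IP address
    print(requests.get(f'{BASE}/reqip').text)

    # Ask for the media URLs of a selected emotion (parameter name is assumed)
    print(requests.get(f'{BASE}/reqemo', params={'face': 'happy'}).text)

    # Send a movement command to the robot (field names taken from the motor control models)
    requests.post(f'{BASE}/reqpub', data={'pos_up': 10, 'pos_right': 0, 'speed': 50})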

Deploy

You need a web server to run the project. I used nginx and gunicorn. You can use apache or any other web server.

To use the built-in Django web server, run the following command: python manage.py runserver. In a production environment (e.g. Jetson Nano), however, the app should be served automatically on each reboot by the web server (e.g. nginx), with the gunicorn service & socket running in the background.

If not, check the nginx status to make sure it is active and then check for errors in the log files: /var/log/nginx/error.log and /var/log/nginx/access.log. If you are using a different web server, check its log file.

This could also be caused by a failed gunicorn service. Check the status of the gunicorn service with the following command:

sudo systemctl status gunicorn
and check gunicorn configuration file: /etc/systemd/system/gunicorn.service and gunicorn.socket configuration file: /etc/systemd/system/gunicorn.socket.

NOTE: Make sure you have allowed the ports in use (e.g. 1935, 5353, 8080, etc.) in the firewall. To check the status of the firewall, run the following command:

sudo ufw status

To allow a port, run the following command:

sudo ufw allow [port number]

To allow a range of ports, run the following command:

sudo ufw allow [start port]:[end port]/tcp

Installation

  1. Install nginx
  2. Install gunicorn
  3. Create a new user for the project
  4. Create a new directory for the project
  5. Clone the project into the new directory
  6. Create a new virtual environment for the project
  7. Install the requirements for the project
  8. Create a new configuration file for the project in /etc/nginx/sites-available/
  9. Create a new configuration file for the project in /etc/supervisor/conf.d/
  10. Create a new configuration file for the gunicorn service in /etc/systemd/system/
  11. Setup the gunicorn service
  12. Setup nginx

13. Use the following commands for Gunicorn:

    sudo systemctl start gunicorn
    sudo systemctl enable gunicorn
    sudo systemctl status gunicorn
    sudo systemctl stop gunicorn
    sudo systemctl restart gunicorn
    sudo systemctl daemon-reload
    sudo systemctl reload gunicorn
    sudo systemctl disable gunicorn

14. Use the following commands for nginx:

    sudo systemctl daemon-reload
    sudo systemctl restart nginx
    sudo systemctl status nginx
    sudo systemctl stop nginx
    sudo systemctl start nginx
    sudo systemctl reload nginx
    sudo systemctl disable nginx
    sudo systemctl enable nginx

Notes

*If you want to be able to access the user-interface from other devices in the same network, you should use the same url but replace the IP address with the IP address of the main server (e.g. Jetson). You can also use a domain name instead of the IP address.*

*Add the following line to the nginx sites-enabled/[my website's name] configuration file, because your media files will be ignored by the browser if they are served with the default MIME type of text/plain: include /etc/nginx/mime.types;*

*If requirements.txt does not exist, run pipreqs [path/to/project] to generate a requirements.txt file for your project and install the requirements with pip install -r requirements.txt*

*The user interface must be accessible from a browser at the following address: http://[Jetson|Computer's IP address]:[port number (default: 5353)]/index.html*

Additions to the project

If you want to assign a static IP address to the Jetson Nano, you can use the following command:

sudo nmtui

To check the IP address of the Jetson Nano, run the following command:

hostname -I

To check the IP address of the computer, run the following command:

ipconfig

If you want to deploy the project or assign a purchased domain name to the project, you must change the server_name in the nginx configuration file from localhost to the domain name. Be sure to configure the DNS settings of the domain name to point to the IP address of the server using cloudflare or any other DNS service provider.

Daemons

To run the project from scratch, run the following command:

gunicorn --bind host:port --workers 3 --threads 2 --timeout 120 --log-level debug --log-file log.txt --access-logfile access_log.txt --error-logfile error_log.txt --capture-output --enable-stdio-inheritance --daemon --pid pid.txt --pythonpath [path/to/project] [project_name].wsgi:application

Database

Type of the database: SQLite3

Location of the database: [path/to/project]/db.sqlite3

To access the database, you need to be a superuser. To create a superuser, run the following command:

python manage.py createsuperuser

Now you can access the database at /admin with admin privileges. (e.g. modify model objects, add new users, etc.)

Android

The Android device is currently only responsible for acting as the robot's face and sound player. In other terms, it acts as a simple multimedia player which is also in constant communication with the ROS and Django servers.

However, based on the modular structure of the whole system, it has been a vision for future developments to integrate the camera and microphone data inputs of the Android device as well.

In that case, ROSJava would need to be implemented in the Android app. It would be in charge of publishing the camera and microphone audio data inputs on the respective topics (which already exist).

  • Language: Kotlin

Download APK and Source Code

You can download the latest version of the app from here.

You can also access the source code of the app from here.

Supported Systems

*This app can be a little tricky on Android <= 4.4 because it does not support some of the newer streaming protocols. Some media might fail to play.*

User Interface

You must first fill in the URL field with the [IP address:port number] of the main server (e.g. Jetson). No need for the /reqcli part.

*NOTE: http:// is required.

Then press the Check button on the bottom. Make sure the status is OK before going into full screen mode.

Note that if the status is OK but you see no URL provided, there is no problem and the client probably hasn't requested any changes yet. Move on.

When the screen goes into full screen mode, there's a small lock icon in the top right corner of the screen. If you wish to lock the media player controls to prevent any sudden exits or block accidental touches, tap the lock icon.

You can always release the controls by touching the lock icon again.

Multimedia

The app's internal media player is ExoPlayer.

If you want to stream a video or audio file from the server, you must first put the file in the media folder of the project and then use the following url to access it: http://[IP address:port number]/media/[file name].[file extension]

Additionally, you can stream a video or audio file online by sending the URL of the stream server to the Android app, just as you would with a local file. (It is recommended to use a local server for streaming, though, for faster response times and less bandwidth usage.)

The app is capable of playing any video or audio file that is supported by the Android device (e.g. mp4, mp3, wav, etc.) and any video or audio stream that is supported by the Android device (e.g. rtsp, rtmp, etc.).

It caches the played media the first time it is played, to save time and bandwidth for the rest of the session.

Web Server

When the start button is clicked for the first time, the android app starts listening on [user-given url:port number]/reqcli (in this case: ip:5353/reqcli) for requests from the user-interface.

It then waits for an update in the JSON it receives as a GET request and changes the media output accordingly.

It looks for any changes in the "audio_url" and "video_url" fields of the JSON and plays each medium through a separate channel at the same time, meaning that the user is not limited to playing only the video's own audio and can play any audio file with any video file.

After each action, the Android app sends a POST request with the body being the status of the action (e.g. "Error!", "Video played successfully", etc.). All of these are then received by the user interface (back-end => Django: views.py) and displayed to the user (front-end => index.html).
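
The Kotlin app implements this loop natively; the Python sketch below only mirrors the described behaviour for clarity. The endpoint address, polling interval, and status wording are assumptions.

    import time
    import requests

    REQCLI = 'http://192.168.1.10:5353/reqcli'   # [user-given url:port]/reqcli (example address)
    last_seen = {}

    while True:
        data = requests.get(REQCLI).json()       # poll the Django server for the latest media
        for field in ('audio_url', 'video_url'):
            url = data.get(field)
            if url and url != last_seen.get(field):
                # here the app hands the URL to the corresponding media player channel,
                # then reports the outcome back to the user interface
                requests.post(REQCLI, data={'status': f'{field} played successfully'})
                last_seen[field] = url
        time.sleep(1)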


ROS

Installation

Follow the steps below to install the project:

    1. Install ROS Melodic or higher
    2. Install the following packages:
      • numpy==1.25.1
      • matplotlib==3.7.2
      • scipy==1.7.3
      • pandas==2.0.3
      • tensorflow==2.6.0
      • torch
      • openpyxl==3.1.2
      • Pillow==10.0.0
      • pyaudio==0.2.13
      • opencv-python
      • tabulate==0.9.0
      • Image==1.5.33
      • mediapipe==0.10.2
      • deepface==0.0.79
      • glob2==0.7
      • pyax12
      • argparse
      • transformers
      • hazm
      • deep_translator
      • edge_tts
      • audio_common_msgs
    3. Clone the repository and go to OSSRP/cleancoded/
    4. Run the following commands:
      • catkin_make
      • source devel/setup.bash
      • To run the ROS nodes, run the following command: roslaunch infrastructure init_robot.launch. This will initiate all the available nodes. You can also initiate other node combinations separately by running the following commands:
        • roslaunch infrastructure initiate_facial.launch
        • roslaunch infrastructure gaze_pose.launch
        • roslaunch infrastructure speech_emotion_analysis.launch
        • roslaunch infrastructure speech_to_text.launch
        In case of any errors, check the log file: ~/.ros/log/latest/ for the error message.
        If you would like to run a node separately, run the following command: rosrun infrastructure [node_name]

Nodes

  • List of available ROS nodes:
    • opencv_client -> converts the Image messages to OpenCV messages and publishes them in the form of several lists of lists (List.msg), since ROS doesn't support 3D array messages
    • audio_recorder -> captures the mic's input via the PyAudio library and broadcasts it as Audio_common_msgs
    • speech_emotion_analysis_server -> uses a pre-trained model on audio features for sentiment analysis to analyze the speech's emotion, then publishes the results as custom messages (EmoProbArr.msg)
    • audio_feature_extractor -> uses the PRAAT library to extract 10 features of the audio, then publishes the result as custom messages (AudFeatures.msg)
    • FaceEmotionAnalysis -> uses the DeepFace library to analyze the face emotions and their probabilities, then publishes the results as custom messages (FaceEmotions.msg)
    • landmark_detection -> uses Google's MediaPipe library to extract full-body landmarks, then publishes the results as custom messages (Landmarks.msg)
    • gaze_detector -> uses the face/head landmarks to estimate the head and gaze position, then returns the result as a single String message
    • gaze_pose_node -> calls the gaze_detector service and publishes its responses
    • speech_to_text_server -> takes in the audio data as Audio_common_msgs and returns the transcript as a single String
    • text_to_speech_server -> takes a text as a single String message and returns spoken data as Audio_common_msgs
    • speech_to_text_node -> calls the speech_to_text_server service and publishes its responses
  • List of developed ROS launch files:
    • init_robot.launch -> initiates all the available nodes
    • initiate_facial.launch -> initiates all the face/head-related nodes
    • gaze_pose.launch -> initiates the gaze_detector service and the gaze_pose_node
    • speech_emotion_analysis.launch -> initiates the audio_recorder and speech_emotion_analysis_server
    • speech_to_text.launch -> initiates the speech_to_text_server and speech_to_text_node

Messages

  • List of developed ROS msg files:
    • AudFeatures.msg -> 10 float64 items as 10 audio features
    • EmoProbArr.msg -> emotion's name as String and its probability as float32
    • Array3D.msg -> a list of float64s
    • List.msg -> a list of Array3Ds
    • Landmarks.msg -> 4 lists of geometry_msgs/Points as face, right and left hand, and the pose landmarks
    • FaceEmotions.msg -> a list of EmoProbArrs

Services

  • List of developed ROS Services:
    • audio_features -> same as audio_feature_extractor node
    • speech_emotion_analysis -> same as speech_emotion_analysis_server node
    • speech_to_text -> same as speech_to_text_server node
    • gaze_pose -> same as gaze_detector node
    • text_to_speech -> same as text_to_speech_server node
  • List of developed ROS srv files (a client sketch follows this list):
    • AudFeature.srv -> takes Audio_common_msgs and returns AudFeatures
    • EmoProb.srv -> takes Audio_common_msgs and returns EmoProbArr
    • Gaze.srv -> takes List.msg (the camera frame) and Landmarks.msg, and returns a String
    • Stt.srv -> takes Audio_common_msgs and returns a String
    • Tts.srv -> takes a String and returns Audio_common_msgs
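
As a rough sketch, the speech_to_text service could be called from a Python node like this. The srv import path assumes the custom service definitions live in the infrastructure package (as the launch files do); in practice the audio would come from /captured_audio.

    #!/usr/bin/env python
    import rospy
    from audio_common_msgs.msg import AudioData
    from infrastructure.srv import Stt   # assumed package for the custom srv files

    rospy.init_node('stt_client_example')
    rospy.wait_for_service('speech_to_text')
    stt = rospy.ServiceProxy('speech_to_text', Stt)

    audio = AudioData(data=b'')           # placeholder; normally filled from /captured_audio
    response = stt(audio)                 # returns the transcript as a String
    rospy.loginfo('Transcript: %s', response)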

Topics

  • List of developed ROS Topics (a subscriber sketch follows this list):
  • In the list below, each message type in brackets that ends in ".msg" and has no package name refers to one of our custom messages described in the list of developed msg files above.

    • Image Processing

      • /image_raw -> raw image from camera [Sensor_msgs/Image]
      • /camera_info -> width, height, distortion, etc. [Sensor_msgs/CameraInfo]
      • /image_cv2 -> image converted to cv2 [List.msg]
      • /image_raw/landmarked -> image with landmarks in ROS Image format [Sensor_msgs/Image]
      • /image_cv2/landmarked -> image with landmarks in cv2 format [List.msg]
      • /gaze_pose -> gaze pose in String format [Std_msgs/String]
    • Audio Processing

      • /audio_features -> audio features in custom msg format [AudFeatures.msg]
      • /speech_emotion_analysis -> emotion probabilities in custom msg format [EmoProbArr.msg]
      • /captured_audio -> audio captured from microphone in ROS AudioData format [Audio_common_msgs]
      • /transcript -> transcript of speech in string format [Std_msgs/String]
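
A minimal listener for two of the String topics above (topic names and types as documented):

    #!/usr/bin/env python
    import rospy
    from std_msgs.msg import String

    def on_transcript(msg):
        rospy.loginfo('Transcript: %s', msg.data)

    def on_gaze(msg):
        rospy.loginfo('Gaze pose: %s', msg.data)

    rospy.init_node('topic_listener_example')
    rospy.Subscriber('/transcript', String, on_transcript)
    rospy.Subscriber('/gaze_pose', String, on_gaze)
    rospy.spin()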
