How to Set Up an AWS Server and NVIDIA Xavier NX2 for Deep Learning?
Requirements:
- AWS Server: Amazon EC2 (G3 instance, Tesla M60 GPU)
- Client: Ubuntu 18.04.5 LTS (optional)
- Board: NVIDIA Xavier NX2
How to Log In to Your AWS Instance Through SSH?
You will need the id_rsa and id_rsa.pub key files created for the server. Navigate to the folder containing them in a terminal and enter:
ssh -i id_rsa your_username@IP_Address_of_AWS_instance
You may have to set a password on your first login; use it whenever you log in later. Make sure your username has “sudo” access!
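If you log in often, an SSH config entry saves typing the full command every time. A minimal sketch; the host alias “aws-dl”, the IP address, and the username are placeholders, substitute your instance’s details:

```shell
# Create an alias so `ssh aws-dl` replaces the full ssh command.
# All values below are placeholders; substitute your instance's details.
mkdir -p ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host aws-dl
    HostName 203.0.113.10
    User your_username
    IdentityFile ~/id_rsa
EOF
chmod 600 ~/.ssh/config
```

After this, `ssh aws-dl` and `scp file aws-dl:~/` pick up the key and username automatically.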
Initial set up for AWS server:
- CUDA version 10.1
- Tensorflow Version 2.2.0rc2
After logging into the AWS server, enter the following in the terminal to bring the packages up to date:
sudo apt-get update
sudo apt-get upgrade
NVIDIA Driver Installation:
Now download the NVIDIA driver in your client’s browser from the official site, choosing the following options:
- Product Type: Data Center / Tesla
- Product Series: M-Class
- Product: M60
- Operating System: Linux 64-bit Ubuntu 16.04
- CUDA Toolkit: 10.1
- Language: English (US)
- Recommended/Beta: All
After downloading NVIDIA driver version 418.*, send the file from the client to the server using the “scp” command:
scp -i id_rsa ./NVIDIA-Linux-x86_64*.run your_username@IP_Address_of_AWS_instance:~/
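Large transfers can occasionally be corrupted in transit; comparing checksums on both ends catches this. A sketch — a stand-in file is created here so the commands run anywhere; on your machines, point the command at the actual installer:

```shell
# Stand-in for the real installer so the example is self-contained.
printf 'driver payload' > NVIDIA-Linux-x86_64.run
# Run the same command on the client and on the server; the two hashes must match.
sha256sum NVIDIA-Linux-x86_64.run
```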
The transfer will take a while. Once it completes, install the driver and restart the server with:
sudo /bin/sh ./NVIDIA-Linux-x86_64*.run
sudo shutdown -r now
Your connection will be closed, and you will be able to reconnect after a short while. Don’t panic! After logging back into the server, enter:
nvidia-smi
If the installation was successful, you should see a table listing the driver version and the Tesla M60 GPU (the exact versions will differ on your system).
Congrats you are one step closer to the apocalypse!
CUDA and cuDNN installation:
Download CUDA (version 10.1) and cuDNN (v7.6.5, November 5th, 2019, for CUDA 10.1) on the client system and send them to the AWS server with the “scp” command:
scp -i id_rsa cuda_10.1.105_418.39_linux.run your_username@IP_Address_of_AWS_instance:~/
scp -i id_rsa cudnn-10.1-linux-x64-v7.6.5.32.tgz your_username@IP_Address_of_AWS_instance:~/
Note that the version you are using might be different, edit the above command accordingly!
Now log in to the AWS server, make the CUDA installer executable, and run it:
chmod +x cuda_10.1.105_418.39_linux.run
sudo ./cuda_10.1.105_418.39_linux.run
Now extract the cuDNN archive and copy its contents into the CUDA installation directory:
tar -xzf cudnn-10.1-linux-x64-v7.6.5.32.tgz
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.1/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda-10.1/include/
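To confirm the copy worked, you can read the version macros straight out of the header. A sketch, assuming cuDNN 7.x, which records its version as macros in cudnn.h; a stand-in header is written here so the command is runnable anywhere, but on the server you would grep the real /usr/local/cuda-10.1/include/cudnn.h:

```shell
# Stand-in header with the macros cuDNN 7.6.5 defines; on the server, grep
# /usr/local/cuda-10.1/include/cudnn.h instead.
printf '#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 6\n#define CUDNN_PATCHLEVEL 5\n' > cudnn.h
grep -E 'CUDNN_(MAJOR|MINOR|PATCHLEVEL)' cudnn.h
```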
For the system to find the CUDA installation, add its directories to the environment by editing the “bash.bashrc” file:
sudo nano /etc/bash.bashrc
This opens the file in the “nano” text editor; add the following lines to the end of the file:
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
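The `${PATH:+:${PATH}}` idiom appends the old value after a colon only when the variable is already set and non-empty, so you never end up with a stray leading or trailing colon. A minimal illustration with a throwaway variable:

```shell
# ${VAR:+:${VAR}} expands to ":<old value>" when VAR is non-empty, and to nothing otherwise.
DEMO=
echo "/usr/local/cuda-10.1/bin${DEMO:+:${DEMO}}"   # prints /usr/local/cuda-10.1/bin
DEMO=/usr/bin
echo "/usr/local/cuda-10.1/bin${DEMO:+:${DEMO}}"   # prints /usr/local/cuda-10.1/bin:/usr/bin
```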
Again, make sure the CUDA version here matches the one you installed. Save and exit the editor with Ctrl+X, entering “Y” to accept the changes. To check that the installation succeeded, enter the following:
source /etc/bash.bashrc
sudo shutdown -r now
nvcc -V
This should print the compiler version banner, reporting CUDA version 10.1.
Tensorflow GPU installation:
sudo apt-get install python3-pip
sudo pip3 install -U pip
pip3 install tensorflow-gpu==2.2.0rc2
This should take a while. If everything has been set up properly, you should be able to import TensorFlow and run the following in Python:
python3
import tensorflow as tf
tf.test.is_gpu_available()
If this returns “True”, the NVIDIA drivers, CUDA, and TensorFlow with GPU support are all working. Congrats, you have saved yourself two weeks’ worth of headache!
Initial set up for Nvidia Xavier NX2:
- Jetpack version 4.4
- Cuda version 10.2
- Tensorflow version 2.2
Initial Setup:
Download JetPack SDK 4.4 for the “JETSON XAVIER NX DEVELOPER KIT” and write it to your SD card by following the steps in the NVIDIA forum. JetPack already ships with CUDA and the NVIDIA drivers. I used a LAN connection, monitor, keyboard, and mouse with the development board.
Important: if your SD card is larger than 16 GB, follow the steps below to utilize the full card, since the first-boot setup (“oem-config”) leaves the default root partition at 14 GiB. One approach to make it bigger: before flashing with the SDK Manager, edit “jetson-xavier-nx-devkit.conf” in the SDK Manager’s “Linux_for_Tegra” subdirectory and add the line:
ROOTFSSIZE=30GiB
Then, on the NVIDIA Xavier NX2 development board, grow the filesystem to fill the partition:
sudo resize2fs /dev/mmcblk0p1
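After resizing, you can confirm that the root filesystem now spans the card. A sketch; on the board, the filesystem mounted at / is the assumed /dev/mmcblk0p1, and the reported size should be close to the card’s full capacity:

```shell
# Show the size and usage of the filesystem mounted at /
# (on the Xavier NX this is /dev/mmcblk0p1).
df -h /
```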
Then run the following commands:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get autoremove
sudo apt-get autoclean
Reboot the device, and enter the following in the terminal:
nvcc -V
If it doesn’t work, install “nano” and add the CUDA directories to the “.bashrc” file:
sudo apt-get install nano
sudo nano ~/.bashrc
Add the following lines to the end of the file:
export PATH=$PATH:/usr/local/cuda-10.2/bin
export CUDADIR=/usr/local/cuda-10.2
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.2/lib64
Save and exit by pressing Ctrl+x and entering “Y” for the prompt.
Tensorflow GPU installation:
sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
sudo apt-get install python3-pip
sudo pip3 install -U pip
sudo pip3 install -U pip testresources setuptools numpy==1.16.1 future==0.17.1 mock==3.0.5 keras_preprocessing==1.0.5 keras_applications==1.0.8 gast==0.2.2 futures protobuf pybind11 h5py==2.9.0
sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v44 tensorflow==2.2.0+nv20.7
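To keep the board’s environment reproducible, the pinned versions from the long pip command above can live in a requirements file. A sketch; the filename is arbitrary, and the pins are copied from the command above:

```shell
# Same version pins as the pip3 command above, kept in a file for reuse.
cat > jetson-tf-requirements.txt <<'EOF'
numpy==1.16.1
future==0.17.1
mock==3.0.5
keras_preprocessing==1.0.5
keras_applications==1.0.8
gast==0.2.2
h5py==2.9.0
EOF
# Then: sudo pip3 install -U -r jetson-tf-requirements.txt
```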
This will take a while! If everything has been set up properly, you should be able to import TensorFlow and run the following in Python:
python3
import tensorflow as tf
tf.test.is_gpu_available()
If this returns “True”, the NVIDIA drivers, CUDA, and TensorFlow with GPU support are working on the board as well. Congrats again, you have saved yourself another two weeks’ worth of headache!