LINK: PredictionIO: Build and Deploy ML Applications in a Fraction of the Time

Multiple Events and Multiple Algorithms
PredictionIO CLI Cheatsheet

# 1. Set up (on Ubuntu)

# 1.1 Host Selection

Run on VirtualBox

Method 1: Download the package and install it with dpkg
Method 2:

Ref: [SOLVED] Setting up VirtualBox-5.1 and vboxconfig failing on Fedora24
sudo mokutil --disable-validation

Tips:
sudo apt-get remove virtualbox-{version}
Disable secure boot

Try running the HDP sandbox

Run on Docker

docker images
docker ps -a
docker rm ''
docker run -it '' bash
docker exec -it '' bash
docker login
docker pull

# 1.2 Install PredictionIO

Tips: Refer to other Dockerfile Refer to other

# 1.2.1 Try Docker images

Failed because the image ships an old PredictionIO version that does not match the template (the image is not the latest version).

docker run -it -p 8000:8000 steveny/predictionio /bin/bash
jps -l
pio status
apt-get install git
pip install -U setuptools

# 1.2.2 Try local deployment through Heroku

Failed because Heroku charges during the process.

Requirements
● Heroku account
● Heroku CLI, command-line tools
● git

root@be0576bd8d4e:/home/workspace/engine-dir# wget -qO- | sh
root@be0576bd8d4e:/home/workspace/engine-dir/pio-engine-ur# heroku create $ENGINE_NAME
Creating app... !
 ▸	Invalid credentials provided.
Enter your Heroku credentials:
Email: [email protected]
Password: ********
Creating app... done, ⬢ obscure-wave-35511 |

# 1.2.3 Build from src

Install Java

export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH

Install PostgreSQL

Ref: How To Install and Use PostgreSQL on Ubuntu 16.04

/etc/init.d/postgresql start
Switch account: su postgres
Then run: psql

root@be0576bd8d4e:/home/workspace/postgresql# /etc/init.d/postgresql start
 * Starting PostgreSQL 9.6 database server                                                                                                                                                       	[ OK ]
root@be0576bd8d4e:/home/workspace/postgresql# ps -ef | grep post
postgres  6367 	0  0 08:35 ?    	00:00:00 /usr/lib/postgresql/9.6/bin/postgres -D /var/lib/postgresql/9.6/main -c config_file=/etc/postgresql/9.6/main/postgresql.conf
postgres  6369  6367  0 08:35 ?    	00:00:00 postgres: 9.6/main: checkpointer process   
postgres  6370  6367  0 08:35 ?    	00:00:00 postgres: 9.6/main: writer process   
postgres  6371  6367  0 08:35 ?    	00:00:00 postgres: 9.6/main: wal writer process   
postgres  6372  6367  0 08:35 ?    	00:00:00 postgres: 9.6/main: autovacuum launcher process   
postgres  6373  6367  0 08:35 ?    	00:00:00 postgres: 9.6/main: stats collector process   
root  	6387  6332  0 08:36 pts/1	00:00:00 grep --color=auto post
root@be0576bd8d4e:/home/workspace/postgresql# postgres
bash: postgres: command not found
root@be0576bd8d4e:/home/workspace/postgresql# su postgres
postgres@be0576bd8d4e:/home/workspace/postgresql$ psql

Install PredictionIO

  • Step 1 make distribution:
docker cp /home/lyhistory/Downloads/apache-predictionio-0.12.0-incubating.tar.gz.asc frosty_wescoff:/home/workspace

gpg --import KEYS
gpg --verify apache-predictionio-0.12.0-incubating.tar.gz.asc apache-predictionio-0.12.0-incubating.tar.gz

mkdir -p ./apache-predictionio-0.12.0-incubating
tar zxvf apache-predictionio-0.12.0-incubating.tar.gz -C ./apache-predictionio-0.12.0-incubating/

export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH

  • Step 2 Spark config: extract spark-2.1.1-bin-hadoop2.6.tgz into PredictionIO-0.12.0-incubating/vendors/, then edit PredictionIO-0.12.0-incubating/conf/ and change SPARK_HOME

  • Step 3 Storage:
  1. Postgresql 9.6 refer to 4.3.2

  2. Mysql

docker pull mysql:5.7
docker run --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:tag --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
docker run --name mysql -e MYSQL_ROOT_PASSWORD=password -d mysql:5.7 --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci

Put the mysql-connector jar in $PIO_HOME/lib/
Create the pio database

  3. HBase and Elasticsearch

elasticsearch-5.5.2.tar.gz: extract it, then edit PredictionIO-0.12.0-incubating/conf/ and change PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME
hbase-1.2.6-bin.tar.gz: extract it, then edit PredictionIO-0.12.0-incubating/conf/ and change PIO_STORAGE_SOURCES_HBASE_HOME

Edit PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/conf/hbase-site.xml.


Edit PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/conf/ to set JAVA_HOME for the cluster. For example:

export JAVA_HOME=/usr/local/java/jdk1.8.0_151

root@be0576bd8d4e:/home/workspace/apache-predictionio-0.12.0-incubating# PredictionIO-0.12.0-incubating/bin/pio status
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.12.0-incubating is installed at /home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6
[INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: LOCALFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [HBLEvents] The namespace pio_event doesn't exist yet. Creating now...
[INFO] [HBLEvents] The table pio_event:events_0 doesn't exist yet. Creating now...
[INFO] [HBLEvents] Removing table pio_event:events_0...
[INFO] [Management$] Your system is all ready to go.

# 1.2.4 Use existing pio image

Step 1: Set static ip

apt-get install openssh-server

Configure Node Networking
cat /etc/network/interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static

Step 2: Setup docker mysql:5.7

docker pull mysql:5.7

sudo docker run -d --name mysql_dev \
-p 3306:3306 \
-e MYSQL_ROOT_PASSWORD=password \
mysql:5.7

sudo docker exec -ti mysql_dev mysql -uroot -ppassword

CREATE DATABASE pio DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
CREATE USER 'pio'@'localhost' IDENTIFIED BY 'pio';
CREATE USER 'pio'@'%' IDENTIFIED BY 'pio';
GRANT ALL ON *.* TO 'pio'@'localhost';
GRANT ALL ON *.* TO 'pio'@'%';
FLUSH PRIVILEGES;


Reset mysql container:
docker rm -f mysql_dev

Step 3: Setup docker predictionio:0.12.0

  1. Load predictionio image:
scp [email protected]:/root/*.xz ./

pxz -cd ./pio-0.12.0.tar.xz | sudo docker load
  2. Download sample template - MyRecommendation: git clone MyRecommendation

  3. Config vendor:

scp [email protected]:/root/*.gz ./

Step 4: Run and config PIO

  1. conf/
sudo docker run -ti -p 7080:7070 -p 8110:8000 \
-v $(readlink -e ~/PIO/vendors):/PredictionIO-0.12.0-incubating/vendors \
-v $(readlink -e ~/MyRecommendation):/MyRecommendation \
--link mysql_dev \
--name pio_$(whoami) \
pio:0.12.0 bash

Then check the linked folders: MyRecommendation and vendors.
Try pinging the mysql_dev container service.

Config pio:

PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://mysql_dev:3306/pio?autoReconnect=true
(Here we use the container name instead of the IP. If you want to use the IP, you can find it with docker inspect mysql_dev.)
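
For orientation, here is a minimal Python sketch that prints a full set of MySQL storage lines around the URL above. The key names follow the pattern used in the PredictionIO environment template, and the user name and password (pio/pio) are taken from the MySQL step earlier; treat both as assumptions and compare against the template shipped with your version before copying anything. The function name is made up for this example.

```python
def mysql_storage_env(host, port=3306, database="pio", user="pio", password="pio"):
    """Return config lines that point all three PIO repositories at one MySQL source.

    The key names are an assumption based on the PredictionIO env template;
    verify them against your own conf directory.
    """
    url = f"jdbc:mysql://{host}:{port}/{database}?autoReconnect=true"
    lines = []
    repositories = (("METADATA", "pio_meta"), ("EVENTDATA", "pio_event"), ("MODELDATA", "pio_model"))
    for repo, name in repositories:
        lines.append(f"PIO_STORAGE_REPOSITORIES_{repo}_NAME={name}")
        lines.append(f"PIO_STORAGE_REPOSITORIES_{repo}_SOURCE=MYSQL")
    lines += [
        "PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc",
        f"PIO_STORAGE_SOURCES_MYSQL_URL={url}",
        f"PIO_STORAGE_SOURCES_MYSQL_USERNAME={user}",
        f"PIO_STORAGE_SOURCES_MYSQL_PASSWORD={password}",
    ]
    return lines


if __name__ == "__main__":
    # "mysql_dev" is the linked container name used in the docker run command.
    print("\n".join(mysql_storage_env("mysql_dev")))
```

The source name in the SOURCE lines (MYSQL here) has to match the name embedded in the SOURCES keys exactly, including case.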
  2. start pio and check status

  3. new app and import data

pio app new MyApp1

curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
  "event" : "rate",
  "entityType" : "user",
  "entityId" : "u0",
  "targetEntityType" : "item",
  "targetEntityId" : "i0",
  "properties" : {
    "rating" : 5
  },
  "eventTime" : "2014-11-02T09:39:45.618-08:00"
}'

curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
  "event" : "buy",
  "entityType" : "user",
  "entityId" : "u1",
  "targetEntityType" : "item",
  "targetEntityId" : "i2",
  "eventTime" : "2014-11-10T12:34:56.123-08:00"
}'

curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"

curl --create-dirs -o data/sample_movielens_data.txt
python data/ --access_key $ACCESS_KEY
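
Hand-written JSON bodies for the event server are easy to break with a missing brace or comma. As a rough sketch (the helper name is hypothetical, not part of any PredictionIO tool), the two event bodies used in this step can be built from a dict and serialized with the standard json module, which guarantees well-formed output that can then be passed to curl -d.

```python
import json


def build_event(event, user_id, item_id, event_time, rating=None):
    """Build an event body with the same fields as the curl examples in this step."""
    payload = {
        "event": event,
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "eventTime": event_time,
    }
    # Only "rate" events carry a rating; "buy" events have no properties here.
    if rating is not None:
        payload["properties"] = {"rating": rating}
    return payload


if __name__ == "__main__":
    rate = build_event("rate", "u0", "i0", "2014-11-02T09:39:45.618-08:00", rating=5)
    buy = build_event("buy", "u1", "i2", "2014-11-10T12:34:56.123-08:00")
    for body in (rate, buy):
        print(json.dumps(body, indent=2))
```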

Step 5: build and deploy

pio build --verbose
pio train
pio deploy

# 2. Quick start PredictionIO

PATH=$PATH:/home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/bin; export PATH



root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# pio app new MyApp1
[INFO] [HBLEvents] The table pio_event:events_1 doesn't exist yet. Creating now...
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [Pio$] Created a new app:
[INFO] [Pio$]   	Name: MyApp1
[INFO] [Pio$]     	ID: 1
[INFO] [Pio$] Access Key: UyQifiuvbOYcOJJArkNZZ6HJYQoD-FhiO22Bvk19zsy7RLo4EuLUkEe_PWPNNz5N


root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
> -H "Content-Type: application/json" \
> -d '{
>   "event" : "rate",
>   "entityType" : "user",
>   "entityId" : "u0",
>   "targetEntityType" : "item",
>   "targetEntityId" : "i0",
>   "properties" : {
> 	"rating" : 5
>   }
>   "eventTime" : "2014-11-02T09:39:45.618-08:00"
> }'
HTTP/1.1 201 Created
Server: spray-can/1.3.3
Date: Sun, 12 Nov 2017 13:34:55 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 57

{"eventId":"illrLcpg1dDE2bvZZ1NpggAAAUlxl11SrFjDLEi9C6A"}
root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"
HTTP/1.1 200 OK
Server: spray-can/1.3.3
Date: Sun, 12 Nov 2017 13:35:35 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 270

root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
> -H "Content-Type: application/json" \
> -d '{
>   "event" : "buy",
>   "entityType" : "user",
>   "entityId" : "u1",
>   "targetEntityType" : "item",
>   "targetEntityId" : "i2",
>   "eventTime" : "2014-11-10T12:34:56.123-08:00"
> }'
HTTP/1.1 201 Created
Server: spray-can/1.3.3
Date: Sun, 12 Nov 2017 13:36:09 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 57

{"eventId":"Z0813DMQIKz7N4VGxZhmngAAAUmbap37v3QoMu7STuI"}
root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"
HTTP/1.1 200 OK
Server: spray-can/1.3.3
Date: Sun, 12 Nov 2017 13:36:15 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 528


Interact with the Python SDK

Install Python tooling:
apt-get install -y python-pip
pip install --upgrade pip
pip install -U setuptools

Install the Python SDK:
pip install predictionio
Import data:
root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# curl --create-dirs -o data/sample_movielens_data.txt
  % Total	% Received % Xferd  Average Speed   Time	Time 	Time  Current
                             	Dload  Upload   Total   Spent	Left  Speed
100 14351  100 14351	0 	0  17275  	0 --:--:-- --:--:-- --:--:-- 17290
root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# ls
LICENSE.txt  build.sbt  data  engine.json  project  src  template.json
root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# ls -l
total 40
-rw-r--r-- 1 root root 11358 Nov 12 13:17 LICENSE.txt
-rw-r--r-- 1 root root  1233 Nov 12 13:17
-rw-r--r-- 1 root root   280 Nov 12 13:17 build.sbt
drwxr-xr-x 2 root root  4096 Nov 12 14:09 data
-rw-r--r-- 1 root root   384 Nov 12 13:17 engine.json
drwxr-xr-x 2 root root  4096 Nov 12 13:17 project
drwxr-xr-x 3 root root  4096 Nov 12 13:17 src
-rw-r--r-- 1 root root	53 Nov 12 13:17 template.json
root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# python data/ --access_key $ACCESS_KEY
Namespace(access_key='UyQifiuvbOYcOJJArkNZZ6HJYQoD-FhiO22Bvk19zsy7RLo4EuLUkEe_PWPNNz5N', file='./data/sample_movielens_data.txt', url='http://localhost:7070')
Importing data...
1501 events are imported.
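
To show what the import step does conceptually, here is a small sketch that turns rating lines into event bodies. It assumes each line of the sample file has the form user::item::rating, which should be checked against the downloaded file; the function name and the sample lines in the usage part are made up for illustration. The real import script sends the events to the event server through the Python SDK instead of just building dicts.

```python
def parse_ratings(lines, delimiter="::"):
    """Convert 'user::item::rating' lines (assumed format) into rate-event dicts."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating = line.split(delimiter)
        events.append({
            "event": "rate",
            "entityType": "user",
            "entityId": user,
            "targetEntityType": "item",
            "targetEntityId": item,
            "properties": {"rating": float(rating)},
        })
    return events


if __name__ == "__main__":
    # Made-up sample lines, not taken from the real data file.
    for event in parse_ratings(["0::2::3", "1::5::4.5"]):
        print(event)
```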

pio build --verbose
pio train

To get the docker container ip address:
docker inspect ForPredictionIO

docker container commit ForPredictionIO lyhistory/predictionio-0.12.0:deployed
docker push lyhistory/predictionio-0.12.0:deployed
docker save lyhistory/predictionio-0.12.0:deployed > /home/lyhistory/workspace/lyhistory_predictionio-0.12.0_tag_deployed.tar
pxz ***.tar  (compress the tar file to make it smaller)

# 3. Development


vi PredictionIO-0.12.0-incubating/conf/

PredictionIO-0.12.0-incubating/bin/pio status


# 4. Troubleshooting

Basic idea:
Refer to other people's settings, for example Docker image settings.
Check the log for detailed info.
Make sure the service has been started.

# 4.1 jps not a command

export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH

Or it may be because HBase is not running properly; restart it.

# 4.2 connection refused

Check the service status. If Elasticsearch is not running, check the log:
cat ~/pio.log

Then check:
cat PredictionIO-0.12.0-incubating/vendors/elasticsearch-5.5.2/logs/predictionio.log

Then try starting Elasticsearch manually:
PredictionIO-0.12.0-incubating/vendors/elasticsearch-5.5.2/bin/elasticsearch
This gives the same error -- details

When running as the root user, Elasticsearch cannot be started: "don't run elasticsearch as root".

Then try this solution: Run ElasticSearch 5 as Root

git clone -b v5.5.2
vi core/src/main/java/org/elasticsearch/bootstrap/
export GRADLE_HOME=~/tmp/gradle-3.4
export PATH=${GRADLE_HOME}/bin:${PATH}
gradle assemble

The build failed with a "gradle assemble" build error; upgrade the Gradle version.

How to clear the Gradle cache?
rm -r .gradle/

cp /home/workspace/elasticsearch/distribution/tar/build/distributions/elasticsearch-5.5.2-SNAPSHOT.tar.gz .

tar zxvfC elasticsearch-5.5.2-SNAPSHOT.tar.gz PredictionIO-0.12.0-incubating/vendors/

Try to run it:
PredictionIO-0.12.0-incubating/vendors/elasticsearch-5.5.2-SNAPSHOT/bin/elasticsearch

Failed to connect to localhost port 9200
This error is normal at the beginning, since Elasticsearch takes a while to start up, and it eventually goes away. If the problem persists, look at the entire output to see what is wrong.
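
Rather than retrying by hand while Elasticsearch starts, a small polling loop can wait until the port accepts connections. This is a minimal sketch using only the Python standard library; the function name, the 60 second timeout and the 1 second interval are arbitrary choices for illustration. It only checks that the TCP port is open, not that the cluster is healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    if wait_for_port("localhost", 9200):
        print("port 9200 is accepting connections")
    else:
        print("port 9200 is still refusing connections; check the Elasticsearch log")
```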

# 4.3 Host machine runs out of space

The host is Ubuntu, and Docker's default storage location is under '/', but initially I only allocated 20 GB to '/'. So I had to use GParted (Live CD) to resize it. It is very dangerous to move unallocated space around /boot, so you need to be prepared to fix boot issues. In the end I resized '/' to 70 GB. Another lesson here: try to set the Docker default location to /home, as normally we allocate a very large space to /home.
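
As a rough sketch of both lessons, the snippet below reports the free space on '/' and prints a Docker daemon configuration that moves the storage directory. It assumes a Docker version that reads the "data-root" key from /etc/docker/daemon.json (older releases used a different option, so check your version), and /home/docker is a hypothetical target path. The function names are made up for this example.

```python
import json
import shutil


def free_gib(path="/"):
    """Free space on the filesystem holding path, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def daemon_config(data_root):
    """Render daemon.json content that relocates Docker's storage (assumes 'data-root' is supported)."""
    return json.dumps({"data-root": data_root}, indent=2)


if __name__ == "__main__":
    print(f"free space on /: {free_gib('/'):.1f} GiB")
    # /home/docker is a hypothetical location; pick a directory on your large partition.
    print(daemon_config("/home/docker"))
```

The printed JSON would go into /etc/docker/daemon.json, followed by a restart of the Docker service; existing images are not moved automatically.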

# 4.4 Others

Key not found: PIO_STORAGE_SOURCES__TYPE
Cause: a misspelled key in the config. The names are case sensitive.
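
A quick way to catch this kind of mistake is to check that every source name referenced by a repository has a matching TYPE key. The sketch below works on the text of the environment file and is only an illustration: the function name is made up, and the key patterns are assumed from the PIO_STORAGE_ names used in these notes. An empty source name shows up as an empty string in the result, which presumably corresponds to the doubled underscore in the error message.

```python
import re


def undefined_sources(env_text):
    """Return source names used by PIO_STORAGE_REPOSITORIES_*_SOURCE that have no matching _TYPE key."""
    referenced, defined = set(), set()
    for raw in env_text.splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):]
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if re.fullmatch(r"PIO_STORAGE_REPOSITORIES_[A-Z]+_SOURCE", key):
            referenced.add(value.strip())
        match = re.fullmatch(r"PIO_STORAGE_SOURCES_(.*)_TYPE", key)
        if match:
            defined.add(match.group(1))
    # The comparison is case sensitive on purpose, like the real lookup.
    return sorted(referenced - defined)


if __name__ == "__main__":
    sample = """
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL
PIO_STORAGE_SOURCES_mysql_TYPE=jdbc
"""
    print(undefined_sources(sample))
```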

run firefox

root@be0576bd8d4e:/usr/bin# firefox
Error: GDK_BACKEND does not match available displays

apt-get install xvfb

root@be0576bd8d4e:/usr/bin# Xvfb :1 -screen 0 1024x768x16 &> xvfb.log  &
[1] 4113
root@be0576bd8d4e:/usr/bin# ps aux | grep X
root   	705  0.9 14.6 5774512 2386180 pts/1 Sl   13:02   0:31 /usr/local/java/jdk1.8.0_151/bin/java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Des.path.home=/home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/vendors/elasticsearch-5.5.2-SNAPSHOT -cp /home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/vendors/elasticsearch-5.5.2-SNAPSHOT/lib/* org.elasticsearch.bootstrap.Elasticsearch -d -p /home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/
root   	806  1.0  1.9 6106224 321684 pts/1  Sl   13:02   0:33 /usr/local/java/jdk1.8.0_151/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:PermSize=128m -XX:MaxPermSize=128m -Dhbase.log.dir=/home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/bin/../logs -Dhbase.log.file=hbase--master-be0576bd8d4e.log -Dhbase.home.dir=/home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/bin/.. -Dhbase.root.logger=INFO,RFA,RFAS org.apache.hadoop.hbase.master.HMaster start
root  	4113  0.3  0.2 215656 32832 pts/2	Sl   13:57   0:00 Xvfb :1 -screen 0 1024x768x16
root  	4121  0.0  0.0  11284   968 pts/2	S+   13:57   0:00 grep --color=auto X

Build error: check that the Scala version, the PredictionIO version and the template version match.

Hadoop: cannot use the jps command (see 4.1).