LINK: PredictionIO: Build and Deploy ML Applications in a Fraction of the Time

Multiple Events and Multiple Algorithms PredictionIO CLI Cheatsheet

## 1. Setup (on Ubuntu)

### 1.1 Host Selection

Run on VirtualBox

Method 1: Download the package, then install it with dpkg
Method 2:

Ref: [SOLVED] Setting up VirtualBox-5.1 and vboxconfig failing on Fedora24
sudo mokutil --disable-validation

Tips:
sudo apt-get remove virtualbox-{version}
Disable secure boot

Try running the HDP sandbox

Run on Docker

docker images
docker ps -a
docker rm ''
docker run -it '' bash
docker exec -it '' bash
docker login
docker pull

### 1.2 Install PredictionIO

Tips: Refer to other Dockerfile Refer to other

#### 1.2.1 Try Docker images

Failed: the PredictionIO version in the image is too old (not the latest) and doesn't match the template.

docker run -it -p 8000:8000 steveny/predictionio /bin/bash
jps -l
pio status
apt-get install git
pip install -U setuptools

#### 1.2.2 Try local deployment through Heroku

Failed because Heroku charges during the process.

Requirements
● Heroku account
● Heroku CLI, command-line tools
● git

root@be0576bd8d4e:/home/workspace/engine-dir# wget -qO- | sh
root@be0576bd8d4e:/home/workspace/engine-dir/pio-engine-ur# heroku create $ENGINE_NAME
Creating app... !
 ▸	Invalid credentials provided.
Enter your Heroku credentials:
Email: [email protected]
Password: ********
Creating app... done, ⬢ obscure-wave-35511 |

#### 1.2.3 Build from src

Install java

export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH

Install postgresql

Ref: How To Install and Use PostgreSQL on Ubuntu 16.04

/etc/init.d/postgresql start

Switch account:
su postgres
postgres
psql

root@be0576bd8d4e:/home/workspace/postgresql# /etc/init.d/postgresql start
 * Starting PostgreSQL 9.6 database server                                                                                                                                                       	[ OK ]
root@be0576bd8d4e:/home/workspace/postgresql# ps -ef | grep post
postgres  6367 	0  0 08:35 ?    	00:00:00 /usr/lib/postgresql/9.6/bin/postgres -D /var/lib/postgresql/9.6/main -c config_file=/etc/postgresql/9.6/main/postgresql.conf
postgres  6369  6367  0 08:35 ?    	00:00:00 postgres: 9.6/main: checkpointer process   
postgres  6370  6367  0 08:35 ?    	00:00:00 postgres: 9.6/main: writer process   
postgres  6371  6367  0 08:35 ?    	00:00:00 postgres: 9.6/main: wal writer process   
postgres  6372  6367  0 08:35 ?    	00:00:00 postgres: 9.6/main: autovacuum launcher process   
postgres  6373  6367  0 08:35 ?    	00:00:00 postgres: 9.6/main: stats collector process   
root  	6387  6332  0 08:36 pts/1	00:00:00 grep --color=auto post
root@be0576bd8d4e:/home/workspace/postgresql# postgres
bash: postgres: command not found
root@be0576bd8d4e:/home/workspace/postgresql# su postgres
postgres@be0576bd8d4e:/home/workspace/postgresql$ psql

Install PredictionIO

gpg --import KEYS
gpg --verify apache-predictionio-0.12.0-incubating.tar.gz.asc apache-predictionio-0.12.0-incubating.tar.gz

tar zxvf apache-predictionio-0.12.0-incubating.tar.gz -C ./apache-predictionio-0.12.0-incubating/

export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH
./


+ Step 2 Spark config:
PredictionIO-0.12.0-incubating/conf/ and change the SPARK_HOME

+ Step 3 Storage:

1)	Postgresql 9.6 refer to 4.3.2

2)	Mysql

docker pull mysql:5.7
docker run --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:tag --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
docker run --name mysql -e MYSQL_ROOT_PASSWORD=password -d mysql:5.7 --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci

Put the mysql-connector jar in $PIO_HOME/lib/
Create the pio db

3)	HBase and Elasticsearch 
	PredictionIO-0.12.0-incubating/conf/ and change the PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME
	PredictionIO-0.12.0-incubating/conf/ and change the PIO_STORAGE_SOURCES_HBASE_HOME
Edit PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/conf/hbase-site.xml and set these properties:
hbase.rootdir = file:///home/abc/PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/data
hbase.zookeeper.property.dataDir = /home/abc/PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/zookeeper
Edit PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/conf/ to set JAVA_HOME for the cluster. For example:

export JAVA_HOME=/usr/local/java/jdk1.8.0_151

root@be0576bd8d4e:/home/workspace/apache-predictionio-0.12.0-incubating# PredictionIO-0.12.0-incubating/bin/pio status
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.12.0-incubating is installed at /home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.6
[INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: LOCALFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [HBLEvents] The namespace pio_event doesn't exist yet. Creating now...
[INFO] [HBLEvents] The table pio_event:events_0 doesn't exist yet. Creating now...
[INFO] [HBLEvents] Removing table pio_event:events_0...
[INFO] [Management$] Your system is all ready to go.

#### 1.2.4 Use existing pio image 

**Step 1: Set static ip**

apt-get install openssh-server

Configure Node Networking

cat /etc/network/interfaces

interfaces(5) file used by ifup(8) and ifdown(8)

auto lo
iface lo inet loopback

The primary network interface

auto eth0
iface eth0 inet static
address
netmask
#network
#broadcast
gateway
dns-nameservers
#dns-domain
#dns-search

**Step 2: Setup docker mysql:5.7**

docker pull mysql:5.7

sudo docker run --name mysql_dev \
-e MYSQL_ROOT_PASSWORD=password \
-p 3306:3306 \
-d mysql:5.7

sudo docker exec -ti mysql_dev mysql -uroot -ppassword

create database pio DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
CREATE USER 'pio'@'localhost' IDENTIFIED BY 'pio';
CREATE USER 'pio'@'%' IDENTIFIED BY 'pio';
GRANT ALL ON *.* TO 'pio'@'localhost';
GRANT ALL ON *.* TO 'pio'@'%';
flush privileges;


Reset mysql container: docker rm -f mysql_dev

**Step 3: Setup docker predictionio:0.12.0**

1) Load predictionio image:

scp [email protected]:/root/*.xz ./

pxz -cd ./pio-0.12.0.tar.xz | sudo docker load

2) Download sample template - MyRecommendation:
git clone MyRecommendation

3) Config vendor:

scp [email protected]:/root/*.gz ./

Hbase-1.1.2
Spark-2.1.0-bin-hadoop2.7
zookeeper

**Step 4: Run and config PIO**

1) conf/

sudo docker run -ti -p 7080:7070 -p 8110:8000 \
-v $(readlink -e ~/PIO/vendors):/PredictionIO-0.12.0-incubating/vendors \
-v $(readlink -e ~/MyRecommendation):/MyRecommendation \
--link mysql_dev \
--name pio_$(whoami) \
pio:0.12.0 bash

Then check the linked folders: MyRecommendation and vendors
Try pinging the mysql_dev container service

Config pio:




2) start pio and check status

3) new app and import data

pio app new MyApp1

curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{ "event" : "rate", "entityType" : "user", "entityId" : "u0", "targetEntityType" : "item", "targetEntityId" : "i0", "properties" : { "rating" : 5 }, "eventTime" : "2014-11-02T09:39:45.618-08:00" }'

curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{ "event" : "buy", "entityType" : "user", "entityId" : "u1", "targetEntityType" : "item", "targetEntityId" : "i2", "eventTime" : "2014-11-10T12:34:56.123-08:00" }'

curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"
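The same POST can be prepared from Python with only the standard library. This is a minimal sketch, not the official SDK: the helper name `build_event_request` and the placeholder key `YOUR_ACCESS_KEY` are made up for illustration, and the URL assumes the Event Server on localhost:7070 as in the curl calls above. Building the body with `json.dumps` avoids hand-written JSON mistakes such as a missing comma between fields.

```python
import json
import urllib.parse
import urllib.request


def build_event_request(access_key, event, base_url="http://localhost:7070"):
    """Build (but do not send) a POST request for the Event Server.

    `build_event_request` is a hypothetical helper, not part of PredictionIO.
    """
    query = urllib.parse.urlencode({"accessKey": access_key})
    url = base_url + "/events.json?" + query
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same "rate" event as in the curl example.
rate_event = {
    "event": "rate",
    "entityType": "user",
    "entityId": "u0",
    "targetEntityType": "item",
    "targetEntityId": "i0",
    "properties": {"rating": 5},
    "eventTime": "2014-11-02T09:39:45.618-08:00",
}

# "YOUR_ACCESS_KEY" is a placeholder; use the key printed by `pio app new`.
request = build_event_request("YOUR_ACCESS_KEY", rate_event)

# To actually send it (requires a running Event Server):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Passing the key through `urlencode` also keeps keys containing special characters intact in the query string.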

curl --create-dirs -o data/sample_movielens_data.txt
python data/ --access_key $ACCESS_KEY–gn6mgP78jtaOvGQjJHWVCPk_3MNUYOTas-pVyAm3

**Step 5: build and deploy**

pio build --verbose
pio train
pio deploy

## 2. Quick start predictionio

PATH=$PATH:/home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/bin; export PATH



root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# pio app new MyApp1
[INFO] [HBLEvents] The table pio_event:events_1 doesn't exist yet. Creating now...
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [Pio$] Created a new app:
[INFO] [Pio$] Name: MyApp1
[INFO] [Pio$] ID: 1
[INFO] [Pio$] Access Key: UyQifiuvbOYcOJJArkNZZ6HJYQoD-FhiO22Bvk19zsy7RLo4EuLUkEe_PWPNNz5N


root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{ "event" : "rate", "entityType" : "user", "entityId" : "u0", "targetEntityType" : "item", "targetEntityId" : "i0", "properties" : { "rating" : 5 }, "eventTime" : "2014-11-02T09:39:45.618-08:00" }'
HTTP/1.1 201 Created
Server: spray-can/1.3.3
Date: Sun, 12 Nov 2017 13:34:55 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 57

{"eventId":"illrLcpg1dDE2bvZZ1NpggAAAUlxl11SrFjDLEi9C6A"}

root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"
HTTP/1.1 200 OK
Server: spray-can/1.3.3
Date: Sun, 12 Nov 2017 13:35:35 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 270

[{"eventId":"illrLcpg1dDE2bvZZ1NpggAAAUlxl11SrFjDLEi9C6A","event":"rate","entityType":"user","entityId":"u0","targetEntityType":"item","targetEntityId":"i0","properties":{"rating":5},"eventTime":"2014-11-02T09:39:45.618-08:00","creationTime":"2017-11-12T13:34:54.933Z"}]

root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{ "event" : "buy", "entityType" : "user", "entityId" : "u1", "targetEntityType" : "item", "targetEntityId" : "i2", "eventTime" : "2014-11-10T12:34:56.123-08:00" }'
HTTP/1.1 201 Created
Server: spray-can/1.3.3
Date: Sun, 12 Nov 2017 13:36:09 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 57

{"eventId":"Z0813DMQIKz7N4VGxZhmngAAAUmbap37v3QoMu7STuI"}

root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"
HTTP/1.1 200 OK
Server: spray-can/1.3.3
Date: Sun, 12 Nov 2017 13:36:15 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 528

[{"eventId":"Z0813DMQIKz7N4VGxZhmngAAAUmbap37v3QoMu7STuI","event":"buy","entityType":"user","entityId":"u1","targetEntityType":"item","targetEntityId":"i2","properties":{},"eventTime":"2014-11-10T12:34:56.123-08:00","creationTime":"2017-11-12T13:36:08.992Z"},{"eventId":"illrLcpg1dDE2bvZZ1NpggAAAUlxl11SrFjDLEi9C6A","event":"rate","entityType":"user","entityId":"u0","targetEntityType":"item","targetEntityId":"i0","properties":{"rating":5},"eventTime":"2014-11-02T09:39:45.618-08:00","creationTime":"2017-11-12T13:34:54.933Z"}]


**interact with python sdk**

Install python
apt-get install -y python-pip
pip install --upgrade pip
pip install -U setuptools

Install python sdk
pip install predictionio

Import data:

root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# curl --create-dirs -o data/sample_movielens_data.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14351  100 14351    0     0  17275      0 --:--:-- --:--:-- --:--:-- 17290
root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# ls
LICENSE.txt build.sbt data engine.json project src template.json
root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# ls -l
total 40
-rw-r--r-- 1 root root 11358 Nov 12 13:17 LICENSE.txt
-rw-r--r-- 1 root root 1233 Nov 12 13:17
-rw-r--r-- 1 root root 280 Nov 12 13:17 build.sbt
drwxr-xr-x 2 root root 4096 Nov 12 14:09 data
-rw-r--r-- 1 root root 384 Nov 12 13:17 engine.json
drwxr-xr-x 2 root root 4096 Nov 12 13:17 project
drwxr-xr-x 3 root root 4096 Nov 12 13:17 src
-rw-r--r-- 1 root root 53 Nov 12 13:17 template.json
root@be0576bd8d4e:/home/workspace/engine-dir/MyRecommendation# python data/ --access_key $ACCESS_KEY
Namespace(access_key='UyQifiuvbOYcOJJArkNZZ6HJYQoD-FhiO22Bvk19zsy7RLo4EuLUkEe_PWPNNz5N', file='./data/sample_movielens_data.txt', url='http://localhost:7070')
Importing data...
1501 events are imported.
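For reference, the conversion the import step performs can be sketched without the SDK. This sketch assumes each line of sample_movielens_data.txt has the form `user::item::rating` (an assumption about the file, not something shown in the log above), and `parse_ratings` is a hypothetical helper, not the template's import script. It only builds the event dictionaries; sending them is left to the SDK or to curl.

```python
def parse_ratings(lines):
    """Turn 'user::item::rating' lines into Event Server style dicts.

    The 'user::item::rating' layout is an assumption about the sample file.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating = line.split("::")
        events.append({
            "event": "rate",
            "entityType": "user",
            "entityId": user,
            "targetEntityType": "item",
            "targetEntityId": item,
            "properties": {"rating": float(rating)},
        })
    return events


# Two made-up sample lines, used only to show the output shape.
sample = ["0::2::3", "1::5::1"]
for event in parse_ratings(sample):
    print(event["entityId"], event["targetEntityId"], event["properties"])
```

With the real file, `parse_ratings(open("data/sample_movielens_data.txt"))` would yield one dictionary per rating line.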

pio build --verbose
pio train

To get the docker container ip address
docker inspect ForPredictionIO
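`docker inspect` prints a JSON array, so the IP address can be picked out programmatically. A minimal sketch, assuming the usual layout where addresses sit under `NetworkSettings.Networks.<network>.IPAddress`; `container_ips` is a hypothetical helper and the sample JSON below is made up for illustration.

```python
import json


def container_ips(inspect_json):
    """Return {network_name: ip_address} from `docker inspect <container>` output."""
    containers = json.loads(inspect_json)
    networks = containers[0]["NetworkSettings"]["Networks"]
    return {name: settings["IPAddress"] for name, settings in networks.items()}


# Trimmed, made-up sample of what `docker inspect ForPredictionIO` might print.
sample_output = """
[
  {
    "Name": "/ForPredictionIO",
    "NetworkSettings": {
      "Networks": {
        "bridge": {"IPAddress": "172.17.0.2"}
      }
    }
  }
]
"""

print(container_ips(sample_output))

# Against a real container (requires Docker):
# import subprocess
# output = subprocess.run(["docker", "inspect", "ForPredictionIO"],
#                         capture_output=True, text=True, check=True).stdout
# print(container_ips(output))
```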


docker container commit ForPredictionIO lyhistory/predictionio-0.12.0:deployed
docker push lyhistory/predictionio-0.12.0:deployed
docker save lyhistory/predictionio-0.12.0:deployed > /home/lyhistory/workspace/lyhistory_predictionio-0.12.0_tag_deployed.tar
pxz ***.tar (to make the size smaller)

## 3. Development

vi PredictionIO-0.12.0-incubating/conf/

PredictionIO-0.12.0-incubating/bin/pio-start-all
PredictionIO-0.12.0-incubating/bin/pio-stop-all
PredictionIO-0.12.0-incubating/bin/pio status



## 4. Troubleshooting

Basic idea:
	Refer to other people's settings, for example the settings used in Docker images
	Check the log for detailed info
	Make sure the service has been started

### 4.1 jps: command not found
export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH
Or HBase is not running properly; restart it.
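A quick way to confirm whether `jps` is reachable is to search for it the same way the shell would. This is a small sketch: `find_jps` is a hypothetical helper, and the JDK path is the one used in these notes, so adjust it to your install.

```python
import os
import shutil


def find_jps(java_home=None):
    """Return the full path of `jps`, or None if it cannot be found.

    If java_home is given, its bin/ directory is searched before PATH,
    mirroring `export PATH=$JAVA_HOME/bin:$PATH`.
    """
    search_path = os.environ.get("PATH", "")
    if java_home:
        search_path = os.path.join(java_home, "bin") + os.pathsep + search_path
    return shutil.which("jps", path=search_path)


# Prints None when jps is neither under this JDK nor on PATH.
print(find_jps("/usr/local/java/jdk1.8.0_151"))
```

If this prints `None`, the two `export` lines above have not taken effect in the current shell, or the JDK is installed somewhere else.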

### 4.2 connection refused
Check the service status; if Elasticsearch is not running, check the log:
cat ~/pio.log

Then check:
cat PredictionIO-0.12.0-incubating/vendors/elasticsearch-5.5.2/logs/predictionio.log

Then try starting Elasticsearch manually.
It fails with the same error -- details

When running as the root user, Elasticsearch cannot be started because of "don't run elasticsearch as root".

Then try this solution:
Run ElasticSearch 5 as Root

git clone -b v5.5.2
vi core/src/main/java/org/elasticsearch/bootstrap/
wget
unzip
export GRADLE_HOME=~/tmp/gradle-3.4
export PATH=${GRADLE_HOME}/bin:${PATH}
gradle assemble

If "gradle assemble" fails with a build error, upgrade the Gradle version.

How to clear the gradle cache?
rm -r .gradle/

cp /home/workspace/elasticsearch/distribution/tar/build/distributions/elasticsearch-5.5.2-SNAPSHOT.tar.gz .

tar zxvfC elasticsearch-5.5.2-SNAPSHOT.tar.gz PredictionIO-0.12.0-incubating/vendors/
Try to run it 

Failed to connect to localhost port 9200
Sometimes it may happen that:
This error is normal in the beginning since it takes a bit for elasticsearch to start up.
But eventually it will go away.
If the problem persists, please attach the entire output so that I can see what is wrong.
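Since Elasticsearch needs a moment before port 9200 accepts connections, it can help to poll the port instead of failing on the first refused connection. A minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper and the timeout values are arbitrary.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or timeout seconds pass.

    Returns True as soon as a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the service is not up yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (waits up to 60 seconds for Elasticsearch on the default port):
# if not wait_for_port("localhost", 9200):
#     print("Elasticsearch did not come up; check its log")
```

This only checks that something is listening on the port; it does not verify that the cluster is healthy.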

### 4.3 Host machine ran out of space
The host is Ubuntu, and Docker's default storage location is under '/', but initially I allocated only 20 GB to '/'.
So I had to use GParted (Live CD) to resize it. Moving unallocated space around /boot is very dangerous, so be prepared to fix boot issues, check
In the end I resized '/' to 70 GB.
Another lesson: try to set the Docker default location under /home, since we normally allocate much more space to /home.
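Before pulling or loading large images, it is cheap to check how much room is left on the filesystem that holds Docker's data. A small sketch with the standard library; `free_gib` and `has_room` are hypothetical helpers, and `/var/lib/docker` is Docker's usual default data directory on Linux, which may differ on your host.

```python
import shutil


def free_gib(path="/"):
    """Return the free space, in GiB, on the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30


def has_room(path, needed_gib):
    """True if at least needed_gib GiB are free on the filesystem of path."""
    return free_gib(path) >= needed_gib


print("free on /: %.1f GiB" % free_gib("/"))

# Example check before loading an image (the path and 10 GiB are assumptions):
# if not has_room("/var/lib/docker", 10):
#     print("not enough space for the PredictionIO image")
```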

### 4.4 others:
Wrong spelling, case sensitive

run firefox

root@be0576bd8d4e:/usr/bin# firefox
Error: GDK_BACKEND does not match available displays

apt-get install xvfb

root@be0576bd8d4e:/usr/bin# Xvfb :1 -screen 0 1024x768x16 &> xvfb.log &
[1] 4113
root@be0576bd8d4e:/usr/bin# ps aux | grep X
root 705 0.9 14.6 5774512 2386180 pts/1 Sl 13:02 0:31 /usr/local/java/jdk1.8.0_151/bin/java -Xms2g -Xmx2g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Des.path.home=/home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/vendors/elasticsearch-5.5.2-SNAPSHOT -cp /home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/vendors/elasticsearch-5.5.2-SNAPSHOT/lib/* org.elasticsearch.bootstrap.Elasticsearch -d -p /home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/
root 806 1.0 1.9 6106224 321684 pts/1 Sl 13:02 0:33 /usr/local/java/jdk1.8.0_151/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:PermSize=128m -XX:MaxPermSize=128m -Dhbase.log.dir=/home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/bin/../logs -Dhbase.log.file=hbase--master-be0576bd8d4e.log -Dhbase.home.dir=/home/workspace/apache-predictionio-0.12.0-incubating/PredictionIO-0.12.0-incubating/vendors/hbase-1.2.6/bin/.. -Dhbase.root.logger=INFO,RFA,RFAS org.apache.hadoop.hbase.master.HMaster start
root 4113 0.3 0.2 215656 32832 pts/2 Sl 13:57 0:00 Xvfb :1 -screen 0 1024x768x16
root 4121 0.0 0.0 11284 968 pts/2 S+ 13:57 0:00 grep --color=auto X
root@be0576bd8d4e:/usr/bin#

Build error: check the Scala version, the PredictionIO version and the template version

Hadoop: Cannot use Jps command