Multithreading and multiprocessing in Node.js

Refs

  1. Node.js Worker Threads
  2. Deep dive into threads and processes in Node.js
  3. How do cluster and worker threads work in node.js

Multithreading: Worker thread

What and why

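A worker thread lets Node.js execute JavaScript in parallel, on a thread separate from the main event loop. Worker threads are useful for CPU-intensive jobs; for I/O-bound work, Node.js's built-in asynchronous I/O is usually more efficient.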

How-to

Create a worker file => wrap new Worker() in a Promise in the caller file => register the 'message', 'error' and 'exit' handlers.
worker.js file:

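const { workerData, parentPort } = require('worker_threads');

// workerData is the value passed as the workerData option of new Worker()
console.log('Technical Articles on ' + workerData);

// Send a result object back to the main thread
parentPort.postMessage({ fileName: workerData, status: 'Done' });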

index.js file:

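const path = require('path');
const { Worker } = require('worker_threads');

function runService(workerData) {
  return new Promise((resolve, reject) => {
    // Relative worker paths are resolved against the current working directory,
    // so build an absolute path next to this file instead
    const worker = new Worker(path.join(__dirname, 'worker.js'), { workerData });
    worker.on('message', resolve); // result sent via parentPort.postMessage()
    worker.on('error', reject);    // uncaught exception inside the worker
    worker.on('exit', (code) => {
      if (code !== 0) {
        reject(new Error(`Stopped the Worker Thread with the exit code: ${code}`));
      }
    });
  });
}

async function run() {
  const result = await runService('GeeksForGeeks');
  console.log(result); // { fileName: 'GeeksForGeeks', status: 'Done' }
}

run().catch((err) => console.error(err));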

Key points:

  1. The worker receives its input from the caller through workerData, which the caller passes as the workerData option of new Worker().
  2. The worker sends results back with parentPort.postMessage(), and the caller receives them in its worker.on('message') handler.

Multiprocessing: Fork and Cluster

The problem with single-threaded Node.js: an example

Because Node.js runs JavaScript on a single thread, one time-consuming request blocks all the others.
Example:

const http = require('http');

// CPU-bound work: the event loop is blocked until this loop finishes
const longComputation = () => {
  let sum = 0;
  for (let i = 0; i < 1e10; i++) {
    sum += i;
  }
  return sum;
};

const server = http.createServer();
server.on('request', (req, res) => {
  if (req.url === '/compute') {
    console.info('Computation started', new Date());
    const sum = longComputation();
    console.info('Computation finished', new Date());
    return res.end(`Sum is ${sum}`);
  } else {
    res.end('Ok');
  }
});

server.listen(3000);

Calling localhost:3000/compute blocks the server: until the loop finishes, every other request, even one to /, has to wait.
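The blocking can be observed without an HTTP server. This sketch schedules a 0 ms timer and then runs a synchronous busy loop on the main thread; the timer cannot fire until the loop returns. blockFor() and measureTimerDelay() are hypothetical helper names, and the time-based busy-wait stands in for longComputation().

```javascript
// Busy-wait on the main thread for `ms` milliseconds, like a long synchronous computation
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else on the event loop can run meanwhile
  }
}

// Schedule a 0 ms timer, then block; resolve with how late the timer actually fired
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    blockFor(blockMs);
  });
}

measureTimerDelay(500).then((delay) => {
  console.log(`A 0 ms timer fired after ${delay} ms`);
});
```

The reported delay is always at least the blocking time, which is exactly what happens to every pending request while /compute is running.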

Pros and cons of single-threaded Node.js

Pros:

  1. Simple: no threads to create and no context switching between them.
  2. The event loop and non-blocking asynchronous I/O deliver high performance under high concurrency.

Cons:

  1. A CPU-intensive calculation can block the entire Node.js app.
  2. An uncaught error can kill the thread and thus the entire app, so a daemon process that restarts it should be considered.
  3. A single thread cannot take advantage of a multi-core CPU.

Fork

Use child_process.fork() to run the computation in a new process; parent and child communicate over an IPC channel.
main.js:

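const http = require('http');
const path = require('path');
const { fork } = require('child_process');

const server = http.createServer((req, res) => {
  if (req.url === '/compute') {
    // Run the CPU-heavy work in a separate Node.js process
    // (resolve compute.js next to this file, not against the working directory)
    const compute = fork(path.join(__dirname, 'compute.js'));
    compute.send('Start a new child process');

    // The 'message' event fires when the child sends a message with process.send()
    compute.on('message', (sum) => {
      res.end(`Sum is ${sum}`);
      compute.kill();
    });

    // 'close' fires after the child has exited (normally, killed, or after an error)
    compute.on('close', (code, signal) => {
      console.log(`Received close event: child terminated by signal ${signal}, exit code ${code}`);
      // If the child died before replying, don't leave the request hanging
      if (!res.writableEnded) {
        res.statusCode = 500;
        res.end('Computation failed');
      }
    });
  } else {
    res.end('ok');
  }
});

server.listen(3000, '127.0.0.1', () => {
  console.log('server started at http://127.0.0.1:3000');
});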

compute.js:

const computation = () => {
  let sum = 0;
  console.info('Computation started');
  console.time('Computation time');

  for (let i = 0; i < 1e10; i++) {
    sum += i;
  }

  console.info('Computation finished');
  console.timeEnd('Computation time');
  return sum;
};

process.on('message', (msg) => {
  console.log(msg, 'process.pid', process.pid); // child process id
  const sum = computation();

  // If the Node.js process was spawned with an IPC channel (as fork() does),
  // process.send() can be used to send messages to the parent process
  process.send(sum);
});

Cluster

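The cluster module creates the worker processes from a single file: the same script runs as the master and as every worker, and cluster.isMaster (renamed cluster.isPrimary in Node.js 16) tells them apart.
Example: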

const http = require('http');
const os = require('os');
const cluster = require('cluster');

const numCPUs = os.cpus().length;

// cluster.isPrimary needs Node.js 16+; older versions call it cluster.isMaster
if (cluster.isPrimary) {
  console.log(`Master process id is ${process.pid}, cpu number ${numCPUs}`);
  // Fork one worker per CPU core
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker, code, signal) => {
    console.log('worker process died,id', worker.process.pid);
  });
} else {
  // Workers can share the same listening port; here each one runs an HTTP server
  console.log(`created worker pid ${process.pid}`);
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end(String(process.pid)); // reply with the pid of the worker that served the request
  }).listen(8000);
}

Running the code above prints output like this:

Master process id is 18428, cpu number 4
created worker pid 16672
created worker pid 9896
created worker pid 14676
created worker pid 1460

These processes show up in Task Manager:
[Screenshot: multiple node.js processes]
Now send a request from a browser; the pid of one of the four server processes is returned:
[Screenshot: request]
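Refresh many times and different pids may be returned, though it depends: one unlucky process may end up shouldering all the workload. Connections, not individual requests, are distributed, and a browser usually reuses one keep-alive connection across refreshes. Also, the master hands out connections round-robin by default on every platform except Windows, where distribution is left to the operating system (see cluster.schedulingPolicy).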
Now kill worker 1460 in Task Manager, and the master logs:

worker process died,id 1460

Refresh the browser and the response now comes from a pid other than 1460:
[Screenshot: request]
The server is now much more robust than before: with four worker processes, killing one still leaves three serving requests.
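The 'exit' handler in the example above only logs the death, so the cluster shrinks by one process each time a worker dies. A common extension is to fork a replacement. To keep this sketch self-contained and finite, the crash is simulated with process.exit(1), controlled by a hypothetical CRASHES_LEFT environment variable passed through cluster.fork(env), and restarts are capped by a hypothetical MAX_RESTARTS constant.

```javascript
const cluster = require('cluster');

const MAX_RESTARTS = 3; // assumption: cap restarts so a crash loop cannot run forever

// Worker side (simulated): crash while CRASHES_LEFT > 0, otherwise exit cleanly
function runWorker() {
  const crashesLeft = Number(process.env.CRASHES_LEFT);
  process.exit(crashesLeft > 0 ? 1 : 0);
}

// Primary side: fork a replacement whenever a worker dies with a non-zero exit code
function supervise(initialCrashes) {
  return new Promise((resolve, reject) => {
    let restarts = 0;
    const start = (crashesLeft) => {
      // The env object is added to the worker's process.env
      const worker = cluster.fork({ CRASHES_LEFT: String(crashesLeft) });
      worker.once('exit', (code) => {
        if (code === 0) return resolve(restarts);
        if (restarts >= MAX_RESTARTS) return reject(new Error('too many restarts'));
        restarts += 1;
        console.log(`worker ${worker.process.pid} died (code ${code}), restarting`);
        start(crashesLeft - 1);
      });
    };
    start(initialCrashes);
  });
}

// cluster.isPrimary needs Node.js 16+
const supervised = cluster.isPrimary ? supervise(2) : null;
if (supervised) {
  supervised.then((restarts) => console.log(`worker exited cleanly after ${restarts} restarts`));
} else {
  runWorker();
}
```

In a real server the worker would run the HTTP server instead of exiting, and the cap (or a delay between restarts) keeps a worker that crashes on startup from being re-forked in a tight loop.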

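Under the hood, cluster.fork() uses the same fork() method from the child_process module. Cluster follows a master-slave model (primary/worker in current Node.js docs), where the master manages the workers and schedules incoming connections among them.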

Why is there no Error: EADDRINUSE when multiple processes listen on the same port?
The child processes are not actually listening on the port themselves. Incoming socket connections to the master process are delegated to the child processes. server.listen() has special handling for clustered processes: in some circumstances it calls an internal method named listenInCluster(), which asks the master for the listening handle instead of binding the port directly. See explanation here
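A self-terminating sketch of this delegation: two workers listen on the same port without EADDRINUSE, then the primary sends a few requests, each on a fresh connection (agent: false), and records which worker pid answered each one. The port 8001, the worker and request counts, and the helper names are assumptions for illustration; cluster.isPrimary requires Node.js 16 or later.

```javascript
const cluster = require('cluster');
const http = require('http');

const PORT = 8001;   // assumption: a free local port for this demo
const WORKERS = 2;
const REQUESTS = 6;

// Worker side: every worker "listens" on the same port and replies with its pid
function startWorker() {
  http.createServer((req, res) => {
    res.end(String(process.pid));
  }).listen(PORT);
}

// GET / on a new TCP connection (agent: false avoids reusing a keep-alive socket)
function get() {
  return new Promise((resolve, reject) => {
    http.get({ host: '127.0.0.1', port: PORT, agent: false }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Primary side: fork workers, wait until each is listening, then send requests
async function runPrimary() {
  const workerPids = [];
  await Promise.all(Array.from({ length: WORKERS }, () => new Promise((resolve) => {
    const worker = cluster.fork();
    workerPids.push(worker.process.pid);
    worker.once('listening', resolve);
  })));

  const servedBy = [];
  for (let i = 0; i < REQUESTS; i++) {
    servedBy.push(Number(await get()));
  }

  // Close the workers' servers and IPC channels so every process can exit
  cluster.disconnect();
  return { workerPids, servedBy };
}

const demo = cluster.isPrimary ? runPrimary() : null;
if (demo) {
  demo.then(({ workerPids, servedBy }) => {
    console.log('worker pids:', workerPids);
    console.log('requests served by:', servedBy);
  });
} else {
  startWorker();
}
```

With the default round-robin policy the pids usually alternate; on Windows the spread is up to the operating system.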

Multithreading vs multiprocessing

cluster

  • Typically one process is launched per CPU core, and processes communicate via IPC.
  • Each process has its own memory and its own Node (V8) instance, so creating a large number of them can cause memory issues.
  • Great for running many HTTP servers that share the same port, because the master process distributes incoming connections among the child processes.

worker threads

  • One process total
  • Creates multiple threads, each with its own Node instance (one event loop, one JS engine). Most Node APIs are available in each thread, with a few exceptions. Essentially, Node embeds itself and creates a new thread.
  • Shares memory with other threads (e.g. SharedArrayBuffer)
  • Great for CPU-intensive tasks such as data processing. Because Node.js runs JavaScript on a single thread, moving synchronous, CPU-bound work into workers keeps the main event loop free. For I/O such as file system access, Node's built-in asynchronous APIs are usually more efficient than workers.
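The shared-memory point can be sketched with a SharedArrayBuffer passed in workerData: unlike ordinary values, it is shared between threads rather than copied, and Atomics.add() keeps concurrent increments from being lost. countInParallel() and the inline worker source (run with the eval: true option of new Worker()) are hypothetical names for illustration.

```javascript
const { Worker } = require('worker_threads');

// Worker body: atomically add 1 to the shared Int32 counter `iterations` times
const workerSource = `
  const { workerData } = require('worker_threads');
  const counter = new Int32Array(workerData.buffer);
  for (let i = 0; i < workerData.iterations; i++) Atomics.add(counter, 0, 1);
`;

function countInParallel(threads, iterations) {
  // One 32-bit slot visible to every thread; workerData shares the buffer, it does not copy it
  const buffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(buffer);
  const done = Array.from({ length: threads }, () => new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { buffer, iterations } });
    worker.once('error', reject);
    worker.once('exit', (code) => (code === 0 ? resolve() : reject(new Error(`exit code ${code}`))));
  }));
  // Read the final value once every worker has exited
  return Promise.all(done).then(() => Atomics.load(counter, 0));
}

countInParallel(4, 100000).then((total) => console.log(`counter = ${total}`)); // prints "counter = 400000"
```

Replacing Atomics.add() with a plain counter[0]++ would make the increments a read-modify-write race between threads, and the final count could come out lower.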