Learning Social Infectivity in Sparse Low-rank Networks Using Multi-dimensional Hawkes Processes

ECE 602 Term Project

Winter 2018

Authors: Yanbing Jiang (20501974) & Zirui Wu (20740156)

Professor: Oleg Michailovich | TA: Mohammad Hossein Basiri

--------------------------------------------------------------------------


Introduction

In the provided paper, the authors set out to infer the network of social influence from the dynamics of historical events observed over many individuals, and to build a predictive model that answers the following question quantitatively: when will an individual in the society take an action, given that some other individuals have already done so?

To address this, the authors propose a regularized convex optimization approach that discovers the hidden network of social influence based on multi-dimensional Hawkes processes.

From a modeling perspective, the authors take into account several key features of social influence between people and among societies/communities (discussed later).

Additionally, the multi-dimensional Hawkes process captures the mutually-exciting and recurrent nature of the behaviour of individuals who are active in social activities, while regularization with different norms on the infectivity matrix allows the observer to impose priors on the network topology.

Data Synthesis and Data Generation

A point process is characterized by its conditional intensity, defined as

$$\lambda ^{*}(t)=\lim_{h\rightarrow 0} \frac{E[N(t+h)-N(t)\,|\,H(t)]}{h}$$

where $H(t)$ is the history of events up to time $t$. Intuitively, the conditional intensity times an infinitesimal interval $h$ is the probability that an event occurs during that interval. The conditional intensity is crucial for synthesizing our training data; however, it is not defined in the original paper. For the multi-dimensional Hawkes process, the intensity of the $u$-th dimension is

$$\lambda _{u}(t) = \mu _{u} + \sum_{i: t_{i}< t}a_{uu_{i}}g(t-t_{i})$$

where $\mu _{u}$ is the base intensity of the $u$-th dimension of the Hawkes process, and $a_{uu_{i}}$ is an element of the matrix $A$, which characterizes the mutual-excitation coefficients; $A$ is called the infectivity matrix. $g$ is the kernel function, which is taken to be a one-dimensional exponential decay:

$$g(t) = we^{-wt}$$

The following function implements this kernel:

function g = kernel_g(dt, w)
   % Exponential kernel defined in the paper: g(t) = w*exp(-w*t)
   g = w * exp(-w .* dt);
   % Zero out values greater than 1; with w = 1 these correspond to
   % dt < 0 (events from the future) and would otherwise ripple downstream
   g(g>1) = 0;
end
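
The learning step also needs the integral of this kernel, $G(t)=\int_{0}^{t}g(s)\,ds = 1-e^{-wt}$. The file kernel_G.m (called as kernel_int_g in update_a_mu below) is not reproduced in this report, so the following is a minimal sketch of it, assuming the standard closed form of the integral:

function G = kernel_int_g(dt, w)
   % Integral of the exponential kernel: G(t) = 1 - exp(-w*t)
   G = 1 - exp(-w .* dt);
   G(dt < 0) = 0;   % no contribution from events in the future
end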

We fill in the matrices $U$ and $V$ with the following rules for random data generation, using the method mentioned in the paper:

- The ADM4 matrices $Z$ and $U$ are initialized to all zeros; note that both matrices have the same size as $A$.
- The purpose of the matrices $A$, $U$, $V$ and the parameter $\mu$ is described in detail in the "Proposed Solutions" section.

clc;            % Initialization
clear;
D = 1000;       % Dimension of HP
U = zeros(D,9); % Initialization of Matrix U
V = zeros(D,9); % Initialization of Matrix V

for i = 1:9
    U(100*(i-1)+1:100*(i+1),i) = rand(200,1)/10;
    V(100*(i-1)+1:100*(i+1),i) = rand(200,1)/10;
end
A = U*V';                      % Ground-truth infectivity matrix: sparse and low-rank (rank <= 9)
A = 0.8 * A./max(abs(eig(A))); % Scale the spectral radius of A to 0.8 so the process is stable
mu = rand(D,1)/D;              % True base intensity: 1000-dim vector
w = 1;                         % Decay rate w of the kernel function, set to 1
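
As a quick sanity check (not part of the original script), one can verify that the ground-truth $A$ is simultaneously low-rank, sparse, and stable:

fprintf('rank(A) = %d, nnz(A) = %d, spectral radius = %.2f\n', ...
        rank(A), nnz(A), max(abs(eig(A))));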

The following code generates the samples of the multi-dimensional Hawkes process, reporting the elapsed time as generation proceeds:

num_samples = 250; % the number of sequences
nc = 100; % the maximum number of events per sequence
hp_samples = gen_hp_samples(w,mu,A, num_samples,nc); % generating samples
#samples=10/250, time=0.28sec
#samples=20/250, time=0.36sec
#samples=30/250, time=0.42sec
#samples=40/250, time=0.50sec
#samples=50/250, time=0.56sec
#samples=60/250, time=0.62sec
#samples=70/250, time=0.66sec
#samples=80/250, time=0.71sec
#samples=90/250, time=0.77sec
#samples=100/250, time=0.81sec
#samples=110/250, time=0.86sec
#samples=120/250, time=0.91sec
#samples=130/250, time=0.96sec
#samples=140/250, time=1.01sec
#samples=150/250, time=1.06sec
#samples=160/250, time=1.11sec
#samples=170/250, time=1.16sec
#samples=180/250, time=1.21sec
#samples=190/250, time=1.26sec
#samples=200/250, time=1.31sec
#samples=210/250, time=1.35sec
#samples=220/250, time=1.39sec
#samples=230/250, time=1.43sec
#samples=240/250, time=1.48sec
#samples=250/250, time=1.52sec

Each generated sample is a struct holding the events and their timestamps, produced by the following function:

function hp_samples = gen_hp_samples(w,mu,A, num_samples,nc)
% Generate sequences of the multi-dimensional Hawkes process by thinning
tic;
hp_samples = struct('timestamp', [],'event', []);
for n = 1:num_samples
    t = 0;
    timestamp_and_event = [];
    lambdat = mu;                               % Base intensity vector (D-dim)
    lambdat_sum = sum(lambdat);                 % Total intensity, used to draw the next candidate time
    while size(timestamp_and_event, 2) < nc
        rand_u = rand;
        dt = random('exp', 1/lambdat_sum);      % Candidate waiting time ~ Exp(lambdat_sum)
        lambda_ts = comp_lambda(t+dt, [], t, lambdat,w,mu,A);
        lambdats_sum = sum(lambda_ts);
        if rand_u > lambdats_sum / lambdat_sum  % Reject the candidate: advance time only
            t = t+dt;
            lambdat = lambda_ts;
        else                                    % Accept the candidate event
            u = rand * lambdats_sum;
            lambda_sum = 0;
            for d = 1:length(mu)                % Pick the dimension in which the event occurs
                lambda_sum = lambda_sum + lambda_ts(d);
                if lambda_sum >= u
                    break;
                end
            end
            lambdat = comp_lambda(t+dt, d, t, lambdat,w,mu,A);
            t = t+dt;
            timestamp_and_event = [timestamp_and_event,[t;d]]; % Append the new event
        end
        lambdat_sum = sum(lambdat);             % Total intensity for the next iteration
    end                                         % while size(timestamp_and_event, 2) < nc
    hp_samples(n).timestamp = timestamp_and_event(1,:);
    hp_samples(n).event = timestamp_and_event(2,:);
    if mod(n, 10)==0                            % Report generation progress
        fprintf('#samples=%d/%d, time=%.2fsec\n', n, num_samples, toc);
    end
end                                             % for n = 1:num_samples
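
The helper comp_lambda.m (listed under the source files at the end of this report) is not reproduced here. The following is a minimal sketch of what it must compute, assuming it exploits the Markov property of the exponential kernel and the $A(\mathrm{source},\mathrm{target})$ indexing convention used in update_a_mu below: the excitation part of the intensity decays by $e^{-w\,dt}$, and a new event in dimension $d$ adds a jump of $a_{du}\,g(0) = w\,a_{du}$ to each $\lambda_u$.

function lambda = comp_lambda(tcur, event, tprev, lambdat, w, mu, A)
   % Sketch of the intensity update between tprev and tcur
   dt = tcur - tprev;
   lambda = mu + (lambdat - mu) .* exp(-w * dt); % decay the excitation part
   if ~isempty(event)
       lambda = lambda + w * A(event, :)';       % jump caused by the new event, g(0) = w
   end
end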

Proposed Solutions

First of all, there appears to be a typo in Algorithm 1 (ADM4) of the original paper. The first line immediately after the first while loop should update $A$ and $\mu$ instead of $A_{k+1}$ and $\mu_{k+1}$. Furthermore, the first line after the second while loop should update $A_{k+1}$ and $\mu_{k+1}$ instead of $A$ and $\mu$. After this adjustment, the algorithm matches the content of the other parts of the original paper.

Our code is structured after making this adjustment to the algorithm: an inner function optimizes $A$ and $\mu$ (the second while loop), and another function on top of it runs the first while loop and initializes the parameters. One thing worth noting is that only $A$ and $\mu$ are returned when the algorithm ends; all the other parameters are either fixed while training over the samples or are auxiliary variables assisting the optimization of $A$ and $\mu$.

Furthermore, the convergence condition is not stated numerically in the original paper. Hence, our program uses both a fixed number of iterations and a threshold on the current error to break out of the while loops, which might otherwise execute a very large number of times.

% Start optimization
num_iter_1 = 2;    % Iterations of the first while loop (the loop runs 0:num_iter_1, i.e. 3 outer iterations)
num_iter_2 = 7;    % Iterations of the second while loop (0:num_iter_2, i.e. 8 inner iterations)
rho = 0.09;
thold = 1;         % Threshold value


Implementation of the Solutions

[A_learned,mu_learned,Iteration_Err] = optimize_a_mu(hp_samples,D,w,rho,num_iter_1,num_iter_2,thold,A);
No. 1 outer while iteration | No. 1 inner while iteration
non-zero in mu = 1000
non-zero in C = 604531
non-zero in B = 1000000, non-zero in sqrt = 1000000
real error = 0.9609, correlation = 0.1688, #non-zero in A = 532775

No. 1 outer while iteration | No. 2 inner while iteration
non-zero in mu = 1000
non-zero in C = 532775
non-zero in B = 532775, non-zero in sqrt = 532775
real error = 0.4003, correlation = 0.0645, #non-zero in A = 413334

No. 1 outer while iteration | No. 3 inner while iteration
non-zero in mu = 1000
non-zero in C = 413334
non-zero in B = 413334, non-zero in sqrt = 413334
real error = 0.2981, correlation = 0.0133, #non-zero in A = 330105

No. 1 outer while iteration | No. 4 inner while iteration
non-zero in mu = 1000
non-zero in C = 330105
non-zero in B = 330105, non-zero in sqrt = 330105
real error = 0.2869, correlation = 0.0030, #non-zero in A = 267133

No. 1 outer while iteration | No. 5 inner while iteration
non-zero in mu = 1000
non-zero in C = 267133
non-zero in B = 267133, non-zero in sqrt = 267133
real error = 0.2849, correlation = 0.0010, #non-zero in A = 216017

No. 1 outer while iteration | No. 6 inner while iteration
non-zero in mu = 1000
non-zero in C = 216017
non-zero in B = 216017, non-zero in sqrt = 216017
real error = 0.2841, correlation = 0.0005, #non-zero in A = 172956

No. 1 outer while iteration | No. 7 inner while iteration
non-zero in mu = 1000
non-zero in C = 172956
non-zero in B = 172956, non-zero in sqrt = 172956
real error = 0.2837, correlation = 0.0003, #non-zero in A = 135350

No. 1 outer while iteration | No. 8 inner while iteration
non-zero in mu = 1000
non-zero in C = 135350
non-zero in B = 135350, non-zero in sqrt = 135350
real error = 0.2834, correlation = 0.0002, #non-zero in A = 101753

No. 2 outer while iteration | No. 1 inner while iteration
non-zero in mu = 1000
non-zero in C = 101753
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2831, correlation = 0.0002, #non-zero in A = 73060

No. 2 outer while iteration | No. 2 inner while iteration
non-zero in mu = 1000
non-zero in C = 73060
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2829, correlation = 0.0001, #non-zero in A = 49428

No. 2 outer while iteration | No. 3 inner while iteration
non-zero in mu = 1000
non-zero in C = 49428
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2828, correlation = 0.0001, #non-zero in A = 32331

No. 2 outer while iteration | No. 4 inner while iteration
non-zero in mu = 1000
non-zero in C = 32331
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2827, correlation = 0.0000, #non-zero in A = 21096

No. 2 outer while iteration | No. 5 inner while iteration
non-zero in mu = 1000
non-zero in C = 21096
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2826, correlation = 0.0000, #non-zero in A = 14011

No. 2 outer while iteration | No. 6 inner while iteration
non-zero in mu = 1000
non-zero in C = 14011
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2826, correlation = 0.0000, #non-zero in A = 9622

No. 2 outer while iteration | No. 7 inner while iteration
non-zero in mu = 1000
non-zero in C = 9622
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2825, correlation = 0.0000, #non-zero in A = 6785

No. 2 outer while iteration | No. 8 inner while iteration
non-zero in mu = 1000
non-zero in C = 6785
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2825, correlation = 0.0000, #non-zero in A = 4972

No. 3 outer while iteration | No. 1 inner while iteration
non-zero in mu = 1000
non-zero in C = 4972
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2824, correlation = 0.0000, #non-zero in A = 3777

No. 3 outer while iteration | No. 2 inner while iteration
non-zero in mu = 1000
non-zero in C = 3777
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2824, correlation = 0.0000, #non-zero in A = 2870

No. 3 outer while iteration | No. 3 inner while iteration
non-zero in mu = 1000
non-zero in C = 2870
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2823, correlation = 0.0000, #non-zero in A = 2261

No. 3 outer while iteration | No. 4 inner while iteration
non-zero in mu = 1000
non-zero in C = 2261
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2823, correlation = 0.0000, #non-zero in A = 1796

No. 3 outer while iteration | No. 5 inner while iteration
non-zero in mu = 1000
non-zero in C = 1796
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2823, correlation = 0.0000, #non-zero in A = 1480

No. 3 outer while iteration | No. 6 inner while iteration
non-zero in mu = 1000
non-zero in C = 1480
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2823, correlation = 0.0000, #non-zero in A = 1236

No. 3 outer while iteration | No. 7 inner while iteration
non-zero in mu = 1000
non-zero in C = 1236
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2823, correlation = 0.0000, #non-zero in A = 1069

No. 3 outer while iteration | No. 8 inner while iteration
non-zero in mu = 1000
non-zero in C = 1069
non-zero in B = 101753, non-zero in sqrt = 101753
real error = 0.2823, correlation = 0.0001, #non-zero in A = 943

function [A_m,mu_m,Iteration_Err] = optimize_a_mu(hp_samples,D,w,rho,num_iter_1,num_iter_2,thold,real_A)
    % Input: the training samples, parameters, numbers of iterations,
    % and the real A for tracking the error of the iterative learning.
    A_m = rand(D,D);                      % Start the optimization from a random matrix
    A_m = 0.8 * A_m./max(abs(eig(A_m)));  % As for the real A, scale the spectral radius of A_m to 0.8
    mu_m = rand(D, 1)./D;                 % Start from a random mu as well
    UL = zeros(D,D);                      % Dual matrix U for the low-rank constraint
    ZL = zeros(D,D);                      % Auxiliary matrix Z for the low-rank constraint
    US = zeros(D,D);                      % Dual matrix U for the sparsity constraint
    ZS = zeros(D,D);                      % Auxiliary matrix Z for the sparsity constraint
    for i = 0:num_iter_1                  % Outer while loop of Algorithm 1 in the paper
        rho = rho * 1.1;                  % Increase rho at each outer iteration
        for j = 0:num_iter_2              % Inner while loop of Algorithm 1 in the paper
            % Main algorithm mentioned above
            fprintf('No. %d outer while iteration | No. %d inner while iteration\n', i+1, j+1);
            [A_m,mu_m,RelErr] = update_a_mu(A_m,mu_m,hp_samples,UL,ZL,US,ZS,w,rho,D,real_A);
            Iteration_Err(j+1,i+1) = RelErr; % Record the error of each iteration
        end
        [s,v,d] = svd(A_m + UL);     % Update ZL via singular value decomposition
                                     % (bug fix: the original code decomposed A_m + US here,
                                     % but ADM4 thresholds A plus the LOW-RANK dual variable)
        v = v - thold/rho;           % Soft-threshold the singular values
        v(v < 0) = 0;                % Get rid of negatives
        ZL = s * v * d';             % Update ZL (Z for low rank)
        UL = UL + (A_m - ZL);        % Update UL (U for low rank)
        tmp = abs(A_m + US) - thold/rho;
        tmp(tmp < 0) = 0;
        ZS = sign(A_m + US).*tmp;    % Update ZS (Z for sparsity) via soft-thresholding
        US = US + (A_m - ZS);        % Update US (U for sparsity)
    end
end

The function above is the optimization frame for updating $A$ and $\mu$: it contains the iteration scaffolding and the updates of the matrices $U$ and $Z$ under the low-rank and sparsity conditions respectively. The core update function, update_a_mu(.), implements one pass of the ADM4 algorithm stated above.
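
For reference, the closed-form updates that update_a_mu implements (our reading of Equations 9 and 10 in the paper, matching the quadratic-formula line in the code below, where A_Step $= 2\rho$) are

$$\mu_{u} = \frac{\sum_{c}\sum_{i:u_{i}^{c}=u} p_{ii}^{c}}{\sum_{c} T_{c}}, \qquad a_{uu'} = \frac{-b_{uu'}+\sqrt{b_{uu'}^{2}-8\rho\, c_{uu'}}}{4\rho}$$

where $p_{ii}^{c}$ and $p_{ij}^{c}$ are the responsibilities computed from Equation 8, and $B=(b_{uu'})$, $C=(c_{uu'})$ collect the linear and constant coefficients of the per-entry quadratic in $a_{uu'}$. The function code is shown as follows: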

function [A,mu,RelErr] = update_a_mu(A_m,mu_m,hp_samples,UL,ZL,US,ZS,w,rho,D,real_A)
    num_samples = length(hp_samples);
    % Following Algorithm 1: ADM4
    mu_numerator = zeros(D, 1);     % Numerator of Equation 9; D = dimension of the HP
    C = zeros(size(A_m));           % Matrix C of Equation 10 in the paper
    A_Step = zeros(size(A_m)) + 2 * rho;
    B = zeros(size(A_m)) + rho*(UL-ZL) + rho*(US-ZS); % Matrix B of Equation 10, including the rho terms
    total_T = 0;                    % Total observation time, i.e. the sum of T_c in Equation 9
                                    % (bug fix: the original code divided by the last tc only)
    for s = 1:num_samples           % Iterate through all the samples (250 sequences)
        timestamp = hp_samples(s).timestamp; % Vector of event times, e.g. 100 timestamps
        event = hp_samples(s).event;         % Vector of event dimensions, same size as timestamp
        tc = timestamp(end);        % End time of this sequence
        nc = length(event);
        dt = tc - timestamp;        % Time from each event to the end of the window
        total_T = total_T + tc;
        int_g = kernel_int_g(dt, w);% Integrated kernel; loop-invariant, so computed once per sample
        for i = 1:nc                % Iterate through all timestamps and events in ONE sample
            ui = event(i);          % Scalar: dimension of the i-th event
            ti = timestamp(i);      % Scalar: time of the i-th event
            B(ui,:) = B(ui,:) + double(A_m(ui,:)>0) .* int_g(i);
            ag_arr = [];            % Will hold a_{uj,ui} * g(ti - tj) for all earlier events j
            if i > 1
                tj = timestamp(1:i-1);                         % Vector
                uj = event(1:i-1);                             % Vector
                ag_arr = A_m(uj, ui) .* kernel_g(ti - tj, w)'; % Vector
            end
            pii = mu_m(ui)./(mu_m(ui) + sum(ag_arr));          % p_ii of Equation 8 in the paper
            if i > 1
                pij = ag_arr./(mu_m(ui) + sum(ag_arr));        % p_ij of Equation 8 in the paper
                if ~isempty(pij) && sum(pij(:)) > 0
                    for j = 1:length(uj)
                        uuj = uj(j);
                        C(uuj,ui) = C(uuj,ui) - pij(j);        % Matrix C of Equation 10 in the paper
                    end
                end
            end
            mu_numerator(ui) = mu_numerator(ui) + pii;
        end                                           % for i = 1:nc (events in one sample)
    end                                               % for s = 1:num_samples
    mu = mu_numerator./total_T;                       % Equation 9 in the paper
    A = (-B + sqrt(B.^2 - 4*A_Step.*C))./(2*A_Step);  % Equation 10 in the paper, with A_Step = 2*rho
    RelErr = real_err(real_A,A);                      % Relative error of this iteration
    fprintf('non-zero in mu = %d\n',nnz(mu));         % Sparsity check for mu
    fprintf('non-zero in C = %d\n',nnz(C));           % Sparsity check for C
    fprintf('non-zero in B = %d, non-zero in sqrt = %d\n',nnz(B),nnz(sqrt(B.^2 - 4*A_Step.*C)));
    fprintf('real error = %.4f, correlation = %.4f, #non-zero in A = %d\n\n',real_err(real_A,A),sum(abs(corr(real_A(:),A(:)))),nnz(A));
    A(isnan(A)) = 0;                                  % Refer to Equation 10 in the paper
    A(isinf(A)) = 0;
end                                                   % end function
function err = real_err(A,A_m)
    % Calculation of the relative error, defined as the averaged
    % relative error between the estimated and the true parameters:
    % for entries with a_ij ~= 0, the relative error |a_ij - a^m_ij|/a_ij
    % is used; for entries with a_ij == 0, the absolute error is used.
    err_1 = abs(A - A_m) ./ A;
    err_2 = abs(A - A_m);
    err_1(isnan(err_1)) = 0;
    err_1(isinf(err_1)) = 0;
    err = err_1 .* double(A~=0) + err_2 .* double(A==0);
    err = sum(err(:))/double(numel(A));
end
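
A quick hypothetical check of the two branches (relative error on non-zero entries, absolute error on zero entries):

real_err([2 0; 0 2], [1 0.5; 0 2])   % (|2-1|/2 + |0-0.5| + 0 + 0)/4 = 0.25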

Debugging and Improvements

Learning Results

Since the errors are stored in a matrix indexed by the inner and outer while-loop iterations, we first reshape them into a single vector and then plot the relative error against the iteration number with the following code.

Iteration_Err2 = reshape(Iteration_Err,[],1);
figure('Name','Relative Error vs. # of iteration Under Hawkes Process');
plot(Iteration_Err2,'--ro','LineWidth',1,'MarkerSize',5,'MarkerEdgeColor','b');
grid;
title('Relative Error vs. # of iteration Under Hawkes Process', 'FontSize',14);
set(gca,'YTick',0:0.1:2.2);
xlabel(' # of iteration (n)','FontSize',12);
ylabel('Relative Error','FontSize',12);

Conclusions and Thoughts

The triple summation in Equation 10 of the original paper (the elements of matrix $C$) is hard to parse: the subscripts of the last two summation signs are squeezed together, and we had to infer from the surrounding text how the sum is constructed. Using $u$ and $\mu$ together, although technically correct, makes it hard for readers to figure out what each variable is. Also, by convention, capital letters should be reserved for matrices rather than for elements of a matrix.

Reference List

K. Zhou, H. Zha, and L. Song, "Learning Social Infectivity in Sparse Low-rank Networks Using Multi-dimensional Hawkes Processes," in Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS), 2013.

Source files and functions

- Multi-dimension Hawkes Process Simulation:

comp_lambda.m: function for iteratively updating the intensity $\lambda$ during sample generation.

gen_hp_samples.m: function for generating samples.

- Optimization:

update_a_mu.m: function for computing the inner loop; solves for $A$ and $\mu$ via Equation 10.

optimize_a_mu.m: function for solving $A$ and $\mu$ via all the samples.

- Validation:

real_err.m: function for computing the error between $A$ and $A_{m}$.

- Utility functions (for kernel functions):

kernel_g.m: function for evaluating the exponential kernel.

kernel_G.m: function for evaluating the integral of the exponential kernel.