This notebook demonstrates how to calculate average activation potential (AAP) values from DNP output. An AAP value is calculated for each feature selected by DNP and is used to rank the features.

  • To run this notebook, first run the modified DNP package in a terminal with:

    python (or python3) run_DNP.py > dnp_output.txt

  • The same command is also included in a cell of this notebook.

  • In this case, num_cv = 10 (10-fold cross-validation) and num_repeat = 1 (the experiment is run once).

  • num_cv is the number of cross-validation folds; change its value in DNP.py.
  • num_repeat is the number of times the experiment is repeated.
  • The output is written to the file 'dnp_output.txt'
In [5]:
import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 500)

import warnings
warnings.filterwarnings('ignore')

1. Run run_DNP.py to get DNP output

In [6]:
%%time

# Depending on the size of your data, this cell can run for several minutes

# If you get a "No module named mxnet" error when running run_DNP.py,
# uncomment the next cell and run it to install mxnet

!python run_DNP.py > dnp_output.txt
CPU times: user 1.68 s, sys: 396 ms, total: 2.08 s
Wall time: 2min 8s
In [49]:
# import sys
# !{sys.executable} -m pip install mxnet
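Outside a notebook, the `!python run_DNP.py > dnp_output.txt` shell line is not available. The following is a minimal sketch of the same redirect using the standard-library `subprocess` module. The helper name `run_and_capture` is hypothetical and is not part of the DNP package.

```python
import subprocess
import sys


def run_and_capture(cmd, out_path, append=False):
    """Run cmd and write its standard output to out_path.

    Hypothetical helper, not part of the DNP package.
    """
    mode = "a" if append else "w"
    with open(out_path, mode) as out:
        # check=True raises CalledProcessError if the command fails
        subprocess.run(cmd, stdout=out, check=True)
    return out_path


# Usage (assumes run_DNP.py is in the current directory):
# run_and_capture([sys.executable, "run_DNP.py"], "dnp_output.txt")
```

Passing `append=True` mirrors the `>>` redirect, which is useful when the output of several runs should end up in one file.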
  • Run the following cell to run multiple experiments
In [1]:
%%time

# To run the experiment multiple times, uncomment the lines below

# import os

# num_repeat = 5
# run = True

# # Ask before overwriting an existing output file
# if os.path.exists('dnp_output.txt'):
#     print('The file exists! Press capital Y to override it! Or any other key to exit.')
#     run = input() == 'Y'

# if run:
#     # The first run overwrites the file; the remaining runs append to it
#     !python run_DNP.py > dnp_output.txt
#     for i in range(num_repeat - 1):
#         !python run_DNP.py >> dnp_output.txt
The file exists! Press capital Y to override it! Or any other key to exit.
m
Wall time: 4.08 s

2. Process DNP output to calculate AAP values

  • Read the DNP output into the list file_ls
In [7]:
file_ls = []

with open('dnp_output.txt', 'r') as f:
    for line in f:
        file_ls.append(line.strip())
In [8]:
# An example of the kind of line we want to process
file_ls[70]
Out[8]:
'feature 2363 positive contribution 1.039250'
  • Extract only the lines that start with 'feature'
In [9]:
features = [line for line in file_ls if line.startswith('feature')]
In [10]:
features[:5]
Out[10]:
['feature 0 positive contribution 1.763143',
 'feature 3193 positive contribution 1.264798',
 'feature 5982 positive contribution 1.201803',
 'feature 2276 positive contribution 1.022001',
 'feature 3788 positive contribution 1.146630']
  • Keep the feature IDs and the positive contribution values
In [11]:
feature_ids = [feature.split()[1] for feature in features]
feature_weights = [feature.split()[-1] for feature in features]
In [12]:
# Convert feature weights to pandas series
feature_weights = np.array(feature_weights, dtype=np.float32)
feature_weights = pd.Series(feature_weights, index=feature_ids)
In [13]:
# Convert the Python list to a pandas Series so that the unique() method can be used
feature_ids = pd.Series(feature_ids) 
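
The extraction steps in this section can also be done in a single pass. The sketch below assumes that every line of interest has exactly the form shown above, `feature <id> positive contribution <value>`, with a plain decimal value. The function name `parse_feature_lines` is illustrative, and the result is stored as float64 rather than the float32 used in the cells above.

```python
import re

import pandas as pd

# Assumed line format: 'feature <id> positive contribution <value>'
FEATURE_RE = re.compile(
    r"^feature\s+(\d+)\s+positive contribution\s+(-?\d+(?:\.\d+)?)$"
)


def parse_feature_lines(lines):
    """Return a float Series of contributions indexed by feature ID (as str)."""
    ids, weights = [], []
    for line in lines:
        match = FEATURE_RE.match(line.strip())
        if match:  # skip every line that is not a feature record
            ids.append(match.group(1))
            weights.append(float(match.group(2)))
    return pd.Series(weights, index=ids, dtype="float64")


sample = [
    "some other DNP log line",
    "feature 0 positive contribution 1.763143",
    "feature 3193 positive contribution 1.264798",
]
print(parse_feature_lines(sample))
```

A regular expression rejects malformed lines outright, whereas `split()` would silently pick up whatever token happens to be in the second or last position.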

3. Calculate the AAP values

  • Calculate the average activation potential of each feature
In [14]:
# Get the unique feature IDs from feature_ids
# (IDs repeat across the k cross-validation folds and across repeated experiments)
unique_id = feature_ids.unique()

# number of cross-validation folds; must match the value set in DNP.py
num_cv = 10
# number of times the experiment was repeated
num_repeat = 1

# calculate average activation potential AAP
aap = []


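# 'fid' is used instead of 'id' to avoid shadowing the Python built-in
for fid in unique_id:
    # sum the contributions over all runs, then divide by the total number of runs
    fid_mean = feature_weights[fid].sum() / (num_cv * num_repeat)
    aap.append(fid_mean)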

aap = np.array(aap)
aap = pd.Series(aap, index=unique_id)

# sort the aap in descending order
aap = aap.sort_values(ascending=False)
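
The loop above divides each feature's summed contribution by the total number of runs, `num_cv * num_repeat`, rather than by the number of times the feature was selected, so a feature selected in only some folds gets a proportionally lower value. A minimal vectorized sketch of the same calculation with `groupby` follows. The function name and the toy weights are made up for illustration and are not real DNP output.

```python
import pandas as pd


def average_activation_potential(weights, num_cv, num_repeat):
    """Sum contributions per feature ID and divide by the total number of runs."""
    totals = weights.groupby(level=0).sum()
    return (totals / (num_cv * num_repeat)).sort_values(ascending=False)


# Toy data: feature '7' is selected in both folds, feature '3' in only one
toy = pd.Series([1.0, 0.5, 0.8], index=["7", "7", "3"])
toy_aap = average_activation_potential(toy, num_cv=2, num_repeat=1)
print(toy_aap)
```

Here feature '7' gets (1.0 + 0.5) / 2 = 0.75 and feature '3' gets 0.8 / 2 = 0.4, even though '3' has the higher contribution in the one fold where it appears.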
  • Show the top 20 features (21 entries, because feature 0 is included).
    • Note that feature 0 is the bias term, not a feature in the data.
    • **A feature ID of k corresponds to column k - 1 (0-based) of your original data** because the bias term is added to the data as the first column before it is fed to DNP.
In [15]:
aap[:21]
Out[15]:
0       1.149063
3193    0.884282
4788    0.685514
1666    0.478807
1085    0.347336
5982    0.329814
2680    0.324656
2295    0.320281
4421    0.259780
1869    0.258139
6480    0.243915
1720    0.239356
6166    0.237652
2382    0.235488
1626    0.227465
2229    0.221701
6142    0.221249
4892    0.220810
4914    0.213280
4307    0.211139
2082    0.208805
dtype: float64
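
To trace the ranked IDs back to the original data, the bias offset noted above has to be undone. The sketch below assumes that the bias occupies position 0, so that feature ID k refers to the original column at 0-based position k - 1, and that the original data is a pandas DataFrame whose columns are in the order fed to DNP. The column names and the helper `feature_id_to_column` are hypothetical.

```python
import pandas as pd


def feature_id_to_column(feature_id, columns):
    """Map a DNP feature ID to a column label of the original data.

    Assumes ID 0 is the bias term and ID k is the original column at
    0-based position k - 1.
    """
    feature_id = int(feature_id)
    if feature_id == 0:
        return "bias"
    return columns[feature_id - 1]


# Hypothetical original data with three feature columns
data = pd.DataFrame({"gene_a": [0.1], "gene_b": [0.2], "gene_c": [0.3]})
labels = [feature_id_to_column(i, data.columns) for i in ["0", "1", "3"]]
print(labels)
```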
  • To use the AAP function, run the following cell
In [16]:
import AAP

x = AAP.AAP(num_cv=num_cv, num_repeat=num_repeat, file_name='dnp_output.txt')
In [17]:
x[:21]
Out[17]:
0       1.149063
3193    0.884282
4788    0.685514
1666    0.478807
1085    0.347336
5982    0.329814
2680    0.324656
2295    0.320281
4421    0.259780
1869    0.258139
6480    0.243915
1720    0.239356
6166    0.237652
2382    0.235488
1626    0.227465
2229    0.221701
6142    0.221249
4892    0.220810
4914    0.213280
4307    0.211139
2082    0.208805
dtype: float64
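
The output of the AAP function matches the values calculated step by step earlier in this notebook. A small sketch of how that agreement could be checked programmatically follows. It assumes both results are pandas Series of AAP values indexed by feature ID (the return type of `AAP.AAP` is inferred from the output shown, not confirmed), and `same_aap_values` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def same_aap_values(a, b, tol=1e-6):
    """Return True if two AAP Series hold the same IDs and matching values."""
    a, b = a.copy(), b.copy()
    # Compare IDs as strings so '3193' and 3193 count as the same feature
    a.index = a.index.astype(str)
    b.index = b.index.astype(str)
    a, b = a.sort_index(), b.sort_index()
    if not a.index.equals(b.index):
        return False
    return bool(np.allclose(a.to_numpy(), b.to_numpy(), atol=tol))


manual = pd.Series([1.149063, 0.884282], index=["0", "3193"])
packaged = pd.Series([0.884282, 1.149063], index=[3193, 0])
print(same_aap_values(manual, packaged))
```

In the notebook itself the call would be `same_aap_values(aap, x)`.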