Thursday, January 26, 2017

Using PDFTK to Process PDF Documents

When dealing with PDF documents, we come across different requirements, such as splitting a document into parts or merging multiple documents together. I recently came across a great tool that can do various tasks with PDF documents. I decided to list a few things I can do with the pdftk tool and leave a link to a better resource.

(1) Install the pdftk tool.

sudo apt-get update

sudo apt-get install pdftk


(2) Suppose I want to extract pages 31 to 37 of a larger PDF file into a new PDF file. We can do that as follows.

pdftk input.pdf cat 31-37 output output.pdf

(3) Merging two documents, with the second appended to the end of the first.

pdftk input-file1.pdf input-file2.pdf cat output new-document.pdf

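(4) Selecting a few pages from multiple documents and putting them together into a single document. Each input file gets an uppercase handle (A, B), and each page range is prefixed with the handle of the file it comes from.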

pdftk A=input-file1.pdf B=input-file2.pdf cat A110-117 B2-3 output new-document.pdf
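When the same kind of command has to be issued for many files, it can help to build it from a script. The following is a minimal sketch, not part of pdftk itself: the helper names `build_cat_command` and `run_pdftk` are hypothetical, and the file names are the placeholders used above. It only assembles the same argument list as step (4); nothing is executed unless `run_pdftk` is called on a machine where pdftk is installed.

```python
import subprocess


def build_cat_command(inputs, page_ranges, output):
    """Build a pdftk `cat` command as an argument list.

    inputs      -- list of (handle, path) pairs, e.g. [("A", "a.pdf")]
    page_ranges -- handle-prefixed page ranges, e.g. ["A110-117", "B2-3"]
    output      -- path of the PDF file to create
    """
    cmd = ["pdftk"]
    for handle, path in inputs:
        cmd.append("{}={}".format(handle, path))
    cmd.append("cat")
    cmd.extend(page_ranges)
    cmd.extend(["output", output])
    return cmd


def run_pdftk(cmd):
    """Run the command; raises CalledProcessError if pdftk reports a failure."""
    subprocess.run(cmd, check=True)


cmd = build_cat_command(
    [("A", "input-file1.pdf"), ("B", "input-file2.pdf")],
    ["A110-117", "B2-3"],
    "new-document.pdf",
)

# Show the command line that would be executed
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.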

That's it!

References


Wednesday, January 25, 2017

Independent Component Analysis (ICA)

When we want to separate two signals that have been mixed together, one interesting method we can use is Independent Component Analysis (ICA). I am not knowledgeable enough to explain how the whole thing works, but there are plenty of explanations of it on the web. Take a look at the references listed at the end for further details. The purpose of this article is to record the code I used recently for an ICA job so that I will not forget how to use it in the future.

To perform ICA in Python, we need to install the scikit-learn package first.

sudo pip install --upgrade pip
pip install -U scikit-learn

Now it's time to write the Python script. The following script takes two wav files as input, each containing the same two signals mixed in a different way. It then generates two new wav files containing the separated signals.


"""
=====================================
Blind source separation using FastICA
=====================================

An example of estimating sources from mixed recordings.

ICA is used to estimate the sources given mixed measurements.
Imagine 2 instruments playing simultaneously and 2 microphones
recording the mixed signals. ICA is used to recover the sources,
i.e. what is played by each instrument. Importantly, PCA fails
at recovering our `instruments` since the related signals reflect
non-Gaussian processes.

"""
print(__doc__)

import pylab
import matplotlib

import numpy as np
from scipy.io import wavfile

from sklearn.decomposition import FastICA

###############################################################################

# Font settings for the saved figure.
# Note: text.usetex requires a working LaTeX installation.
matplotlib.rcParams['ps.useafm'] = True
matplotlib.rcParams['pdf.use14corefonts'] = True
matplotlib.rcParams['text.usetex'] = True

# read data from wav files
sample_rate1, samples1 = wavfile.read('100000010mix1.wav')
sample_rate2, samples2 = wavfile.read('100000010mix2.wav')

print('sample_rate1', sample_rate1)
print('sample_rate2', sample_rate2)

# Stack the two recordings as columns: one row per sample, one column
# per recording (assumes mono files of equal length)
S = np.c_[samples1, samples2]

ica = FastICA(n_components=2)
S_ = ica.fit_transform(S)  # Reconstruct signals

print('mixed signals=', S)
print('recovered signals=', S_)
print('extracted signal1', S_[:,0])
print('extracted signal2', S_[:,1])

# write data to wav files
scaled1 = np.int16(S_[:,0]/np.max(np.abs(S_[:,0])) * 32767)
wavfile.write('extracted-signal-1.wav', sample_rate1, scaled1)

scaled2 = np.int16(S_[:,1]/np.max(np.abs(S_[:,1])) * 32767)
wavfile.write('extracted-signal-2.wav', sample_rate2, scaled2)

###############################################################################
# Plot results

pylab.figure(num=None, figsize=(10, 10))

# Time axis in seconds, so that the x-axis matches its label
t = np.arange(len(samples1)) / float(sample_rate1)

pylab.subplot(411)
pylab.title('(received signal 1)')
pylab.xlabel('Time (s)')
pylab.ylabel('Sound amplitude')
pylab.plot(t, samples1)

pylab.subplot(412)
pylab.title('(received signal 2)')
pylab.xlabel('Time (s)')
pylab.ylabel('Sound amplitude')
pylab.plot(t, samples2)

pylab.subplot(413)
pylab.title('(extracted signal 1)')
pylab.xlabel('Time (s)')
pylab.ylabel('Sound amplitude')
pylab.plot(t, S_[:,0])

pylab.subplot(414)
pylab.title('(extracted signal 2)')
pylab.xlabel('Time (s)')
pylab.ylabel('Sound amplitude')
pylab.plot(t, S_[:,1])

pylab.subplots_adjust(hspace=.5)
pylab.savefig('extracted-data.pdf')
pylab.show()

Run this Python script with the two wav files in the same directory and you will get the output signals as wav files in addition to seeing waveform plots of all these signals.
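
The script above needs the two wav recordings. To check that FastICA behaves as expected without them, here is a minimal sketch on synthetic data. The two source signals, the 2x2 mixing matrix, and all variable names are assumptions made up for illustration; they are not related to the wav files used above.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two made-up source signals (illustrative assumptions, not real recordings)
n_samples = 2000
t = np.linspace(0, 8, n_samples)
s1 = np.sin(2 * t)            # smooth sinusoid
s2 = np.sign(np.sin(3 * t))   # square wave
sources = np.c_[s1, s2]

# Mix the sources with a hypothetical mixing matrix, as two microphones
# placed at different positions would
mixing = np.array([[1.0, 0.5],
                   [0.5, 1.0]])
mixed = sources.dot(mixing.T)

# Fixed random_state so that repeated runs give the same result
ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(mixed)

# ICA cannot determine the order, sign, or scale of the sources, so compare
# each true source with the recovered components by absolute correlation
corr = np.abs(np.corrcoef(sources.T, recovered.T)[:2, 2:])
best_match = corr.max(axis=1)
print('best absolute correlation per source:', best_match)
```

A value close to 1 for each source means that one of the recovered components matches it up to sign and scale. Because the scale of the recovered components is arbitrary, the main script divides each one by its peak value before converting it to 16-bit samples.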