
How to design an n-dimensional feature descriptor with the same dimensions as the input image?

I am rewriting the H-Net code in Keras for cross-domain image similarity. The network architecture is described in the attached paper. I wrote the encoder and decoder parts, but I am unable to get matching dimensions out of the decoder. The code follows; please point out where I am making a mistake.

import tensorflow as tf 
from keras.models import Model
from keras.layers import Input, Conv2D, Conv2DTranspose, BatchNormalization, Activation, MaxPool2D, Reshape

# pe: Photo Encoder, PFM: Photo Feature Map, pd: Photo Decoder

# Input shape for encoder 
input_img = Input(shape=(256, 256, 1))  # 256x256 grayscale photos (matches the 64x64 conv1 output below)

pe = Conv2D(96, kernel_size=11, strides=(4,4), padding = 'SAME',name='conv1')(input_img)
pe = BatchNormalization()(pe)
pe = Activation('selu')(pe)
pe = MaxPool2D((3, 3), strides=(2, 2), padding = 'VALID', name = 'pool1')(pe)

pe = Conv2D(256, kernel_size=5, strides=(1,1), padding = 'SAME', name='conv2')(pe)
pe = BatchNormalization()(pe)
pe = Activation('selu')(pe)
pe = MaxPool2D((3, 3), strides=(2, 2), padding = 'VALID', name = 'pool2')(pe) 

pe = Conv2D(384, kernel_size=3, strides=(1,1), padding = 'SAME', name='conv3')(pe)
pe = BatchNormalization()(pe)
pe = Activation('selu')(pe)

pe = Conv2D(384, kernel_size=3, strides=(1,1), padding = 'SAME', name='conv4')(pe) 
pe = BatchNormalization()(pe)
pe = Activation('selu')(pe)

pe = Conv2D(256, kernel_size=3, strides=(1,1), padding = 'SAME', name='conv5')(pe) 
pe = BatchNormalization()(pe)
pe = Activation('selu')(pe)
pe = MaxPool2D((3, 3), strides=(2, 2), padding = 'VALID', name = 'pool3')(pe) 

pe = Conv2D(1024, kernel_size=7, strides=(1, 1), name = 'con2vector', padding = 'VALID')(pe)
pe = BatchNormalization()(pe)
p_encoder = Activation('selu')(pe)

# A Keras Reshape layer keeps the tensor inside the Keras graph with a
# static shape; tf.reshape would discard that shape information.
pfm = Reshape((1024,), name='photo_feature_map')(p_encoder)

'''
Output of each Conv2D and MaxPool2D layer in the photo encoder:

Block 1 convolution:  (?, 64, 64, 96)
Block 1 maxpooling:   (?, 31, 31, 96)
Block 2 convolution:  (?, 31, 31, 256)
Block 2 maxpooling:   (?, 15, 15, 256)
Block 3 convolution:  (?, 15, 15, 384)
Block 4 convolution:  (?, 15, 15, 384)
Block 5 convolution:  (?, 15, 15, 256)
Block 5 maxpooling:   (?, 7, 7, 256)
Block 6 convolution:  (?, 1, 1, 1024)
Shape of photo feature map:  (?, 1024)
'''
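The encoder trace above can be reproduced with the standard output-size formulas. The sketch below is illustrative arithmetic, not part of H-Net, and the `conv_out` helper is my own name: TensorFlow's 'SAME' padding gives ceil(n / stride), while 'VALID' gives floor((n - k) / stride) + 1 for both convolutions and pooling.

```python
import math

def conv_out(n, k, s, padding):
    """Spatial output size of a conv/pool layer on an n-pixel input."""
    if padding == 'SAME':
        return math.ceil(n / s)   # TF-style SAME padding
    return (n - k) // s + 1       # VALID padding

n = 256
n = conv_out(n, 11, 4, 'SAME')    # conv1      -> 64
n = conv_out(n, 3, 2, 'VALID')    # pool1      -> 31
n = conv_out(n, 5, 1, 'SAME')     # conv2      -> 31
n = conv_out(n, 3, 2, 'VALID')    # pool2      -> 15
# conv3-conv5 are SAME with stride 1, so the size stays 15
n = conv_out(n, 3, 2, 'VALID')    # pool3      -> 7
n = conv_out(n, 7, 1, 'VALID')    # con2vector -> 1
print(n)  # 1
```

Note that this trace only works out for a 256-pixel input: a 265-pixel height would give ceil(265/4) = 67 after conv1, not the 64 reported above.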

'''
The following code implements the photo decoder.
'''
# Fold the 1x1x1024 code into a 2x2x256 map. Using a Keras Reshape layer
# (not tf.reshape) keeps the spatial dims statically known, so the
# Conv2DTranspose output shapes below stay defined instead of (?, ?).
pd_input = Reshape((2, 2, 256), name='pd_input')(p_encoder)

pd = Conv2DTranspose(128, kernel_size=5, strides=(2, 2), padding='SAME', name="pdcnv1")(pd_input)
pd = Activation("selu")(pd)

pd = Conv2DTranspose(64, kernel_size=5, strides=(2, 2), padding='SAME',name="pdcnv2")(pd) 
pd = Activation("selu")(pd)

pd = Conv2DTranspose(32, kernel_size=5, strides=(2, 2), padding='SAME',name="pdcnv3")(pd)
pd = Activation("selu")(pd)

pd = Conv2DTranspose(16, kernel_size=5, strides=(2, 2), padding='SAME',name="pdcnv4")(pd) 
pd = Activation("selu")(pd)

pd = Conv2DTranspose(8, kernel_size=5, strides=(2, 2), padding='SAME',name="pdcnv5")(pd)
pd = Activation("selu")(pd)

pd = Conv2DTranspose(4, kernel_size=5, strides=(2, 2), padding='SAME',name="pdcnv6")(pd)
pd = Activation("selu")(pd)

p_decoder = Conv2DTranspose(3, kernel_size=5, strides=(2, 2), padding='SAME', activation='sigmoid', name="pdcnv7")(pd)  # (?, 256, 256, 3)

"""
Output of each Conv2DTranspose layer
Photo decoder input shape:  (?, 2, 2, 256)
Photo deconv block 1:  (?, ?, ?, 128)
Photo deconv block 2:  (?, ?, ?, 64)
Photo deconv block 3:  (?, ?, ?, 32)
Photo deconv block 4:  (?, ?, ?, 16)
Photo deconv block 5:  (?, ?, ?, 8)
Photo deconv block 6:  (?, ?, ?, 4)
Photo deconv block 7:  (?, ?, ?, 3)
"""

autoencoder = Model(input_img, p_decoder)
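As a cross-check on the decoder side (simple arithmetic, not Keras code): a 'SAME'-padded Conv2DTranspose with stride s multiplies each spatial dimension by s, so the seven stride-2 layers map the 2x2 reshaped code back to 256x256, matching the encoder input.

```python
# 'SAME'-padded Conv2DTranspose: output size = input size * stride.
size = 2                # spatial size of the 2x2x256 decoder input
for _ in range(7):      # pdcnv1 .. pdcnv7, all stride (2, 2)
    size *= 2
print(size)  # 256
```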
