Monday, 8 May 2017 (B.E. 2560)

Convolutional Neural Network (CNN)

A CNN is a variant of the MLP built from three main kinds of layers: convolutional layers (similar to a sliding window: a small filter, or kernel, slides across the input and produces an output called a feature map), pooling layers (for non-linear down-sampling, e.g. max pooling, which keeps the maximum pixel value in each region), and fully connected layers.
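The convolution and pooling steps described above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the 5x5 image, the 2x2 edge filter and the helper names `conv2d` and `max_pool2d` are hypothetical. The convolution assumes no padding and stride 1, and, as in most deep learning frameworks, the filter is applied without flipping (strictly a cross-correlation).

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (lists of lists), no padding, stride 1; return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw window whose top-left corner is (i, j).
            s = 0
            for a in range(kh):
                for b in range(kw):
                    s += image[i + a][j + b] * kernel[a][b]
            row.append(s)
        feature_map.append(row)
    return feature_map


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size block."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[i + a][j + b] for a in range(size) for b in range(size)))
        out.append(row)
    return out


# Hypothetical 5x5 image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 filter that responds where intensity rises from left to right.
kernel = [[-1, 1],
          [-1, 1]]

fmap = conv2d(image, kernel)
pooled = max_pool2d(fmap)
print(fmap)    # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
print(pooled)  # [[2, 0], [2, 0]]
```

The filter fires only along the edge, so the 4x4 feature map is non-zero in a single column; 2x2 max pooling then halves each spatial dimension while keeping that strongest response.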
Traditional MLPs do not scale well to higher-resolution images. For example, in CIFAR-10, images are only 32x32x3 (32 wide, 32 high, 3 color channels), so a single fully connected neuron in the first hidden layer of a regular neural network would have 32*32*3 = 3,072 weights. A 200x200 image, however, would lead to neurons with 200*200*3 = 120,000 weights.
Such an architecture also ignores the spatial structure of the data, treating input pixels that are far apart the same as pixels that are close together. Full connectivity is therefore wasteful for image recognition.
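The weight counts above can be checked directly and compared with a convolutional filter, whose weights are shared across every position in the image. This sketch assumes a 3x3 filter over 3 color channels and ignores bias terms; the function names are illustrative only.

```python
def fc_weights_per_neuron(width, height, channels):
    # A fully connected neuron has one weight for every input value.
    return width * height * channels


def conv_weights_per_filter(kernel_size, channels):
    # A convolutional filter only sees a kernel_size x kernel_size patch (across all
    # channels), and the same weights are reused at every position of the image.
    return kernel_size * kernel_size * channels


for w, h in [(32, 32), (200, 200)]:
    print(f"{w}x{h}x3: FC neuron = {fc_weights_per_neuron(w, h, 3):,} weights, "
          f"3x3 conv filter = {conv_weights_per_filter(3, 3)} weights")
# 32x32x3: FC neuron = 3,072 weights, 3x3 conv filter = 27 weights
# 200x200x3: FC neuron = 120,000 weights, 3x3 conv filter = 27 weights
```

The filter's cost is independent of image size, which is why convolution scales to larger images where full connectivity does not.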




Types of CNNs (classification by Gemini)
1. Image-Based CNNs
Classic Architectures: LeNet-5, AlexNet, VGGNet, GoogLeNet, ResNet
Specialized Architectures: MobileNet, EfficientNet, DenseNet, R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN
Key Features: Grid-like input data, convolutional layers, pooling layers, fully connected layers.
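For grid-like input, the spatial size of a convolutional or pooling layer's output follows from the input size n, kernel size k, padding p and stride s: out = floor((n + 2p - k) / s) + 1. A small helper (the name `output_size` and the example settings are hypothetical) makes the arithmetic concrete:

```python
def output_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial dimension of a conv or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


print(output_size(32, kernel=3, padding=1))              # 3x3 conv, padding 1: 32 (size preserved)
print(output_size(32, kernel=2, stride=2))               # 2x2 max pooling, stride 2: 16
print(output_size(224, kernel=11, stride=4, padding=2))  # 11x11 filter, stride 4: 55
```

Applying the formula layer by layer shows how the feature maps shrink before they reach the fully connected layers.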
2. Sequence-Based CNNs
1D CNNs: Used for processing sequential data like text or time series.
Key Features: 1D convolutional layers, pooling layers, fully connected layers.
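A 1D convolution slides a filter along a single axis (time steps or token positions) instead of across a 2D grid. A minimal sketch, assuming a toy time series and a hypothetical three-tap filter `[-1, 2, -1]` that responds to local peaks:

```python
def conv1d(signal, kernel, stride=1):
    """Valid 1D convolution (cross-correlation, as in most deep learning libraries)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def max_pool1d(values, size=2):
    """Non-overlapping 1D max pooling."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


series = [1, 2, 3, 10, 3, 2, 1, 1]   # toy time series with a spike at index 3
kernel = [-1, 2, -1]                  # hypothetical peak-detecting filter

features = conv1d(series, kernel)
pooled = max_pool1d(features)
print(features)  # [0, -6, 14, -6, 0, -1]
print(pooled)    # [0, 14, 0]
```

The strongest response lines up with the spike, and pooling keeps it while shortening the sequence. For text, the same operation is usually applied to a sequence of word-embedding vectors rather than single numbers.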
3. Graph-Based CNNs
GCNs: Graph Convolutional Networks. While image-based CNNs are designed for data with a regular grid structure, such as images, GCNs handle data that is naturally represented as a graph, such as social networks, molecular structures, and knowledge graphs.
Key Features: Graph-structured input data, graph convolution operation, graph Laplacian, handling irregular data.
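One widely used graph convolution is the GCN layer of Kipf and Welling, which updates all node features at once as H' = ReLU(D_hat^-1/2 · A_hat · D_hat^-1/2 · H · W), where A_hat = A + I is the adjacency matrix with self-loops, D_hat is its diagonal degree matrix, H holds the node features and W is a learnable weight matrix. It is derived from a first-order approximation of spectral filters on the normalized graph Laplacian L = I - D^-1/2 A D^-1/2. The pure-Python sketch below is one illustrative formulation; the 3-node graph, features and weight are hypothetical.

```python
import math


def matmul(A, B):
    """Plain matrix product of lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def gcn_layer(adj, features, weights):
    n = len(adj)
    # Add self-loops so each node keeps its own features: A_hat = A + I.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Degree of each node in A_hat.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: D_hat^-1/2 A_hat D_hat^-1/2.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, apply the linear map W, then ReLU.
    h = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in h]


# Hypothetical path graph 0 - 1 - 2 with one feature per node.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]   # only node 0 carries a signal
weights = [[1.0]]                  # untrained 1x1 weight matrix

out = gcn_layer(adj, features, weights)
print([round(row[0], 3) for row in out])  # [0.5, 0.408, 0.0]
```

After one layer, node 0's signal has reached its direct neighbor (node 1) but not node 2, two hops away; stacking layers widens this neighborhood, much as stacking convolutions widens the receptive field on images.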
4. Hybrid CNNs
Combining Different Types: CNNs can be combined with other architectures like Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks for tasks involving both spatial and temporal information.
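A common hybrid pattern runs a CNN on each time step to extract spatial features, then feeds the resulting feature sequence to a recurrent network that models how they change over time. The sketch below assumes toy 1D "frames", a hypothetical filter, and a one-unit Elman RNN with fixed, untrained weights `w_x` and `w_h`; a real model would learn these weights and would typically use an LSTM instead.

```python
import math


def conv1d(signal, kernel):
    """Valid 1D convolution (no padding, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def cnn_features(frame, kernel):
    """CNN part: convolve one frame, then global max pooling to a single feature."""
    return max(conv1d(frame, kernel))


def rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """RNN part: Elman update h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), starting from h_0 = 0."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


frames = [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]  # hypothetical sequence of 3 frames
kernel = [-1, 2, -1]                                 # hypothetical peak-detecting filter

features = [cnn_features(f, kernel) for f in frames]
states = rnn(features)
print(features)  # [2, 2, 0]
print(states)    # hidden state after each frame
```

The CNN sees each frame in isolation; only the recurrent state carries information from earlier frames forward, which is the division of labor between spatial and temporal processing described above.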

FCNN (Fully Convolutional Neural Network) vs Traditional CNN
  • Traditional CNN: Features extracted in early convolutional layers are passed to Fully Connected (FC) layers at the end of the network. Because FC layers require every neuron to be connected to every input, the network is restricted to a fixed input size and loses spatial context.
  • FCNN: The network contains only convolutional layers, combining downsampling (to extract features) with upsampling (to restore the spatial resolution of the output). This allows FCNNs to process inputs of arbitrary size and to output spatial maps (such as segmentation masks) directly.
  • Traditional CNN: FC layers account for the majority of the network's trainable parameters, making them prone to overfitting when handling high-dimensional data like large images.
  • FCNN: By maintaining a localized receptive field throughout the entire model, FCNNs are significantly more parameter-efficient, resulting in faster training times and reduced memory usage. [1, 2, 3, 4, 5]
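The input-size and parameter arguments above can be made concrete with a tiny fully convolutional pipeline: a 3x3 convolution with zero padding, 2x2 max pooling (downsampling) and nearest-neighbor upsampling. The same 9 filter weights work for any input with even height and width, whereas a dense layer's weight matrix is tied to one input size. The box-blur filter, the helper names and the comparison with a dense layer that maps an h x w input to an h x w output map are assumptions for illustration.

```python
def conv2d_same(x, kernel):
    """3x3 convolution with zero padding 1, so the output has the same size as the input."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(-1, 2):
                for b in range(-1, 2):
                    # Positions outside the image count as zero (zero padding).
                    if 0 <= i + a < h and 0 <= j + b < w:
                        s += x[i + a][j + b] * kernel[a + 1][b + 1]
            out[i][j] = s
    return out


def max_pool2(x):
    """2x2 max pooling with stride 2 (halves height and width)."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]) - 1, 2)]
            for i in range(0, len(x) - 1, 2)]


def upsample2(x):
    """Nearest-neighbor upsampling by a factor of 2 (doubles height and width)."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def tiny_fcn(image):
    kernel = [[1 / 9] * 3 for _ in range(3)]  # hypothetical box-blur filter standing in for a learned one
    return upsample2(max_pool2(conv2d_same(image, kernel)))


def dense_weights(h, w):
    # A dense layer mapping a flattened h x w input to an h x w output needs (h*w)^2 weights.
    return (h * w) * (h * w)


for h, w in [(4, 4), (6, 8)]:
    image = [[float(i + j) for j in range(w)] for i in range(h)]
    out = tiny_fcn(image)
    print(f"input {h}x{w} -> output {len(out)}x{len(out[0])}, "
          f"conv weights = 9, dense-layer weights = {dense_weights(h, w):,}")
# input 4x4 -> output 4x4, conv weights = 9, dense-layer weights = 256
# input 6x8 -> output 6x8, conv weights = 9, dense-layer weights = 2,304
```

The convolutional pipeline needs no changes to accept a new input size, while the dense layer would need a new weight matrix; at 200x200 pixels it would need 1,600,000,000 weights.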