Monday, May 8, 2017

Convolutional neural network (CNN)

A CNN is a variant of the multilayer perceptron (MLP) that is normally built from 3 kinds of hidden layers: a convolution layer (similar to a sliding window, here called a filter, whose output is called a feature map), a pooling layer (non-linear down-sampling, e.g. max pooling, which keeps the largest pixel value in each window), and a fully connected layer.
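To make the filter and pooling steps concrete, here is a minimal pure-Python sketch. The names conv2d_valid and max_pool2d, the toy 5x5 image, and the 2x2 filter are all made up for illustration; the sketch assumes stride 1, no padding, and non-overlapping pooling windows. Like most deep learning libraries, it does not flip the filter (strictly speaking, this is cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the window whose top-left corner is (i, j).
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size block."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[i + di][j + dj]
                     for di in range(size) for dj in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled


# Toy single-channel image and filter (illustrative values only).
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0],
          [0, -1]]

feature_map = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool2d(feature_map)           # 2x2 after pooling
print(feature_map)
print(pooled)  # [[0, 1], [0, 2]]
```

In a real CNN the filter values are learned during training, and the pooled output would eventually be flattened and passed to the fully connected layer.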
Traditional MLPs do not scale well to higher-resolution images. For example, in CIFAR-10, images are only of size 32x32x3 (32 wide, 32 high, 3 color channels), so a single fully connected neuron in the first hidden layer of a regular neural network would have 32*32*3 = 3,072 weights. A 200x200 image, however, would lead to neurons that each have 200*200*3 = 120,000 weights.
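The arithmetic above can be checked with a few lines of Python. The helper names are hypothetical, biases are ignored, and the 5x5 filter size is an assumption chosen only for comparison; it does not come from any particular architecture.

```python
def dense_weights_per_neuron(width, height, channels):
    """Weights of one fully connected neuron that sees every input value."""
    return width * height * channels


def conv_weights_per_filter(kernel_size, channels):
    """Weights of one square convolution filter, shared across all image positions."""
    return kernel_size * kernel_size * channels


cifar_weights = dense_weights_per_neuron(32, 32, 3)    # 3072
large_weights = dense_weights_per_neuron(200, 200, 3)  # 120000
filter_weights = conv_weights_per_filter(5, 3)         # 75, independent of image size

print(cifar_weights, large_weights, filter_weights)
```

The count for a convolution filter depends only on the filter size and the number of channels, which is why it stays the same when the image grows.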
Also, such a network architecture does not take into account the spatial structure of the data, treating input pixels that are far apart the same as pixels that are close together. Thus, full connectivity of neurons is wasteful for the purpose of image recognition.




Classification of CNNs (by Gemini)
1. Image-Based CNNs
Classic Architectures: LeNet-5, AlexNet, VGGNet, GoogLeNet, ResNet
Specialized Architectures: MobileNet, EfficientNet, DenseNet; for object detection and segmentation: R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN
Key Features: Grid-like input data, convolutional layers, pooling layers, fully connected layers.
2. Sequence-Based CNNs
1D CNNs: Used for processing sequential data like text or time series.
Key Features: 1D convolutional layers, pooling layers, fully connected layers.
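As a rough illustration of a 1D convolution followed by pooling, here is a small pure-Python sketch. The function names, the toy signal, and the two-tap filter are assumptions made for this example; stride 1 and no padding are assumed as well.

```python
def conv1d_valid(sequence, kernel):
    """Slide `kernel` along `sequence` (stride 1, no padding)."""
    k = len(kernel)
    return [sum(sequence[i + j] * kernel[j] for j in range(k))
            for i in range(len(sequence) - k + 1)]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling over windows of `size` values."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


# Toy time series; the filter [-1, 1] responds to increases between neighbours.
signal = [0, 0, 1, 1, 1, 0, 0, 2, 2]
kernel = [-1, 1]

features = conv1d_valid(signal, kernel)  # [0, 1, 0, 0, -1, 0, 2, 0]
pooled = max_pool1d(features)            # [1, 0, 0, 2]
print(features)
print(pooled)
```

For text, each position would usually hold an embedding vector rather than a single number, so the filter would span several positions and all embedding dimensions.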
3. Graph-Based CNNs
GCNs: Graph Convolutional Networks. Image-based CNNs are designed to process data with a regular grid structure, like images, whereas GCNs can handle data where the connections between elements are not fixed or regular, such as social networks, molecular structures, and knowledge graphs.
Key Features: Graph-structured input data, graph convolution operation, graph Laplacian, handling irregular data.
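One common formulation of the graph convolution operation (the one popularized by Kipf and Welling) adds self-loops to the adjacency matrix, normalizes it symmetrically by node degree, and then applies a weight matrix and a ReLU. The sketch below assumes that formulation; other GCN variants differ. All function names, the 3-node path graph, and the weights are made up for illustration.

```python
import math


def normalized_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2), where D holds the degrees of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One graph convolution: mix neighbour features, transform, apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    transformed = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in transformed]


# Path graph 0 - 1 - 2 with one feature per node; only node 0 starts non-zero.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]
weights = [[1.0]]  # illustrative; learned in practice

out = gcn_layer(adj, features, weights)
print(out)
```

After one layer, node 1 (a neighbour of node 0) has picked up part of node 0's feature, while node 2 (two hops away) is still zero. Stacking layers lets information travel further across the graph.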
4. Hybrid CNNs
Combining Different Types: CNNs can be combined with other architectures, such as Recurrent Neural Networks (RNNs), including Long Short-Term Memory (LSTM) networks, for tasks involving both spatial and temporal information.