Dr.Jiw: November 2018

Wednesday, 28 November 2018

Tuesday, 27 November 2018

Free AWS ML

https://aws.amazon.com/th/training/learning-paths/machine-learning/?fbclid=IwAR1kxStHIeSW5wUCBa152GrOu4yGkr_MzILPYPPv3JggWpXOl3WN11B3oHk

Summer helped Japan and America

Summer saved Japan from Mongol invasion twice: both times the Mongol fleet reached Japan during the summer, a typhoon struck and forced it to retreat.
In the past, part of America was ruled by France, but during the summer mosquitoes carrying yellow fever spread the disease and killed one third of the French army, so France announced the sale of the entire territory to America at a low price.

Sunday, 25 November 2018

Factors that cause forgetting

Web defacement stat

All the information contained in Zone-H's cybercrime archive were either collected online from public sources or directly notified anonymously to us.

http://zone-h.org/archive/special=1

SRAN's stat page:
http://bl.cipat.org/domain_blacklist

Saturday, 24 November 2018

Web application automation tool

https://www.seleniumhq.org/

Sunday, 18 November 2018

Conversion, Engagement

Conversion is the point at which a recipient of a marketing message performs a desired action.

https://www.dynamicyield.com/glossary/conversion/

Engagement marketing is the use of strategic, resourceful content to engage people, and create meaningful interactions over time.

https://www.marketo.com/engagement-marketing/

A/B testing

A split test of two variants, A and B, to find out which one performs better.
hooktalk.com/การทำ-ab-testing-101/
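
One common way to decide which variant performs better is a two-proportion z-test on conversion counts. The following is a minimal sketch under that assumption, not the method of the linked article; the function name `ab_test` and the visitor and conversion counts are hypothetical.

```python
from statistics import NormalDist


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: returns (z, two-sided p-value).

    conv_* are conversion counts, n_* are visitor counts (hypothetical inputs).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis "A and B convert equally".
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: variant A converts 100 of 1000 visitors, variant B 150 of 1000.
z, p_value = ab_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rate is unlikely to be chance alone; the normal approximation assumes reasonably large samples.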

Friday, 16 November 2018

Python VS Matlab

https://pyzo.org/python_vs_matlab.html

Python IDEs
https://www.datacamp.com/community/tutorials/data-science-python-ide?utm_source=adwords_ppc&utm_campaignid=1455363063&utm_adgroupid=65083631748&utm_device=c&utm_keyword=&utm_matchtype=b&utm_network=g&utm_adpostion=1t2&utm_creative=278443377083&utm_targetid=aud-392016246653:dsa-473406587035&utm_loc_interest_ms=&utm_loc_physical_ms=1012728&gclid=CjwKCAiA8rnfBRB3EiwAhrhBGgE-NoEeSozz2GkMw7CEM-RfZnwlb_Szh0EpKbIHRREF9JOkD-NKSRoCImYQAvD_BwE

Python's Matplotlib plotting library
https://matplotlib.org/users/pyplot_tutorial.html

IoT protocols

https://electronicsforu.com/technology-trends/popular-iot-protocols

Friday, 9 November 2018

Feature selection techniques

Comparison test

t-test: tests whether the difference between the means of two groups is statistically significant. www.sthda.com/english/wiki/t-test-formula Example of a paired t-test: https://www.jmp.com/en_nl/statistics-knowledge-portal/t-test/paired-t-test.html
ANOVA: tests whether the differences between the means of three or more groups are statistically significant. http://www.sthda.com/english/wiki/wiki.php?title=one-way-anova-test-in-r#what-is-one-way-anova-test It is also used for feature selection: https://towardsdatascience.com/anova-for-feature-selection-in-machine-learning-d9305e228476
Friedman test: used when there is no assumption that the data are normally distributed within each group, unlike ANOVA; that is, the Friedman test is suitable for non-normal data.
Nemenyi test is used in conjunction with non-parametric statistical tests, such as the Friedman test, to determine which specific groups (e.g., treatments or methods) are significantly different from each other. (A non-parametric statistical test is a type of statistical test that does not require the data to follow specific assumptions about their underlying distribution, such as normality or homogeneity of variance. These tests are often used when parametric test assumptions are violated, the data are ordinal or ranked, or the sample size is small.)
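
To make the two parametric tests above concrete, here is a minimal sketch that computes only the test statistics (the paired t statistic and the one-way ANOVA F statistic) with the standard library. The function names and sample numbers are hypothetical; for p-values, `scipy.stats` provides `ttest_rel`, `f_oneway`, and `friedmanchisquare`.

```python
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic: mean of the differences over its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)


def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements.
t_stat = paired_t([1, 2, 3, 4], [2, 3, 4, 6])
f_stat = anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(t_stat, f_stat)
```

Each statistic is then compared against the t distribution (n - 1 degrees of freedom) or the F distribution (k - 1 and n - k degrees of freedom) to obtain a p-value.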

Correlation test

Pearson's product-moment correlation: computes the Pearson product-moment coefficient r, which lies in [-1, 1]; r^2 is a measure of goodness of fit.
Chi-square (X^2) test: used to test for correlation (association) between categorical variables.
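
Both statistics can be computed directly from their definitions. This is a minimal sketch with hypothetical function names and data; it returns only the statistics, not p-values.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment coefficient r in [-1, 1]."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def chi_square(table):
    """Chi-square statistic for a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly linear data
chi2 = chi_square([[20, 0], [0, 20]])       # perfectly associated categories
print(r, r ** 2, chi2)
```

A chi-square statistic of zero means the observed counts match independence exactly; larger values indicate stronger association.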

Using both traditional feature selection like the t-test and XAI approaches provides a more robust framework:
- T-test/Filter methods can quickly filter out features that have zero statistical link to the target, simplifying the initial dataset before model training. The t-test measures the statistical relationship between a single feature and the target variable, independent of any machine learning model.
- XAI methods are then used after training to fine-tune the feature set by identifying features that, while perhaps statistically significant in isolation, are redundant or not utilized by the complex ML model. XAI methods (like SHAP or Permutation Importance) measure a feature's contribution to a specific model's performance and prediction, thereby considering complex feature interactions.
XAI methods can recover important features that simple statistics miss, and vice versa, which is the argument for using both.
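
The two steps can be sketched on synthetic data. Everything below is an assumption made for illustration: the data (feature 0 is informative, feature 1 is noise), the toy nearest-centroid model, and all function names. The Welch t statistic stands in for the filter step and a hand-rolled permutation importance for the XAI step.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic for the difference in means of two samples."""
    return (mean(a) - mean(b)) / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5


def fit_centroids(X, y):
    """Toy model: one mean vector (centroid) per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(centroids, X, y, col, rng, repeats=20):
    """Mean drop in accuracy when one feature column is shuffled."""
    base = accuracy(centroids, X, y)
    drops = []
    for _ in range(repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(base - accuracy(centroids, X_perm, y))
    return mean(drops)


rng = random.Random(0)
y = [0] * 100 + [1] * 100
X = [[rng.gauss(3.0 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]

# Step 1 (filter): model-independent t statistic per feature.
t_scores = [abs(welch_t([x[j] for x, t in zip(X, y) if t == 0],
                        [x[j] for x, t in zip(X, y) if t == 1])) for j in range(2)]
# Step 2 (XAI): importance of each feature to the trained model.
# Simplification: scored on the training data; a held-out set is preferable.
model = fit_centroids(X, y)
importances = [permutation_importance(model, X, y, j, rng) for j in range(2)]
print(t_scores, importances)
```

Here both views agree that feature 0 matters. They can diverge when features are redundant or only matter through interactions, which is where combining them pays off.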

Imbalanced data set

A problem in classification, i.e. where the labels are discrete.
A binary-class data set is imbalanced if the YES and NO classes (i.e. the labels of the data points in the set) are far from a 50/50 (or roughly 60/40) split in terms of the number of data points.
A dataset is marginally imbalanced if one class is rare compared to the other class.
Solved by undersampling: use all of the smaller class and randomly select the same number of points from the majority class, repeat this several times to build multiple data sets, then combine all the classification results. According to the linked answer, this is the best way to balance the data without losing information. If you use boosting, you could instead alter the weights and balance the data that way (https://www.researchgate.net/post/How_to_know_that_our_dataset_is_imbalance). Another popular fix is SMOTE, which creates synthetic samples (i.e. oversampling): https://towardsdatascience.com/dealing-with-imbalanced-classes-in-machine-learning-d43d6fa19d2
Suppose a data set has two classes: Class A makes up 90% and Class B 10%. If the imbalance is not addressed and predictions are assigned at random with 90% going to A, the expected accuracy is about 82% (0.9 × 0.9 + 0.1 × 0.1); always predicting A gives 90% while never detecting B. If every point in both A and B is predicted correctly (over/undersampling may help), accuracy is 100%.
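
The undersampling step and the baseline accuracies for a 90/10 split can be sketched as follows. The function name `undersample`, the toy rows, and the choice of five subsets are hypothetical; training a classifier on each subset and combining the votes is left out.

```python
import random


def undersample(rows, labels, rng):
    """Keep every minority-class row and an equal-sized random draw from each other class."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    sampled = [(row, label)
               for label, members in by_class.items()
               for row in rng.sample(members, n_min)]
    rng.shuffle(sampled)
    return [row for row, _ in sampled], [label for _, label in sampled]


labels = ["A"] * 90 + ["B"] * 10
rows = list(range(100))          # stand-in for feature vectors
rng = random.Random(42)

# Several balanced subsets; one classifier per subset would then be trained
# and their predictions combined.
subsets = [undersample(rows, labels, rng) for _ in range(5)]

# Baselines on the original 90/10 data, with no learning at all.
always_a = labels.count("A") / len(labels)   # always predict the majority class
random_90_10 = 0.9 * 0.9 + 0.1 * 0.1         # guess A with probability 0.9
print(always_a, random_90_10)
```

Because these baselines are already high, accuracy alone is a weak metric on imbalanced data; per-class recall is usually more informative.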

Recommendation systems

RS is a kind of information filtering (of items recommended to users).
Types:

Content-based RS: new items are recommended to a user by measuring the similarity between items the user liked in the past and new items (the item is the content). It uses an item profile and a user preference profile (i.e. the items the user liked) (Cf. "Handbook on ontologies").
Collaborative filtering RS: new items are recommended to a user based on item reviews by other users within the user's community (Cf. "Handbook on ontologies"). Collaborative filtering works by using the ratings provided by a community of users to recommend items for a specific user. There are two complementary approaches, user-based and item-based collaborative filtering. User-based collaborative filtering finds similar users and recommends items that those similar users also liked. Item-based collaborative filtering groups items that people rate similarly and recommends those items together.
Knowledge-based RS, aka rule-based RS: an RS of either of the above types that is supplemented by a knowledge base, e.g. to calculate user-item or item-item attribute similarity (i.e. similarity of attributes existing in both user and item) (https://medium.com/recombee-blog/recommender-systems-explained-d98e8221f468). The attributes may take the form of an ontology, giving an ontology-based RS (which has the knowledge engineering bottleneck, i.e. various problems in knowledge acquisition). The rules may be heuristic rules or ML-based classification rules such as a decision tree.
Hybrid RS: a mix of the first two types.
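
User-based collaborative filtering can be sketched in a few lines. This is a minimal illustration, assuming cosine similarity over co-rated items and similarity-weighted scores; the user names, item names, and ratings are hypothetical.

```python
def cosine(u, v):
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sum(r * r for r in u.values()) ** 0.5
    norm_v = sum(r * r for r in v.values()) ** 0.5
    return dot / (norm_u * norm_v)


def recommend(target, ratings, top_n=2):
    """Rank items the target has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        similarity = cosine(ratings[target], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


ratings = {
    "ann": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 4, "C": 5},   # rates like ann, also liked C
    "cat": {"A": 1, "D": 5},           # rates unlike ann, liked D
}
print(recommend("ann", ratings))
```

A user with no ratings has zero similarity to everyone, so every candidate item scores 0.0 and the ranking carries no information.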

User profiling methods:

Knowledge based approach : uses static models of users and dynamically match users to the closest model. Questionnaires and interviews are often employed to obtain this user knowledge. Once a model is selected for a user, specific domain knowledge (from ontology) for that user type can be applied to help describe the user.
Behavioral based approach : uses the user’s behaviour as a model, commonly using machine-learning techniques to discover useful patterns in the behaviour. Behavioural logging is employed to obtain the data necessary from which to extract patterns.

A common problem of RS is the cold start problem: a state in which a deployed RS does not have enough information about an item, a user, or a community (in the case of collaborative RS) to give recommendations. Cf. https://en.wikipedia.org/wiki/Cold_start_(computing)

Thursday, 8 November 2018

Discussing the results of machine learning predictions

Once you have a model, try to interpret it, e.g. be able to read a decision tree.
Once a prediction comes out as a value, you must explain why that value was obtained. For example, if the model predicts that a particular ATM has high demand, you must be able to explain why: e.g. because people in that area do not yet use cashless apps, or because most of them are elderly.

Wednesday, 7 November 2018

Data types in data mining : discrete, nominal, continuous, binary, ordinal

Src: https://stats.stackexchange.com/questions/159902/is-nominal-ordinal-binary-for-quantitative-data-qualitative-data-or-both

Nominal is also called categorical.

Discretization = quantization = the process of constraining an input from a continuous or otherwise large set of values (such as the real numbers) to a discrete set (such as the integers). For example, a voltage of 2.5 to 5 V is treated as 1. The opposite is the "nominal to numeric" process.
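
The voltage example can be written as a small binning function. This is a sketch; treating anything below 2.5 V as 0 is an assumption (the note only states the 2.5 to 5 V case), and the function names and bin edges are hypothetical.

```python
from bisect import bisect_right


def to_logic_level(voltage, threshold=2.5):
    """Map a continuous voltage to 1 (at or above threshold) or 0 (assumed for lower values)."""
    return 1 if voltage >= threshold else 0


def discretize(value, edges):
    """General case: return the index of the bin that value falls into.

    edges must be sorted; a value equal to an edge goes to the higher bin.
    """
    return bisect_right(edges, value)


print(to_logic_level(3.3), to_logic_level(0.4))
print(discretize(37.2, [35.0, 37.5, 39.0]))  # hypothetical temperature bins
```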

A cardinal number tells "how many." Cardinal numbers are also known as "counting numbers," because they show quantity.

Monday, 5 November 2018

HTTP Strict Transport Security (HSTS)

HTTP Strict Transport Security (HSTS) allows web servers to declare that web browsers (or other complying user agents) should interact with it using only secure HTTPS connections, and never via the insecure HTTP protocol.
When a web application issues HSTS Policy to user agents, user agents behave as follows:
1. Automatically turn any insecure links referencing the web application into secure links. (For instance, http://example.com/some/page/ will be modified to https://example.com/some/page/ before accessing the server.)
2. If the security of the connection cannot be ensured (e.g. the server's TLS certificate is not trusted), the user agent must terminate the connection and should not allow the user to access the application.
The HSTS Policy helps protect web application users against some passive (eavesdropping) and active network attacks. A man-in-the-middle attacker has a greatly reduced ability to intercept requests and responses between a user and a web application server.
The most important vulnerability that HSTS can fix is the SSL-stripping man-in-the-middle attack, which works by transparently converting a secure HTTPS connection into a plain HTTP connection. The user can see that the connection is insecure, but crucially there is no way of knowing whether the connection should be secure. Many websites do not use TLS/SSL, therefore there is no way of knowing (without prior knowledge) whether the use of plain HTTP is due to an attack, or simply because the website hasn't implemented TLS/SSL.
--wiki
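
Both sides of the policy can be sketched in Python. This is a minimal illustration, not a browser's actual implementation: the function names and the in-memory set of known HSTS hosts are hypothetical, while the `Strict-Transport-Security` header name and its `max-age` and `includeSubDomains` directives come from RFC 6797.

```python
from urllib.parse import urlsplit, urlunsplit


def hsts_header(max_age, include_subdomains=False):
    """Server side: build the response header that declares the HSTS policy."""
    value = f"max-age={max_age}"
    if include_subdomains:
        value += "; includeSubDomains"
    return ("Strict-Transport-Security", value)


def upgrade_url(url, hsts_hosts):
    """User-agent side: rewrite http:// links to https:// for known HSTS hosts.

    Simplification: explicit port numbers and subdomain matching are not handled.
    """
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in hsts_hosts:
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


print(hsts_header(31536000, include_subdomains=True))
print(upgrade_url("http://example.com/some/page/", {"example.com"}))
```

A user agent only learns the policy from a response received over HTTPS, so the very first plain-HTTP visit to a site is still exposed unless the host is already known to the browser.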