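Summary: Stacking is a way of ensembling multiple classification or regression models. There are many ways to ensemble models; the best-known are Bagging and Boosting. Bagging averages many similar high-variance models to reduce variance. Boosting builds multiple models incrementally to reduce error (bias) while keeping variance small.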
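
To make the stacking idea concrete, here is a minimal pure-Python sketch for regression. It illustrates the general technique under our own assumptions and is not the article's code: the two toy base learners (`fit_mean`, `fit_line`), the least-squares meta-learner `fit_meta`, and the interleaved k-fold split are hypothetical choices. The key step is that the meta-learner is trained on out-of-fold predictions, so it never learns from base-model outputs on rows those models were trained on.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Toy base learner A (hypothetical): always predicts the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Toy base learner B (hypothetical): 1-D ordinary least squares, y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_meta(p1, p2, ys):
    """Meta-learner: least-squares weights for y ~ w1*p1 + w2*p2 (solves the 2x2 normal equations)."""
    s11 = sum(u * u for u in p1)
    s22 = sum(v * v for v in p2)
    s12 = sum(u * v for u, v in zip(p1, p2))
    t1 = sum(u * y for u, y in zip(p1, ys))
    t2 = sum(v * y for v, y in zip(p2, ys))
    det = s11 * s22 - s12 * s12  # assumed non-zero for this toy example
    return (t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


def fit_stack(xs, ys, k=5):
    """Stack two base learners, using k-fold out-of-fold predictions as meta-features."""
    n = len(xs)
    oof_a, oof_b = [0.0] * n, [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        model_a, model_b = fit_mean(tx, ty), fit_line(tx, ty)
        for i in range(n):
            if i % k == fold:  # predict only rows the base models did not see
                oof_a[i], oof_b[i] = model_a(xs[i]), model_b(xs[i])
    w1, w2 = fit_meta(oof_a, oof_b, ys)
    # Refit the base learners on all data; the meta-learner combines their outputs.
    full_a, full_b = fit_mean(xs, ys), fit_line(xs, ys)
    return lambda x: w1 * full_a(x) + w2 * full_b(x)


xs = list(range(10))
ys = [2 * x + 1 for x in xs]  # toy data, exactly linear
model = fit_stack(xs, ys)
print(model(10))  # close to 21: the meta-learner puts almost all weight on the linear learner
```

In practice, scikit-learn ships ready-made `StackingRegressor` and `StackingClassifier` estimators; the hand-rolled version above only shows the data flow.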

Summary: You have probably used random forest for regression and classification before, but for time series forecasting? Hold up, you're going to say: time series data is special! And you're right. When data has a time dimension, applying machine learning (ML) methods becomes a little tricky...
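
One common way to make a time series usable by a standard tabular regressor such as random forest is to reframe it as supervised learning with lag features and to split train and test by time rather than at random. The sketch below assumes a univariate series; `make_lag_features` and `time_ordered_split` are hypothetical helper names, and the article may use a different setup.

```python
def make_lag_features(series, n_lags):
    """Turn a univariate series into (X, y) rows: X = the previous n_lags values, y = the next value."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y


def time_ordered_split(X, y, test_size):
    """Hold out the last test_size rows; no shuffling, so the model never trains on the future."""
    cut = len(X) - test_size
    return X[:cut], y[:cut], X[cut:], y[cut:]


series = [10, 12, 13, 15, 16, 18, 21]  # toy data
X, y = make_lag_features(series, n_lags=3)
# First row: X[0] == [10, 12, 13], y[0] == 15
X_train, y_train, X_test, y_test = time_ordered_split(X, y, test_size=1)
```

The resulting `X_train`/`y_train` rows could be passed to any tabular regressor; the ordered split keeps every test row strictly later in time than the training rows.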

Summary: Bayesian Target Encoding is a feature engineering technique used to map categorical variables into numeric variables. The Bayesian framework requires only minimal updates as new data is acquired and is thus well-suited for online learning. Furthermore, the Bayesian approach makes choosing and interpreting hyperparameters intuitive. I developed this technique in the recent Avito Kaggle Competition, where my team and I took 14th place out of 1,917 teams. We found that the Bayesian target encoding outperforms the built-in categorical encoding provided by the LightGBM package.
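
A minimal sketch of the idea for a binary target, assuming a Beta-Bernoulli model (the article's exact formulation may differ, for example in the choice of prior or in which posterior statistics become features). Each category is encoded as the posterior mean of its positive rate; `prior_strength` acts like a number of pseudo-observations, which is what makes the hyperparameter easy to interpret, and an online update only adds counts. The class `BetaTargetEncoder` and its parameters are hypothetical.

```python
from collections import defaultdict


class BetaTargetEncoder:
    """Hypothetical Beta-Bernoulli target encoder for a binary (0/1) target."""

    def __init__(self, prior_mean, prior_strength):
        # Beta(alpha0, beta0) prior centered on prior_mean; alpha0 + beta0 == prior_strength,
        # i.e. the prior counts as prior_strength "pseudo-observations".
        self.alpha0 = prior_mean * prior_strength
        self.beta0 = (1.0 - prior_mean) * prior_strength
        self.pos = defaultdict(int)  # positives seen per category
        self.n = defaultdict(int)    # rows seen per category

    def update(self, categories, targets):
        """Online update: conjugacy means we only need to add counts, no refitting."""
        for c, t in zip(categories, targets):
            self.pos[c] += int(t)
            self.n[c] += 1

    def encode(self, category):
        """Posterior mean of the category's positive rate; unseen categories get the prior mean."""
        pos, n = self.pos.get(category, 0), self.n.get(category, 0)
        a = self.alpha0 + pos
        b = self.beta0 + n - pos
        return a / (a + b)


enc = BetaTargetEncoder(prior_mean=0.2, prior_strength=10)
enc.update(["a", "a", "a", "b"], [1, 1, 0, 0])  # toy data
print(enc.encode("a"), enc.encode("b"), enc.encode("unseen"))
```

Categories with few rows stay close to the prior mean, while frequent categories move toward their observed rate. When encoding training rows, counts are usually taken from other folds to avoid target leakage.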

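Summary: In deep learning, apart from building the model itself, the most important thing is the model's parameters, i.e. its weights, and every model needs initial values for them before it starts learning. Some people assume you can simply initialize the weights randomly or start from zero; I thought so too when I first started. For simple applications whose objective function is convex, that approach may work, but for neural networks a good initialization makes training far more effective, while a poor initialization, or a non-convex objective, can cause the network to learn a poor solution.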
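
Two widely used alternatives to zero or naive random initialization are Glorot (Xavier) uniform and He (Kaiming) normal initialization, which scale the weights by the layer's fan-in and fan-out so that activation variance stays roughly stable from layer to layer. The summary does not say which schemes the article covers; this pure-Python sketch is illustrative, and the function names are hypothetical.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform: W ~ U(-L, L) with L = sqrt(6 / (fan_in + fan_out)); common for tanh/sigmoid."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


def he_normal(fan_in, fan_out, rng):
    """He/Kaiming normal: W ~ N(0, 2 / fan_in); common for ReLU layers."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


rng = random.Random(0)
W1 = xavier_uniform(256, 128, rng)  # 256 x 128 weight matrix
W2 = he_normal(256, 128, rng)
```

All-zero initialization fails for a different reason than scale: every hidden unit in a layer then computes the same output and receives the same gradient, so the units never become different from one another.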

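Summary: The online encyclopedia Wikipedia offers an extremely rich but still largely untapped resource for political research. This tutorial provides a practical review showing how these platforms can supply research information on the dynamics of public attention, on policies, politics and other events, on political elites and parties, and more.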

Summary: Imbalanced classes are a common problem in machine learning classification, where there is a disproportionate ratio of observations in each class. Class imbalance can be found in many different areas, including medical diagnosis, spam filtering, and wildfire detection.
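
Two standard mitigations are reweighting classes by inverse frequency and randomly oversampling the minority class. The sketch below uses the "balanced" heuristic w_c = n_samples / (n_classes * count_c), the same formula scikit-learn uses for `class_weight="balanced"`, on a toy 95/5 label split; the helper names and data are hypothetical, and the article may discuss other techniques.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: w_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows of each smaller class until every class matches the largest."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for c, cnt in counts.items():
        idx = [i for i, label in enumerate(y) if label == c]
        for _ in range(target - cnt):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


y = [0] * 95 + [1] * 5  # toy labels, e.g. 95 legitimate e-mails and 5 spam
X = [[i] for i in range(100)]
weights = balanced_class_weights(y)  # minority class gets weight 10.0
X_bal, y_bal = random_oversample(X, y, random.Random(0))
```

Oversampling should be applied only to the training split, after the train/test split, so duplicated rows do not leak into evaluation.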

Summary: Machine learning is increasingly moving from hand-designed models to automatically optimized pipelines using tools such as H2O, TPOT, and auto-sklearn. These libraries, along with methods such as random search, aim to simplify the model selection and tuning parts of machine learning by finding the best model for a dataset with little to no manual intervention. However, feature engineering, an arguably more valuable aspect of the machine learning pipeline, remains almost entirely a human labor.

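Summary: Particle swarm optimization (PSO) is an optimization algorithm modeled on swarm intelligence and is used mainly to solve optimization problems. It was proposed in 1995 by Eberhart and Kennedy, based on the study and simulation of the foraging behavior of bird flocks.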
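
The core PSO loop can be sketched as follows: each particle keeps a position and a velocity, and the velocity is pulled toward the particle's own best position so far (cognitive term) and the whole swarm's best position (social term). This sketch uses the common inertia-weight variant with illustrative parameter values (`w=0.7`, `c1=c2=1.5`) and minimizes a toy sphere function; the function names and parameter values are assumptions, not taken from the article.

```python
import random


def pso(f, dim, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over the box [lo, hi]^dim with a basic global-best PSO (inertia-weight variant)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # velocity = inertia + pull toward own best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # move, clamping the position to the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

The inertia weight `w` trades exploration (large values) against convergence (small values), while `c1` and `c2` balance trust in a particle's own experience against the swarm's.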
