Paper Title

Estimating oil recovery factor using machine learning: Applications of XGBoost classification

Authors

Roustazadeh, Alireza; Ghanbarian, Behzad; Male, Frank; Shadmand, Mohammad B.; Taslimitehrani, Vahid; Lake, Larry W.

Abstract

In petroleum engineering, it is essential to determine the ultimate recovery factor, RF, particularly before exploitation and exploration. However, accurate estimation requires data that are not necessarily available or measured at early stages of reservoir development. We therefore applied machine learning (ML), using readily available features, to estimate oil RF for ten classes defined in this study. To construct the ML models, we applied the XGBoost classification algorithm. Classification was chosen because the recovery factor is bounded between 0 and 1, much like a probability. Three databases were merged, leaving us with four different combinations to first train and test the ML models and then further evaluate them using an independent database of unseen data. Ten-fold cross-validation was applied to the training datasets to assess the effectiveness of the models. To evaluate the accuracy and reliability of the models, the accuracy, neighborhood accuracy, and macro-averaged F1 score were determined. Overall, results showed that the XGBoost classification algorithm could estimate the RF class with reasonable accuracies as high as 0.49 in the training datasets, 0.34 in the testing datasets, and 0.2 in the independent databases used. We found that the reliability of the XGBoost model depended on the data in the training dataset, meaning that the ML models were database dependent. The feature importance analysis and the SHAP approach showed that the most important features were reserves, reservoir area, and reservoir thickness.
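The three evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. Several things here are assumptions rather than details taken from the paper: the ten RF classes are taken to be equal-width bins over [0, 1], "neighborhood accuracy" is interpreted as the fraction of predictions that land within one class of the true class, and the helper names (`rf_to_class`, `neighborhood_accuracy`, `macro_f1`) and sample values are hypothetical. The paper's own class boundaries and metric definitions may differ.

```python
N_CLASSES = 10  # ten RF classes per the abstract; equal-width bins are an assumption


def rf_to_class(rf, n_classes=N_CLASSES):
    """Map a recovery factor in [0, 1] to a class index 0..n_classes-1."""
    if not 0.0 <= rf <= 1.0:
        raise ValueError("recovery factor must lie in [0, 1]")
    # rf == 1.0 is folded into the top class.
    return min(int(rf * n_classes), n_classes - 1)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def neighborhood_accuracy(y_true, y_pred, tol=1):
    """Fraction of predictions within `tol` classes of the true class.

    This reading of "neighborhood accuracy" is an assumption.
    """
    return sum(abs(t - p) <= tol for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is nonzero for every
        # label in the union, since each appears at least once.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Hypothetical recovery factors, for illustration only (not data from the paper).
true_rf = [0.05, 0.32, 0.38, 0.61, 0.95]
pred_rf = [0.08, 0.41, 0.35, 0.22, 0.88]

y_true = [rf_to_class(v) for v in true_rf]  # [0, 3, 3, 6, 9]
y_pred = [rf_to_class(v) for v in pred_rf]  # [0, 4, 3, 2, 8]

print("accuracy:", accuracy(y_true, y_pred))                            # 0.4
print("neighborhood accuracy:", neighborhood_accuracy(y_true, y_pred))  # 0.8
print("macro F1:", macro_f1(y_true, y_pred))
```

In this toy case, neighborhood accuracy is higher than plain accuracy because predictions that miss by a single class still count. Under the assumed definition, this is why it is a natural companion metric when the classes are ordered bins of a continuous quantity.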
