Python: UserWarning: This pattern has match groups. To actually get the groups, use str.extract

Views: 12 | Date: 2022-08-07 13:30:45

How do I resolve Python: UserWarning: This pattern has match groups. To actually get the groups, use str.extract?

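At least one of the regex patterns in urls must use a capturing group. str.contains only returns True or False for each row in df['event_time']; it does not use the capturing group. The UserWarning is therefore telling you that the regex contains a capturing group whose match is never used.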
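To make the difference concrete, here is a minimal sketch contrasting the two methods. The sample Series and the pattern are made up for illustration, and the warning is silenced locally only so that the output stays clean.

```python
import warnings

import pandas as pd

# Hypothetical sample data, not taken from the question.
s = pd.Series(["gouda", "stilton", "gruyere"])

# str.contains only answers "does the pattern match?" for each row,
# so whatever (.*) captured is thrown away.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    mask = s.str.contains(r"g(.*)", regex=True)

# str.extract returns the captured text itself (missing where nothing matched).
captured = s.str.extract(r"g(.*)", expand=False)

print(mask.tolist())  # [True, False, True]
print(captured)
```

If only the boolean mask is needed, the capturing group adds nothing, which is what the warning points out.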

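If you want to get rid of the UserWarning, you can find and remove the capturing group(s) from the regex patterns. None appear in the patterns you posted, but they must be present in your actual file. Look for parentheses outside of character classes.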
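One way to locate the offending patterns, sketched here with only the standard library, is to compile each pattern and inspect its groups attribute, which holds the number of capturing groups. The helper name capturing_patterns and the third pattern are hypothetical; the first two patterns are copied from the question.

```python
import re


def capturing_patterns(patterns):
    """Return the patterns that contain at least one capturing group."""
    return [p for p in patterns if re.compile(p).groups > 0]


patterns = [
    # Parentheses appear only inside the character class [...], so they are literals.
    r"003.ru/[a-zA-Z0-9-_%$#?.:+=|()]+/mobilnyj_telefon_fly_",
    r"003.ru/sonyxperia",
    # Hypothetical pattern with a real capturing group, added for illustration.
    r"1click.ru/(.*)/chasy-motorola",
]

print(capturing_patterns(patterns))  # ['1click.ru/(.*)/chasy-motorola']
```

Running such a check over urls.url.values.tolist() would list the rows of the CSV file that need editing.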

Alternatively, you can suppress this particular UserWarning by putting

import warnings
warnings.filterwarnings('ignore', 'This pattern has match groups')

before the call to str.contains.
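If a global filter feels too broad, the suppression can be limited to a single call with warnings.catch_warnings. The sketch below assumes the same toy data as the example in this answer. The message regex is deliberately loose because, as far as I know, the wording of this warning differs between pandas versions; treat the exact pattern as an assumption to verify against your installed version.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"event_time": ["gouda", "stilton", "gruyere"]})
pattern = r"g(.*)"  # contains a capturing group on purpose

with warnings.catch_warnings():
    # The filter is active only inside this block and is restored afterwards.
    warnings.filterwarnings(
        "ignore",
        message=r"This pattern.*has match groups",  # assumed to cover old and new wording
        category=UserWarning,
    )
    matches = df[df["event_time"].str.contains(pattern, regex=True)]

print(matches["event_time"].tolist())  # ['gouda', 'gruyere']
```

Outside the with block, other UserWarnings (including this one) are reported as usual.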

Here is a simple example that demonstrates the problem (and the solution):

# import warnings
# warnings.filterwarnings('ignore', 'This pattern has match groups')  # uncomment to suppress the UserWarning

import pandas as pd

df = pd.DataFrame({'event_time': ['gouda', 'stilton', 'gruyere']})

urls = pd.DataFrame({'url': ['g(.*)']})   # With a capturing group, there is a UserWarning
# urls = pd.DataFrame({'url': ['g.*']})   # Without a capturing group, there is no UserWarning. Uncommenting this line avoids the UserWarning.

substr = urls.url.values.tolist()
df[df['event_time'].str.contains('|'.join(substr), regex=True)]

prints

script.py:10: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
  df[df['event_time'].str.contains('|'.join(substr), regex=True)]

Removing the capturing group from the regex pattern:

urls = pd.DataFrame({'url': ['g.*']})

avoids the UserWarning.
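When the parentheses are needed for grouping (for example around an alternation) and cannot simply be deleted, a non-capturing group (?:...) keeps the grouping without capturing anything. The following is a small sketch with invented patterns and data, not part of the original answer:

```python
import re
import warnings

import pandas as pd

s = pd.Series(["gouda", "stilton", "gruyere"])

capturing = r"g(ou|ru)"        # (...) captures, so str.contains would warn
non_capturing = r"g(?:ou|ru)"  # (?:...) groups without capturing

print(re.compile(capturing).groups)      # 1
print(re.compile(non_capturing).groups)  # 0

with warnings.catch_warnings():
    # Turn any UserWarning into an exception to show that none is raised here.
    warnings.simplefilter("error", UserWarning)
    mask = s.str.contains(non_capturing, regex=True)

print(s[mask].tolist())  # ['gouda', 'gruyere']
```

Both patterns match the same rows; only the number of capturing groups differs.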

Original question

I have a dataframe, and I am trying to select the rows in which one column contains certain strings. The df looks like

member_id,event_path,event_time,event_duration
30595,'2016-03-30 12:27:33',yandex.ru/,1
30595,'2016-03-30 12:31:42',0
30595,'2016-03-30 12:31:43',yandex.ru/search/?lr=10738&msid=22901.25826.1459330364.89548&text=%D1%84%D0%B8%D0%BB%D1%8C%D0%BC%D1%8B+%D0%BE%D0%BD%D0%BB%D0%B0%D0%B9%D0%BD&suggest_reqid=168542624144922467267026838391360&csg=3381%2C3938%2C2%2C3%2C1%2C0%2C0,'2016-03-30 12:31:44','2016-03-30 12:31:45','2016-03-30 12:31:46','2016-03-30 12:31:49',kinogo.co/,'2016-03-30 12:32:11',kinogo.co/melodramy/,0

and another df with URLs

url
003.ru/[a-zA-Z0-9-_%$#?.:+=|()]+/mobilnyj_telefon_bq_phoenix
003.ru/[a-zA-Z0-9-_%$#?.:+=|()]+/mobilnyj_telefon_fly_
003.ru/sonyxperia
003.ru/[a-zA-Z0-9-_%$#?.:+=|()]+/mobilnye_telefony_smartfony
003.ru/[a-zA-Z0-9-_%$#?.:+=|()]+/mobilnye_telefony_smartfony/brands5D5Bbr_23
1click.ru/sonyxperia
1click.ru/[a-zA-Z0-9-_%$#?.:+=|()]+/chasy-motorola

I use

import pandas as pd

# Note: error_bad_lines was removed in pandas 2.0; newer versions use on_bad_lines='skip' instead.
urls = pd.read_csv('relevant_url1.csv', error_bad_lines=False)
substr = urls.url.values.tolist()
data = pd.read_csv('data_nts2.csv', error_bad_lines=False, chunksize=50000)
result = pd.DataFrame()
for i, df in enumerate(data):
    res = df[df['event_time'].str.contains('|'.join(substr), regex=True)]

but it gives me

UserWarning: This pattern has match groups. To actually get the groups, use str.extract.

How can I fix this?
