This article explains how to traverse the files in a specific directory in Python and extract specified information from them. The editor finds it quite practical and shares it here for reference; follow along to have a look.
Requirement
You need to traverse the files in a directory (text/csv files that contain URLs with the http/https protocol), extract the domain names contained in them, and write the results back out.
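For illustration, this is what the core transformation should produce for a single URL. It is a minimal Python 3 sketch using urllib.parse, not the author's original code (which follows in the next section); the example URL and its trailing comma are hypothetical, chosen only to mirror the csv input described above.

from urllib.parse import urlsplit

# A hypothetical line as it might appear in one of the text/csv files.
line = "https://www.example.com/path/page.html,"
cleaned = line.replace(',', '')     # drop the trailing CSV comma
print(urlsplit(cleaned).netloc)     # prints: www.example.com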
Code

# coding:utf-8
# author: Duckweeds7
import re
import os
import urllib


def splitSign(str1):
    # Remove extra symbols and extract the domain name part; adjust this to suit your own data.
    str2 = str1.replace(',', '')
    proto, rest = urllib.splittype(str2)   # urllib helper: split off the protocol (see the urllib docs for details)
    res, rest = urllib.splithost(rest)     # split off the host (domain) part
    return res


def text_save(filename, data):
    # filename is the path of the output CSV file; data is the list of rows to write.
    file = open(filename, 'a')  # 'a' appends, 'w' would overwrite
    for i in range(len(data)):
        s = str(data[i]).replace('[', '').replace(']', '')   # strip brackets (optional, depending on the data)
        s = s.replace("'", '').replace(',', '') + '\n'        # strip quotes and commas, append a newline per row
        file.write(s)
    file.close()
    print("Complete")


def walkFile(file):
    regex = re.compile(r'[a-zA-Z]+://[^\s]*')
    all_urls = []
    for root, dirs, files in os.walk(file):
        # root is the folder currently being visited,
        # dirs is the list of subdirectory names under it,
        # files is the list of file names under it.
        for f in files:
            f_obj = open(os.path.join(root, f))           # files holds bare names, so join with root to get the full path
            get_urls = regex.findall(f_obj.read())        # extract the URLs with the regular expression
            all_urls.extend(map(splitSign, get_urls))     # apply splitSign to every extracted URL
    set_urls = set(all_urls)                              # use a set to de-duplicate
    text_save('E:\\test\\test.csv', list(set_urls))       # the output file name should be an absolute path


if __name__ == '__main__':
    walkFile('E:\\test')  # the folder path to be processed

That concludes this article on how to traverse files in a specific directory in Python to extract specified information. I hope the above content is of some help to you; if you found the article useful, feel free to share it so that more people can see it. Thank you for reading!
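One note for readers on Python 3: the script above relies on the Python 2 helpers urllib.splittype and urllib.splithost, which are no longer available as public functions. A minimal Python 3 sketch of the same flow (walk the folder, extract URLs with a regex, reduce them to domains, de-duplicate, append to a CSV file) might look like the following; the E:\test paths are kept from the article's example and are placeholders, not fixed values.

# Python 3 sketch of the same pipeline as the original script.
import os
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r'[a-zA-Z]+://[^\s,]+')


def extract_domains(folder):
    # Walk every file under folder and collect the domain of every URL found.
    domains = set()
    for root, dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            with open(path, encoding='utf-8', errors='ignore') as fh:
                for url in URL_RE.findall(fh.read()):
                    netloc = urlsplit(url).netloc
                    if netloc:
                        domains.add(netloc)
    return domains


def save_lines(filename, lines):
    # Append one domain per line to the output file ('a' appends, 'w' would overwrite).
    with open(filename, 'a', encoding='utf-8') as out:
        for line in sorted(lines):
            out.write(line + '\n')
    print("Complete")


if __name__ == '__main__':
    save_lines(r'E:\test\test.csv', extract_domains(r'E:\test'))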