[python] Help with Python's re module

Bystroushaak bystrousak at kitakitsune.org
Fri Mar 31 13:33:00 CEST 2017


Hello.

I need help with Python's re module. I have been playing with this for
several hours and I am at my wits' end.

I have this script:

-------------------------------------------------------------------------------
import re

data = """<tr><td class="newscap"><b style="font-size:13px">Downtime for
Christmas</b>
		<br><small>by <script language="javascript">document.write('<a
class=\"cap\"
href=\"mailto:'+rot(5,'mvoogz na vrvmzizorjmf.jmb')+'\">'+rot(5,'mvoogz na vrvmzizorjmf.jmb')+'</a>')</script><noscript>rattle</noscript>
on 12/30/12 10:48</small></td></tr>
		<tr><td class="aware" colspan="2">
		So, it appears the site was down for christmas. I could try to find
out why, but I don't care enough. Went to <a
href="https://events.ccc.de/congress/2012/wiki/Main_Page">29c3</a>,
didn't get much done, ate a lot of fast food. I'm old, fat, and boring
now. However, I found out about <a
href="http://www.hyperelliptic.org/tanja/newelliptic/newelliptic.html">Edwards
curves</a>, that shit is rad.
		</td></tr>"""

print re.sub(r'.*(<script.*>)(.*)(</script>).*',
r"\n\n---\1\n---\2\n---\3", data)
-------------------------------------------------------------------------------

When run, it prints:

-------------------------------------------------------------------------------
<tr><td class="newscap"><b style="font-size:13px">Downtime for Christmas</b>


---<script language="javascript">document.write('<a class="cap"
href="mailto:'+rot(5,'mvoogz na vrvmzizorjmf.jmb')+'">'+rot(5,'mvoogz na vrvmzizorjmf.jmb')+'</a>
---')
---</script>
		<tr><td class="aware" colspan="2">
		So, it appears the site was down for christmas. I could try to find
out why, but I don't care enough. Went to <a
href="https://events.ccc.de/congress/2012/wiki/Main_Page">29c3</a>,
didn't get much done, ate a lot of fast food. I'm old, fat, and boring
now. However, I found out about <a
href="http://www.hyperelliptic.org/tanja/newelliptic/newelliptic.html">Edwards
curves</a>, that shit is rad.
		</td></tr>
-------------------------------------------------------------------------------

My goal is to have the opening <script> tag, i.e. <script
language="javascript">, in group \1 and the body of the tag in \2. As it
stands, both end up merged into \1.

"Live" example: http://ideone.com/TfbmB1

Please give me a nudge in the right direction.
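
One direction that may help, as a minimal sketch rather than a definitive
fix: the `.*` inside `(<script.*>)` is greedy, so group 1 extends to the last
`>` that still lets the rest of the pattern match. Restricting the opening tag
to `[^>]*` and making the body non-greedy should separate the two. The
shortened `data` string here is a hypothetical stand-in for the full one, and
`re.DOTALL` is an assumption in case the script body spans several lines.

```python
import re

# Shortened, hypothetical stand-in for the `data` string from the script above.
data = ('<small>by <script language="javascript">'
        "document.write('<a class=\"cap\" href=\"#\">x</a>')"
        '</script><noscript>rattle</noscript></small>')

# Original style of pattern: the greedy `.*` in group 1 runs to the LAST `>`
# that still allows the rest to match, so it swallows most of the body.
greedy = re.search(r'(<script.*>)(.*)(</script>)', data)

# `[^>]*` cannot cross a `>`, so group 1 stops at the end of the opening tag.
# `.*?` is non-greedy; re.DOTALL lets `.` also match newlines in the body.
fixed = re.search(r'(<script[^>]*>)(.*?)(</script>)', data, re.DOTALL)

print(greedy.group(1))  # opening tag plus most of the body
print(fixed.group(1))   # only the opening tag
print(fixed.group(2))   # only the body
```

The same two changes can be carried over to the `re.sub` call. Note that
`[^>]*` assumes no attribute value of the opening tag contains a literal `>`.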

