Format of documents in the news collection
RIRES: Russian Information Retrieval Evaluation Seminar

Documents in ROMIP collections are stored in XML format.

For each news item, the following is stored:
  • identifier (string)
  • title (headline of the news article)
  • source:
    • news agency
    • URL of the news article on the Web
  • time of publication
  • content (as published, without any changes)
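
The fields above can be modelled as a simple record. The sketch below is a minimal Python illustration, not part of the ROMIP specification: the class name NewsDocument and its field names are hypothetical, and the comments map each field to the XML element that carries it in the sample file on this page.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class NewsDocument:
    """One news item from a ROMIP news collection (hypothetical record type)."""
    doc_id: str          # <docID>, e.g. "040404-27793"
    title: str           # <subject>, decoded from Base64
    agency: str          # <agency>, name of the news agency
    url: str             # <docURL>, URL of the article on the Web
    published: datetime  # built from <timestamp>: <date> and <daytime>
    content: str         # <content>, decoded from Base64, otherwise unmodified


# Usage with placeholder values (not real collection data):
doc = NewsDocument(
    doc_id="040404-27793",
    title="Example title",
    agency="Example agency",
    url="http://example.com/news/1",
    published=datetime(2004, 4, 2),
    content="Example body text",
)
print(doc.doc_id)  # 040404-27793
```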

A single XML file usually contains multiple documents, which reduces the number of files in the collection.
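
Because one file holds many documents, a reader may prefer to stream them instead of building the whole tree in memory. A minimal sketch using the standard library's xml.etree.ElementTree.iterparse; the function iter_documents is hypothetical, and it assumes that <document> elements are unqualified children of the root element, as in the sample file.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator


def iter_documents(source: IO[bytes]) -> Iterator[ET.Element]:
    """Yield each complete <document> element of a ROMIP file, one at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        # An "end" event means the element and all its children have been read.
        if event == "end" and elem.tag == "document":
            yield elem
            # Detach documents already handed to the caller from the root,
            # so the in-memory tree does not keep growing with the file.
            root.clear()


sample = b"""<romip:dataset xmlns:romip="http://www.romip.ru/data/common" collectionId="ROMIP-2006-News">
<document><docID>a-1</docID></document>
<document><docID>a-2</docID></document>
</romip:dataset>"""
ids = [d.findtext("docID") for d in iter_documents(io.BytesIO(sample))]
print(ids)  # ['a-1', 'a-2']
```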

The content and title of each source document are stored in Base64; in the sample below, these elements carry the attribute encoding="base64".
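
Decoding such a field takes one standard-library call once surrounding whitespace is removed. A minimal sketch; decode_field is a hypothetical helper, and the character set of the decoded bytes is an assumption (the format description does not state it), so it is exposed as a parameter.

```python
import base64


def decode_field(text: str, charset: str = "utf-8") -> str:
    """Decode a Base64-encoded ROMIP field such as <subject> or <content>.

    All whitespace (e.g. the line breaks around the payload in the sample)
    is removed first. The default charset is an assumption, not part of the
    format description; pass whatever the collection actually uses.
    """
    payload = "".join(text.split())
    return base64.b64decode(payload, validate=True).decode(charset)


encoded = base64.b64encode("Hello, ROMIP".encode("utf-8")).decode("ascii")
title = decode_field("\n      " + encoded + "\n   ")
print(title)  # Hello, ROMIP
```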

Below is a sample XML file in the ROMIP format:

<?xml version="1.1"?>
<romip:dataset xmlns:romip="http://www.romip.ru/data/common" collectionId="ROMIP-2006-News">

<header>
 <version>1.1</version>
 <license type="yandex" uri="http://romip.ru/license/yandex.html"/>
 <collection-description>
      This is the ROMIP news collection...
 </collection-description>
</header>

<document>
  <docID>040404-27793</docID>
  <docURL>document URL (base64)</docURL>
  <subject encoding="base64"> title of news (base64)</subject>
  <agency>news agency name (base64)</agency>
  <timestamp>
     <date>20040402</date>
     <daytime>50493</daytime>
  </timestamp>
  <content encoding="base64"> 
      content (base64)
  </content>
</document>

<document>
  ... next document ...
</document>
...

</romip:dataset>
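
A single <document> element from such a file can be turned into plain Python values as sketched below. parse_document is a hypothetical helper, and two points are assumptions not stated in the format description: that <daytime> counts seconds since midnight (so 50493 would mean 14:01:33) and that decoded text is UTF-8. The sample marks only <subject> and <content> with encoding="base64" (although its placeholders for <docURL> and <agency> also mention base64), so the sketch decodes only elements that carry the attribute; adjust this if your files differ.

```python
import base64
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta


def parse_document(elem: ET.Element) -> dict:
    """Turn one <document> element into a dict of plain Python values.

    Assumptions: <daytime> is seconds since midnight; decoded text is UTF-8.
    Only child elements with encoding="base64" are Base64-decoded.
    """
    def text_of(tag: str) -> str:
        child = elem.find(tag)
        if child is None or child.text is None:
            return ""
        raw = child.text.strip()
        if child.get("encoding") == "base64":
            return base64.b64decode("".join(raw.split())).decode("utf-8")
        return raw

    day = datetime.strptime(elem.findtext("timestamp/date").strip(), "%Y%m%d")
    seconds = int(elem.findtext("timestamp/daytime").strip())
    return {
        "id": text_of("docID"),
        "url": text_of("docURL"),
        "title": text_of("subject"),
        "agency": text_of("agency"),
        "published": day + timedelta(seconds=seconds),
        "content": text_of("content"),
    }


def b64(s: str) -> str:
    return base64.b64encode(s.encode("utf-8")).decode("ascii")


# Placeholder document built in the shape of the sample above.
sample = f"""<document>
  <docID>040404-27793</docID>
  <docURL>http://example.com/a</docURL>
  <subject encoding="base64">{b64("Sample title")}</subject>
  <agency>Sample agency</agency>
  <timestamp><date>20040402</date><daytime>50493</daytime></timestamp>
  <content encoding="base64">
      {b64("Sample body")}
  </content>
</document>"""
record = parse_document(ET.fromstring(sample))
print(record["published"])  # 2004-04-02 14:01:33
```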